• Re: else ladders practice

    From David Brown@3:633/280.2 to All on Mon Nov 4 18:18:34 2024
    On 04/11/2024 05:00, Tim Rentsch wrote:
    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

    fir <fir@grunge.pl> writes:

    Bart wrote:

several clear patterns here: you're testing the same variable 'n'
    against several mutually exclusive alternatives, which also happen
    to be consecutive values.

    C is short of ways to express this, if you want to keep those
    'somethings' as inline code (otherwise arrays of function pointers
or even label pointers could be used)

so in short this group seems to have no conclusion but is tolerant
of various approaches, as it seems

imo the else ladder is probably the most proper, but i don't like it
optically; switch/case i also don't like (as far as i remember i never
use it in my code - for years i haven't used even one)

so i personally would use bare ifs and maybe the occasional else
(and switch should be mended, but it's not at all clear how)

    I think you should have confidence in your own opinion. All
    you're getting from other people is their opinion about what is
    easier to understand, or "clear", or "readable", etc. As long as
    the code is logically correct you are free to choose either
    style, and it's perfectly okay to choose the one that you find
    more appealing.

    There is a case where using 'else' is necessary, when there is a
    catchall action for circumstances matching "none of the above".
    Alternatively a 'break' or 'continue' or 'goto' or 'return' may
    be used to bypass subsequent cases, but you get the idea.

    With the understanding that I am offering more than my own opinion,
    I can say that I might use any of the patterns mentioned, depending
    on circumstances. I don't think any one approach is either always
    right or always wrong.

maybe, but some may have some strong arguments (for using this and not
that) that i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    There have been /some/ justifications for some of the opinions - but
    much of it has been merely opinions. And other people's opinions and
    thoughts can be inspirational in forming your own opinions.

    Once the OP (or anyone else) has looked at these, and garnered the ideas floated around, he might then modify his own opinions and preferences as
    a result. In the end, however, you are right that it is the OP's own
    opinions and preferences that should guide the style of the code - only
    he knows what the real code is, and what might suit best for the task in
    hand.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 4 22:56:03 2024
    On 04/11/2024 04:00, Tim Rentsch wrote:
    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

    With the understanding that I am offering more than my own opinion,
    I can say that I might use any of the patterns mentioned, depending
    on circumstances. I don't think any one approach is either always
    right or always wrong.

maybe, but some may have some strong arguments (for using this and not
that) that i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    Pretty much everything about PL design is somebody's opinion.

    Somebody may try to argue about a particular approach or feature being
    more readable, easier to understand, to implement, more ergonomic, more intuitive, more efficient, more maintainable etc, but they are never
    going to convince anyone who has a different view or who is too used to another approach.

    In this case, it was about how to express a coding pattern in a
    particular language, as apparently the OP didn't like writing the 'else'
    in 'else if', and they didn't like using 'switch'.

    You are trying to argue against somebody's personal preference; that's
    never going to go well. Even when you use actual facts, such as having
    the wrong behaviour when those 'somethings' do certain things.

    Here, the question was, can:

    if (c1) s1;
    else if (c2) s2;

    always be rewritten as:

    if (c1) s1;
    if (c2) s2;

In general, the answer has to be No. But when the OP doesn't like that
    answer, what can you do?
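
For example (a minimal sketch with made-up conditions, not the OP's code), the
two forms differ as soon as s1 changes what c2 tests:

#include <stdio.h>

int main(void) {
    int n = 1;

    /* else-if chain: at most one branch runs, so only "one" is printed */
    if (n == 1) { puts("one"); n = 2; }
    else if (n == 2) { puts("two"); }

    n = 1;

    /* separate ifs: the first branch sets n to 2, so the second also fires */
    if (n == 1) { puts("one"); n = 2; }
    if (n == 2) { puts("two"); }
}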

    Even when the behaviour is the same for a particular set of c1/c2/s1/s2,
the question then was: is it always going to be as efficient (since c2
may sometimes be evaluated unnecessarily). Then it depends on the quality
    of implementation, another ill-defined factor.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Mon Nov 4 23:29:09 2024
    On 04.11.2024 12:56, Bart wrote:
    [...]

    Here, the question was, can:

    if (c1) s1;
    else if (c2) s2;

    always be rewritten as:

    if (c1) s1;
    if (c2) s2;

    Erm, no. The question was even more specific. It had (per example)
    not only all ci disjunct but also defined as a linear sequence of
    natural numbers! - In other languages [than "C"] this may be more
    important since [historically] there were specific constructs for
    that case; see e.g. 'switch' definitions in Simula, or the 'case'
    statement of Algol 68, both mapping elements onto an array[1..N];
    labels in the first case, and expressions in the latter case. So
    in "C" we could at least consider using something similar, like,
    say, arrays of function pointers indexed by those 'n'. (Not that
    I'd suggest that by just pointing it out.)
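
Purely as an illustration of that last remark (a hypothetical sketch, not code
from the thread): a table of function pointers indexed by 'n', with a range
check standing in for the final 'else'.

#include <stdio.h>

static void do_one(void)   { puts("one");   }
static void do_two(void)   { puts("two");   }
static void do_three(void) { puts("three"); }

static void (*const handler[])(void) = { do_one, do_two, do_three };

static void dispatch(int n) {
    if (n >= 1 && n <= 3)
        handler[n - 1]();          /* selects and calls exactly one "case" */
    else
        puts("none of the above"); /* the catch-all path */
}

int main(void) {
    for (int n = 0; n <= 4; ++n)
        dispatch(n);
}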

    I'm a bit astonished, BTW, about this huge emphasis on the topic
    "opinions" in later posts of this thread. The OP asked (even in
    the subject) about "practice" which actually invites if not asks
    for providing opinions (besides practical experiences).

    (He also asked about two specific aspects; performance and terse
    code. Answers to that can already be derived from various posts'
    answers.)

    Janis

    [...]


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 4 23:38:06 2024
    On 04/11/2024 12:29, Janis Papanagnou wrote:
    On 04.11.2024 12:56, Bart wrote:
    [...]

    Here, the question was, can:

    if (c1) s1;
    else if (c2) s2;

    always be rewritten as:

    if (c1) s1;
    if (c2) s2;

    Erm, no. The question was even more specific.

    I mean that the question came down to this. After all he had already
    decided on that second form rather than the first, and had acknowledged
    that the 'else's were missing.

    That the OP's example contained some clear patterns has already been
    covered (I did so anyway).


    It had (per example)
    not only all ci disjunct but also defined as a linear sequence of
    natural numbers! - In other languages [than "C"] this may be more
    important since [historically] there were specific constructs for
    that case; see e.g. 'switch' definitions in Simula, or the 'case'
    statement of Algol 68, both mapping elements onto an array[1..N];
    labels in the first case, and expressions in the latter case. So
    in "C" we could at least consider using something similar, like,
    say, arrays of function pointers indexed by those 'n'.

    That too!

(Not that I'd suggest that by just pointing it out.)

    I'm a bit astonished, BTW, about this huge emphasis on the topic
    "opinions" in later posts of this thread. The OP asked (even in
    the subject) about "practice" which actually invites if not asks
    for providing opinions (besides practical experiences).



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Mon Nov 4 23:40:48 2024
    On 02.11.2024 19:09, Tim Rentsch wrote:

    [...] As long as
    the code is logically correct you are free to choose either
    style, and it's perfectly okay to choose the one that you find
    more appealing.

    This is certainly true for one-man-shows. Hardly suited for most
    professional contexts I worked in. (Just my experience, of course.
    And people are free to learn things the Hard Way, if they prefer.)

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Mon Nov 4 23:46:34 2024
    On 04.11.2024 13:38, Bart wrote:

    That the OP's example contained some clear patterns has already been
    covered (I did so anyway).

    I haven't read every post, even if I occasionally take some time
    to catch up.[*]

    Janis

[*] Threads in this group, even for trivial things, tend to turn into
tapeworms, and individual posts often get very long.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From fir@3:633/280.2 to Bart on Tue Nov 5 02:02:16 2024
    To: Bart <bc@freeuk.com>

    Bart wrote:
    On 04/11/2024 04:00, Tim Rentsch wrote:
    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

    With the understanding that I am offering more than my own opinion,
    I can say that I might use any of the patterns mentioned, depending
    on circumstances. I don't think any one approach is either always
    right or always wrong.

maybe, but some may have some strong arguments (for using this and not
that) that i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    Pretty much everything about PL design is somebody's opinion.

overall, when you think about and discuss such a thing, some conclusions may
appear - and some often do for me, though they are not always very clear
or 'hard'

overall, from this thread i noted that switch (which i already didn't
like) is bad.. note that those two elements of switch, the "switch"
and the "case", are in a weird, not obvious relation in c (and how will it
work when you mix them, etc)

what i concluded was that if you do the thing this way


a { }  //this is an analogue of case - a named block
b { }  //this is an analogue of case - a named block
n()    // here by "()" i mean a call through some variable that may yield a
'call' to a, b, c, d, e, f  //(in that case n would be some enum or
pointer)
c { }  //this is an analogue of case - a named block
d { }  //this is an analogue of case - a named block


then everything is clear - the call just selects and calls a block, and
the blocks themselves are just definitions and are skipped in execution until
"called"


this is an example of a conclusion i drew from this thread - and i think
code such as my own initial example should probably be done this
way (though it is not c, i know)
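
For illustration, a rough C approximation of that idea (hypothetical names;
plain C needs small functions here rather than true named blocks):

#include <stdio.h>

static void a(void) { puts("block a"); }
static void b(void) { puts("block b"); }
static void c(void) { puts("block c"); }

int main(void) {
    void (*n)(void) = b;   /* n selects one of the named blocks */
    n();                   /* "calling" n runs only the selected block */
}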








    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: i2pn2 (i2pn.org) (3:633/280.2@fidonet)
  • From fir@3:633/280.2 to Bart on Tue Nov 5 02:06:56 2024
    To: Bart <bc@freeuk.com>

    fir wrote:
    Bart wrote:
    On 04/11/2024 04:00, Tim Rentsch wrote:
    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

    With the understanding that I am offering more than my own opinion,
    I can say that I might use any of the patterns mentioned, depending
    on circumstances. I don't think any one approach is either always
    right or always wrong.

maybe, but some may have some strong arguments (for using this and not
that) that i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    Pretty much everything about PL design is somebody's opinion.

    overally when you think and discuss such thing some conclusions may do
    appear - and often soem do for me, though they are not always very clear
    or 'hard'

    overally from this thread i noted that switch (which i already dont
    liked) is bad.. note those two elements of switch it is "switch"
    and "Case" are in weird not obvious relation in c (and what will it
    work when you mix it etc)

    what i concluded was than if you do thing such way


    a { } //this is analogon to case - named block
    b { } //this is analogon to case - named block
    n() // here by "()" i noted call of some wariable that mey yeild
    'call' to a ,b, c, d, e, f //(in that case na would be soem enum or
    pointer)
    c( ) //this is analogon to case - named block
    d( ) //this is analogon to case - named block


    then everything is clear - this call just selects and calls block , and
    block itself are just definitions and are skipped in execution until
    "called"


    this is example of some conclusion for me from thsi thread - and i think
    such codes as this my own initial example should be probably done such
    way (though it is not c, i know


note that in fact both array usage like tab[5] and a function call like foo()
are analogues of switch/case - when you call functions, the call is like the switch and the set of function definitions are the 'cases'


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: i2pn2 (i2pn.org) (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 5 02:21:37 2024
    On 04/11/2024 15:06, fir wrote:
    fir wrote:
    Bart wrote:
    On 04/11/2024 04:00, Tim Rentsch wrote:
    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

With the understanding that I am offering more than my own opinion,
I can say that I might use any of the patterns mentioned, depending
on circumstances. I don't think any one approach is either always
right or always wrong.

maybe, but some may have some strong arguments (for using this and not
that) that i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    Pretty much everything about PL design is somebody's opinion.

    overally when you think and discuss such thing some conclusions may do
    appear - and often soem do for me, though they are not always very clear
    or 'hard'

    overally from this thread i noted that switch (which i already dont
    liked) is bad.. note those two elements of switch it is "switch"
    and "Case" are in weird not obvious relation in c (and what will it
    work when you mix it etc)

    what i concluded was than if you do thing such way


a { }  //this is analogon to case - named block
b { }  //this is analogon to case - named block
n()   // here by "()" i noted call of some wariable that mey yeild
'call' to a ,b, c, d, e, f  //(in that case na would be soem enum or
pointer)
c( ) //this is analogon to case - named block
d( ) //this is analogon to case - named block


    then everything is clear - this call just selects and calls block , and
    block itself are just definitions and are skipped in execution until
    "called"


    this is example of some conclusion for me from thsi thread - and i think
    such codes as this my own initial example should be probably done such
    way (though it is not c, i know


    note in fact both array usage like tab[5] and fuunction call like foo()
    are analogues to swich case - as when you call fuctions the call is like switch and function definition sets are 'cases'


    Yes, switch could be implemented via a table of label pointers, but it
    needs a GNU extension.

    For example this switch:

    #include <stdio.h>

int main(void) {
    for (int i=0; i<10; ++i) {
        switch(i) {
        case 7: case 2: puts("two or seven"); break;
        case 5: puts("five"); break;
        default: puts("other");
        }
    }
}


    Could also be written like this:

    #include <stdio.h>

int main(void) {
    void* table[] = {
        &&Lother, &&Lother, &&L27, &&Lother, &&Lother, &&L5,
        &&Lother, &&L27, &&Lother, &&Lother};

    for (int i=0; i<10; ++i) {
        goto *table[i];

        L27:    puts("two or seven"); goto Lend;
        L5:     puts("five"); goto Lend;
        Lother: puts("other");
        Lend:   ;
    }
}

(A compiler may generate something like this, although it will be
range-checked if needed. In practice, small numbers of cases, or cases where
the values are too spread out, might be implemented as if-else chains.)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From fir@3:633/280.2 to Bart on Tue Nov 5 02:34:46 2024
    To: Bart <bc@freeuk.com>

    fir wrote:
    Bart wrote:
    On 04/11/2024 04:00, Tim Rentsch wrote:
    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

    With the understanding that I am offering more than my own opinion,
    I can say that I might use any of the patterns mentioned, depending
    on circumstances. I don't think any one approach is either always
    right or always wrong.

maybe, but some may have some strong arguments (for using this and not
that) that i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    Pretty much everything about PL design is somebody's opinion.

    overally when you think and discuss such thing some conclusions may do
    appear - and often soem do for me, though they are not always very clear
    or 'hard'

    overally from this thread i noted that switch (which i already dont
    liked) is bad.. note those two elements of switch it is "switch"
    and "Case" are in weird not obvious relation in c (and what will it
    work when you mix it etc)

    what i concluded was than if you do thing such way


    a { } //this is analogon to case - named block
    b { } //this is analogon to case - named block
    n() // here by "()" i noted call of some wariable that mey yeild
    'call' to a ,b, c, d, e, f //(in that case na would be soem enum or
    pointer)
    c( ) //this is analogon to case - named block
    d( ) //this is analogon to case - named block


a second version would be the one based on labels and goto

    a:
    b:
    n!
    c:
    d:

here n! would symbolize goto n, and the different operator marks the difference
between "call" and "jmp" at the assembly level, and the lack of a block
would denote the lack of a ret at the assembly level


i'm not sure, but maybe those two versions span all that is needed
(not sure about this, but as said, one expresses jumps and the other calls
at the assembly level)


    then everything is clear - this call just selects and calls block , and
    block itself are just definitions and are skipped in execution until
    "called"


    this is example of some conclusion for me from thsi thread - and i think
    such codes as this my own initial example should be probably done such
    way (though it is not c, i know









    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: i2pn2 (i2pn.org) (3:633/280.2@fidonet)
  • From fir@3:633/280.2 to Bart on Tue Nov 5 02:52:17 2024
    To: Bart <bc@freeuk.com>

    Bart wrote:
    On 04/11/2024 15:06, fir wrote:
    fir wrote:
    Bart wrote:
    On 04/11/2024 04:00, Tim Rentsch wrote:
    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

With the understanding that I am offering more than my own opinion,
I can say that I might use any of the patterns mentioned, depending
on circumstances. I don't think any one approach is either always
right or always wrong.

maybe, but some may have some strong arguments (for using this and not
that) that i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    Pretty much everything about PL design is somebody's opinion.

    overally when you think and discuss such thing some conclusions may do
appear - and often soem do for me, though they are not always very clear
or 'hard'

    overally from this thread i noted that switch (which i already dont
    liked) is bad.. note those two elements of switch it is "switch"
    and "Case" are in weird not obvious relation in c (and what will it
    work when you mix it etc)

    what i concluded was than if you do thing such way


    a { } //this is analogon to case - named block
    b { } //this is analogon to case - named block
    n() // here by "()" i noted call of some wariable that mey yeild
    'call' to a ,b, c, d, e, f //(in that case na would be soem enum or
    pointer)
    c( ) //this is analogon to case - named block
    d( ) //this is analogon to case - named block


    then everything is clear - this call just selects and calls block , and
    block itself are just definitions and are skipped in execution until
    "called"


this is example of some conclusion for me from thsi thread - and i think
such codes as this my own initial example should be probably done such
    way (though it is not c, i know


    note in fact both array usage like tab[5] and fuunction call like foo()
    are analogues to swich case - as when you call fuctions the call is
    like switch and function definition sets are 'cases'


    Yes, switch could be implemented via a table of label pointers, but it
    needs a GNU extension.

    For example this switch:

    #include <stdio.h>

int main(void) {
    for (int i=0; i<10; ++i) {
        switch(i) {
        case 7: case 2: puts("two or seven"); break;
        case 5: puts("five"); break;
        default: puts("other");
        }
    }
}


    Could also be written like this:

    #include <stdio.h>

int main(void) {
    void* table[] = {
        &&Lother, &&Lother, &&L27, &&Lother, &&Lother, &&L5,
        &&Lother, &&L27, &&Lother, &&Lother};

    for (int i=0; i<10; ++i) {
        goto *table[i];

        L27:    puts("two or seven"); goto Lend;
        L5:     puts("five"); goto Lend;
        Lother: puts("other");
        Lend:   ;
    }
}

    (A compiler may generate something like this, although it will be range-checked if need. In practice, small numbers of cases, or where the
    case values are too spread out, might be implemented as if-else chains.)


probably switch is implemented like

push __out__  // to simulate a return to the __out__ address
    cmp eax, "A"
    je __A__
    cmp eax, "B"
    je __B__
    cmp eax, "C"
    je __C__
__out__:
    ....
    ....
    ....

an if/else ladder would do the same i guess,
and a sequence of plain ifs would not push __out__ unless it is
detected that those cases for sure may not appear together

it's a waste to check a long sequence of compares if someone is unlucky,
though if the argument of the switch is, say, 8 bits wide it is probably no
problem to put the labels in a table and call via the table



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: i2pn2 (i2pn.org) (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 5 03:35:44 2024
    On 03/11/2024 21:00, Bart wrote:
    On 03/11/2024 17:00, David Brown wrote:
    On 02/11/2024 21:44, Bart wrote:

I would disagree on that definition, yes. A "multi-way selection"
would mean, to me, a selection of one of N possible things - nothing
more than that. It is far too general a phrase to say that it must
    involve branching of some sort ("notional" or otherwise).

Not really. If the possible options involve actions written in-line,
    and you only want one of those executed, then you need to branch around
    the others!


    And if it does /not/ involve actions "in-line", or if the semantics of
    the selection say that all parts are evaluated before the selection,
    then it would /not/ involve branching. I did not say that multi-way selections cannot involve branching - I said that the phrase "multi-way selection" is too vague to say that branches are necessary.

And it is too general to say if you are selecting one of many things
    to do, or doing many things and selecting one.


    Sorry, but this is the key part. You are not evaluating N things and selecting one; you are evaluating ONLY one of N things.

    I understand that this is key to what /you/ mean by "multi-way
    selection". And if I thought that was what that phrase meant, then I'd
    agree with you on many of your other points.

    If you have some objective justification for insisting that the phrase
    has a particular common meaning that rules out the possibility of first creating N "things" and then selecting from them, then I would like to
    hear about it. Until then, I will continue to treat it as a vague
    phrase without a specific meaning, and repeating your assertions won't
    change my mind.

    To my mind, this is a type of "multi-way selection" :

    (const int []){ a, b, c }[n];

    I can't see any good reason to exclude it as fitting the descriptive
    phrase. And if "a", "b" and "c" are not constant, but require
    evaluation of some sort, it does not change things. Of course if these required significant effort to evaluate, or had side-effects, then you
    would most likely want a "multi-way selection" construction that did the selection first, then the evaluation - but that's a matter of programmer choice, and does not change the terms. (For some situations, such as
    vector processing or SIMD work, doing the calculations before the
    selection may be more time-efficient even if most of the results are
    then discarded.)
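
To make that concrete, here is a small self-contained example of the
compound-literal form (the values are invented):

#include <stdio.h>

int main(void) {
    int a = 10, b = 20, c = 30;
    int n = 1;                            /* must be 0, 1 or 2 here */

    /* all three elements are evaluated, then element n is selected */
    int x = (const int []){ a, b, c }[n];

    printf("%d\n", x);                    /* prints 20 */
}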



    For X, it builds a list by evaluating all the elements, and returns the value of the last. For Y, it evaluates only ONE element (using internal switch, so branching), which again is the last.

    You don't seem keen on keeping these concepts distinct?

    I am very keen on keeping the concepts distinct in cases where it
    matters. So they should be given distinct names or terms - or at least,
    clear descriptive phrases should be used to distinguish them.

At the moment, you are saying that a "pet" is a four-legged creature
that purrs, and getting worked up when I say some pets are dogs. It doesn't matter how much of a cat person you are, there /are/ other kinds of pets.

    It doesn't matter how keen you are on making the selection before the evaluation, or how often it is the better choice, you can't impose
    arbitrary restrictions on a general phrase.


    The whole construct may or may not return a value. If it does, then
    one of the N paths must be a default path.


No, that is simply incorrect. For one thing, you can say that it is
    perfectly fine for the selection construct to return a value sometimes
    and not at other times.

    How on earth is that going to satisfy the type system? You're saying
    it's OK to have this:

   int x = if (randomfloat()<0.5) 42;


    In C, no. But when we have spread to other languages, including
    hypothetical languages, there's nothing to stop that. Not only could it
    be supported by the run-time type system, but it would be possible to
    have compile-time types that are more flexible and only need to be "solidified" during code generation. That might allow the language to
    track things like "uninitialised" or "no value" during compilation
    without having them part of a real type (such as std::optional<> or a C
    struct with a "valid" field). All sorts of things are possible in a programming language when you don't always think in terms of direct translation from source to assembly.

    Or even this, which was discussed recently, and which is apparently
    valid C:

   int F(void) {
       if (randomfloat()<0.5) return 42;


    Presumably you meant to add the closing } here ? Yes, that is valid C,
    but it is undefined behaviour to use the value of F() if a value was not returned.

    In the first example, you could claim that no assignment takes place
    with a false condition (so x contains garbage). In the second example,
    what value does F return when the condition is false?


    It doesn't return a value. That is why it is UB to try to use that non-existent value.

    You can't hide behind your vast hyper-optimising compiler; the language needs to say something about it.


    I am not suggesting any kind of "hyper-optimising" compiler. I am
    suggesting that it is perfectly possible for a language to be defined in
    a way that is different from your limited ideas (noting that your style
    of language is not hugely different from C, at least in this aspect).


    My language will not allow it. Most people would say that that's a good thing. You seem to want to take the perverse view that such code should
    be allowed to return garbage values or have undefined behaviour.

    Is your idea of "most people" based on a survey of more than one person?

    Note that I have not suggested returning garbage values - I have
    suggested that a language might support handling "no value" in a
    convenient and safe manner. Many languages already do, though of course
    it is debatable how safe, convenient or efficient the chosen solution
    is. I've already given examples of std::optional<> in C++, Maybe types
    in Haskell, null pointers in C, and you can add exceptions to that list
    as a very different way of allowing functions to exit without returning
    a value.

    Totally independent of and orthogonal to that, I strongly believe that
    there is no point in trying to define behaviour for something that
    cannot happen, or for situations where there is no correct behaviour.
    The principle of "garbage in, garbage out" was established by Babbage's
    time, and the concept of functions that do not have defined values for
    all inputs is as at least as old as the concept of mathematical function
    - it goes back to the first person who realised you can't divide by
    zero. The concept of UB is no more and no less than this.


    After all, this is C! But please tell me, what would be the downside of
    not allowing it?

    Are you asking what are the downsides of always requiring a returned
    value of a specific type? Do you mean in addition to the things I have already written?


It's fine if it never returns at all for some
cases. It's fine to give selection choices for all possible inputs.
    It's fine to say that the input must be a value for which there is a
    choice.

    What I see here is that you don't like C's constructs (that may be for
    good reasons, it may be from your many misunderstandings about C, or
    it may be from your knee-jerk dislike of everything C related).

    With justification. 0010 means 8 in C? Jesus.


    I think the word "neighbour" is counter-intuitive to spell. Therefore
    we should throw out the English language, because it is all terrible,
    and it only still exists because some people insist on using it rather
    than my own personal language of gobbledegook.

    That's the knee-jerk "logic" you use in discussions about C. (Actually,
    it's worse than that - you'd reject English because you think the word "neighbour" is spelt with three "u's", or because you once saw it misspelt.)

    It's hardly knee-jerk either since I first looked at it in 1982, when my
    own language barely existed. My opinion has not improved.


    It's been knee-jerk all the time I have known you in this group.

    Of course some of your criticisms of the language will be shared by
    others - that's true of any language that has ever been used. And
    different people will dislike different aspects of the language. But
    you are unique in hating everything about C simply because it is in C.


You have some different selection constructs in your own language,
which you /do/ like. (It would be very strange for you to have
    constructs that you don't like in your own personal one-man language.)

    It's a one-man language but most of its constructs and features are universal. And therefore can be used for comparison.


    Once a thread here has wandered this far off-topic, it is perhaps not unreasonable to draw comparisons with your one-man language. But it is
    not as useful as comparisons to real languages that other people might
    be familiar with, or can at least read a little about.

    The real problem with your language is that you think it is perfect, and
    that everyone else should agree that it is perfect, and that any
    language that does something differently is clearly wrong and inferior.
This hinders you from thinking outside the box you have built for yourself.


    One feature of my concept of 'multi-way select' is that there is one
    or more controlling expressions which determine which path is followed.


    Okay, that's fine for /your/ language's multi-way select construct.
    But other people and other languages may do things differently.

    FGS, /how/ different? To select WHICH path or which element requires
    some input. That's the controlling expression.

    Have you been following this thread at all? Clearly a "multi-way
    select" must have an input to choose the selection. But it does /not/
    have to be a choice of a path for execution or evaluation.

    When someone disagrees with a statement you made, please try to think a
    little about which part of it they disagree with.


    Or maybe with your ideal language, you can select an element of an array without bothering to provide an index!

    There are plenty of C programmers - including me - who would have
    preferred to have "switch" be a more structured construct which could
not be intertwined with other constructs in this way. That does not
    mean "switch" is not clearly defined - nor does it hinder almost every
    real-world use of "switch" from being reasonably clear and structured.
    It does, however, /allow/ people to use "switch" in more complex and
    less clear ways.

    Try and write a program which takes any arbitrary switch construct (that usually means written by someone else, because obviously all yours will
    be sensible), and cleanly isolates all the branches including the
    default branch.


    No. I am well aware that the flexibility of C's switch, and the
    fall-through mechanism, make it more effort to parse and handle algorithmically than if it were more structured. That has no bearing on whether or not the meaning is clearly defined, or whether the majority
    of real-world uses of "switch" are fairly easy to follow.

    Hint: the lack of 'break' in a non-empty span between two case labels
    will blur the line. So will a conditional break (example below unless
    it's been culled).

    You are confusing "this makes it possible to write messy code" with a
belief that messy code is inevitable or required. And you are
    forgetting that it is always possible to write messy or
    incomprehensible code in any language, with any construct.

    I can't write that randomfloat example in my language.

    Okay.

    I can't leave out
    a 'break' in a switch statement (it's not meaningful). It is impossible
    to do the crazy things you can do with switch in C.

    Okay - although I can't see why you'd have a "break" statement here in
    the first place.

    As I've said many times, I'd prefer it if C's switches were more structured.

    None of that has any bearing on other types of multi-way selection
    constructs.


    Yes, with most languages you can write nonsense programs, but that
    doesn't give the language a licence to forget basic rules and common
    sense, and just allow any old rubbish even if clearly wrong:

   int F() {
       F(1, 2.3, "four", F,F,F,F(),F(F()));
       F(42);
   }

    This is apparently valid C. It is impossible to write this in my language.

    It is undefined behaviour in C. Programmers are expected to write
    sensible code.

    I am confident that if I knew your language, I could write something meaningless. But just as with C, doing so would be pointless.


    You can't use such a statement as a solid basis for a multi-way
    construct that returns a value, since it is, in general, impossible
    to sensibly enumerate the N branches.


    It is simple and obvious to enumerate the branches in almost all
real-world cases of switch statements. (And /please/ don't faff
    around with cherry-picked examples you have found somewhere as if they
    were representative of anything.)

    Oh, right. I'm not allowed to use counter-examples to lend weight to my comments. In that case, perhaps you shouldn't be allowed to use your sensible examples either. After all we don't know what someone will feed
    to a compiler.

    We /do/ know that most people would feed sensible code to compilers.


    But, suppose C was upgraded so that switch could return a value. For
    that, you'd need the value at the end of each branch. OK, here's a
    simple one:

    y = switch (x) {
        case 12:
            if (c) case 14: break;
            100;
        case 13:
            200;
            break;
        }

Any ideas? I will guess that x=12/c=false or x=13 will yield 200. What
about x=12/c=true, or x=14, or x = anything else?


    What exactly is your point here? Am I supposed to be impressed that you
    can add something to C and then write meaningless code with that extension?


    So if I understand correctly, you are saying that chains of if/else,
an imaginary version of "switch", and the C ternary operator all
    evaluate the same things in the same way, while with C's switch you
    have no idea what happens?

    Yes. With C's switch, you can't /in-general/ isolate things into
    distinct blocks. You might have a stab if you stick to a subset of C and follow various guidelines, in an effort to make 'switch' look normal.

    See the example above.

    You /can/ isolate things into distinct blocks, with occasional
    fall-throughs, when you look at code people actually write. No one
    writes code like your example above, so no one needs to be able to
    interpret it.

    Occasionally, people use "switch" statements in C for fancy things, like coroutines. Then the logic flow can be harder to follow, but it is for
    niche cases. People don't randomly mix switches with other structures.



That is true, if you cherry-pick what you choose to ignore in each
    case until it fits your pre-conceived ideas.

    You're the one who's cherry-picking examples of C!

    I haven't even given any examples.

    Here is my attempt at
    converting the above switch into my syntax (using a tool derived from my
    C compiler):

    switch x
    when 12 then
        if c then

        fi
        100
        fallthrough
    when 13 then
        200
    end switch

    It doesn't attempt to deal with fallthrough, and misses out that
    14-case, and that conditional break. It's not easy; I might have better
    luck with assembly!



No, what you call "natural" is entirely subjective. You have looked
    at a microscopic fraction of code written in a tiny proportion of
    programming languages within a very narrow set of programming fields.

    I've worked with systems programming and have done A LOT in the 15 years until the mid 90s. That included pretty much everything involved in
    writing graphical applications given only a text-based disk OS that
    provided file-handling.

    I know you have done a fair bit of programming. That does not change
    what I said. (And I am not claiming that I have programmed in a wider
    range of fields than you.)


Plus of course devising and implementing everything needed to run my own systems language. (After mid 90s, Windows took over half the work.)

    That's not criticism - few people have looked at anything more.

    Very few people use their own languages, especially over such a long
    period, also use them to write commercial applications, or create
    languages for others to use.


    When you use your own language, that inevitably /restricts/ your
    experience with other programmers and other code. It is not a positive
    thing in this context.



What I /do/ criticise is your assumption that this almost
    negligible experience gives you the right to decide what is "natural"
    or "true", or how programming languages or tools "should" work.

So, in your opinion, 'switch' should work how it works in C? That is the most intuitive and natural way of implementing it?

    No, I think there is quite a bit wrong with the way C's "switch"
    statement works.

    I don't think there is a single "most intuitive" or "most natural" way
    to achieve a multi-way execution path selection statement in a language
    - because "intuitive" and "natural" are highly subjective. There are syntaxes, features and limitations that I think would be a reasonable
    fit in C, but those could well be very different in other languages.



You need to learn that other people have different ideas, needs,
    opinions or preferences.

    Most people haven't got a clue about devising PLs.


    I think you'd be surprised. Designing a general-purpose programming
    language is not a small or easy task, and making a compiler is certainly
    a big job. But you'd search far and wide to find an experienced
    programmer who doesn't have opinions or ideas about languages and how
    they might like to change them.

I'd question the whole idea of having a construct that can
    evaluate to something of different types in the first place, whether
    or not it returns a value, but that's your choice.

    If the result of a multi-way execution doesn't yield a value to be
    used, then the types don't matter.


    Of course they do.

    Of course they don't! Here, F, G and H return int, float and void* respectively:

        if (c1) F();
   else if (c2) G();
   else         H();

    C will not complain that those branches yield different types. But you
    say it should do? Why?


    Those branches don't yield different types in C. In C, branches don't
    "yield" anything. Any results from calling these functions are, in
    effect, cast to void.

    You're just being contradictory for the sake of it aren't you?!


    No, but I think you are having great difficulty understanding what I
    write. Maybe that's my fault as much as yours.


    This is just common sense; I don't know why you're questioning it.
    (I'd quite like to see a language of your design!)

    def foo(n) :
     if n == 1 : return 10
     if n == 2 : return 20
     if n == 3 : return

    That's Python, quite happily having a multiple choice selection that
    sometimes does not return a value.

    Python /always/ returns some value. If one isn't provided, it returns
    None. Which means checking that a function returns an explicit value
    goes out the window. Delete the 10 and 20 (or the entire body), and it
    still 'works'.


    "None" is the Python equivalent of "no value".

    Maybe you are thinking about returning an unspecified value of a type
    such as "int", rather than returning no value?


Yes, that is a dynamically typed language, not a statically typed
    language.

    std::optional<int> foo(int n) {
    if (n == 1) return 10;
    if (n == 2) return 20;
    if (n == 3) return {};
    }

    That's C++, a statically typed language, with a multiple choice
    selection that sometimes does not return a value - the return type
    supports values of type "int" and non-values.

    So what happens when n is 4? Does it return garbage (so that's bad).

    It is undefined behaviour, as you would expect. (In my hypothetical
    language that had better handling for "no value", falling off the end of
    the function would return "no value" - in C++, that's std::nullopt,
    which is what you get with "return {};" here.)

    Does it arrange to return some special value of 'optional' that means no value?

    No. C++ rules for function returns are similar to C's, but a little
    stricter - you are not allowed to fall off the end of a non-void
    function (excluding main(), constructors, destructors and coroutines).
    If you break the rules, there is no defined behaviour.

    The "return {};" returns the special "std::nullopt;" value (converted to
    the actual std::optional<T> type) that means "no value".

    Roughly speaking, a C++ std::optional<T> is like a C struct:

struct {
    bool valid;
    T value;
    }
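
For illustration, a rough C analogue of the foo() example above built on such
a struct (a hypothetical sketch, not std::optional itself):

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool valid;
    int  value;
} opt_int;

static opt_int foo(int n) {
    if (n == 1) return (opt_int){ true, 10 };
    if (n == 2) return (opt_int){ true, 20 };
    return (opt_int){ false, 0 };   /* explicit "no value" default path */
}

int main(void) {
    for (int n = 1; n <= 4; ++n) {
        opt_int r = foo(n);
        if (r.valid) printf("foo(%d) = %d\n", n, r.value);
        else         printf("foo(%d) has no value\n", n);
    }
}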


    In that case, the type still does matter, but the language is
    providing that default path for you.


    X Y A B are arbitrary expressions. The need for 'else' is determined
    during type analysis. Whether it will ever execute the default path
    would be up to extra analysis, that I don't do, and would anyway be
    done later.


But if it is not possible for neither X nor Y to be true, then how
would you test the "else" clause? Surely you are not proposing that
    programmers be required to write lines of code that will never be
    executed and cannot be tested?

    Why not? They still have to write 'end', or do you propose that can be
    left out if control never reaches the end of the function?!

    I'm guessing that "end" here is part of the syntax of your function definitions in your language. That's not executable code, but part of
    the syntax.


    (In earlier versions of my dynamic language, the compiler would insert
    an 'else' branch if one was needed, returning 'void'.

    I decided that requiring an explicit 'else' branch was better and more failsafe.)


    You can't design a language like this where valid syntax depends on
    compiler and what it might or might not discover when analysing the
    code.


Why not? It is entirely reasonable to say that a compiler for a
    language has to be able to do certain types of analysis.

    This was the first part of your example:

const char * flag_to_text_A(bool b) {
    if (b == true) {
        return "It's true!";
    } else if (b == false) {
        return "It's false!";

    /I/ would question why you'd want to make the second branch conditional
    in the first place. Write an 'else' there, and the issue doesn't arise.


    Perhaps I want to put it there for symmetry.

    Because I can't see the point of deliberately writing code that usually takes two paths, when either:

 (1) you know that one will never be taken, or
 (2) you're not sure, but don't make any provision in case it is

Fix that first rather than relying on compiler writers to take care of your
    badly written code.

    I am not expecting anything from compiler writers here. I am asking
    /you/ why you want to force /programmers/ to write extra code that they
    know is useless.


    And also, you keep belittling my abilities and my language, when C allows:

   int F(void) {}

    How about getting your house in order first.


    If I were the designer of the C language and the maintainer of the C standards, you might have a point. C is not /my/ language.

    Anyone who is convinced that their own personal preferences are more
    "natural" or inherently superior to all other alternatives, and can't
    justify their claims other than saying that everything else is "a
    mess", is just navel-gazing.

    I wrote more here but the post is already too long.

    Ah, a point that we can agree on 100% :-)

Let's just say that
    'messy' is a fair assessment of C's conditional features, since you can write this:

    No, let's not just say that.

    We can agree that C /lets/ people write messy code. It does not
    /require/ it. And I have never found a programming language that stops
    people writing messy code.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 5 06:50:40 2024
    On 04/11/2024 16:35, David Brown wrote:
    On 03/11/2024 21:00, Bart wrote:

    To my mind, this is a type of "multi-way selection" :

    (const int []){ a, b, c }[n];

    I can't see any good reason to exclude it as fitting the descriptive
    phrase.


    And if "a", "b" and "c" are not constant, but require
evaluation of some sort, it does not change things. Of course if these required significant effort to evaluate,

    Or you had a hundred of them.

    or had side-effects, then you
    would most likely want a "multi-way selection" construction that did the selection first, then the evaluation - but that's a matter of programmer choice, and does not change the terms.


    You still don't get how different the concepts are. Everybody is
familiar with N-way selection when it involves actions, e.g. statements, because they will be in the form of a switch statement or an if-else chain.

They will expect one branch only to be evaluated. Otherwise, there's no
    point in a selection or condition, if all will be evaluated anyway!

    But I think they are less familiar with the concept when it mainly
    involves expressions, and the whole thing returns a value.

    The only such things in C are the ?: operator, and those compound
    literals. And even then, those do not allow arbitrary statements.

    Here is a summary of C vs my language.

    In C, 0 or 1 branches will be evaluated (except for ?: where it is
    always 1.)

    In M, 0 or 1 branches are evaluated, unless it yields a value or lvalue,
    then it must be 1 (and those need an 'else' branch):

                                          C   M

    if-else branches can be exprs/stmts   Y   Y
    if-else can yield a value             N   Y
    if-else can be an lvalue              N   Y

    ?: branches can be exprs/stmts        Y   Y   (M's is a form of if)
    ?: can yield a value                  Y   Y
    ?: can be an lvalue                   N   Y   (Only possible in C++)

    switch branches can have exprs/stmts  Y   Y
    switch can yield a value              N   Y
    switch can be an lvalue               N   Y

    select can have exprs/stmts           -   Y   (Does not exist in C)
    select can yield a value              -   Y
    select can be an lvalue               -   Y

    case-select has exprs/stmts           -   Y
    case-select can yield a value         -   Y
    case-select can be an lvalue          -   Y

15 Ys in the M column, vs 4 Ys in the C column, with only 1 for
value-returning. You can see why C users might be less familiar with the
concepts.

    I am very keen on keeping the concepts distinct in cases where it
    matters.

    I know, you like to mix things up. I like clear lines:

    func F:int ... Always returns a value
    proc P ... Never returns a value


    int x = if (randomfloat()<0.5) 42;


In C, no. But when we have spread to other languages, including hypothetical languages, there's nothing to stop that. Not only could it
    be supported by the run-time type system, but it would be possible to
    have compile-time types that are more flexible

    This is a program from my 1990s scripting language which was part of my
    CAD application:

    n := 999
    x := (n | 10, 20, 30)
    println x

    This uses N-way select (and evaluating only one option!). But there is
    no default path (it is added by the bytecode compiler).

    The output, since n is out of range, is this:

    <Void>

    In most arithmetic, using a void value is an error, so it's likely to
    soon go wrong. I now require a default branch, as that is safer.


    and only need to be
    "solidified" during code generation.ÿ That might allow the language to
    track things like "uninitialised" or "no value" during compilation
    without having them part of a real type (such as std::optional<> or a C

    But you are always returning an actual type in agreement with the
    language. That is my point. You're not choosing to just fall off that
    cliff and return garbage or just crash.

    However, your example with std::optional did just that, despite having
    that type available.

It doesn't return a value. That is why it is UB to try to use that non-existent value.

    And why it is so easy to avoid that UB.

    My language will not allow it. Most people would say that that's a
    good thing. You seem to want to take the perverse view that such code
    should be allowed to return garbage values or have undefined behaviour.

    Is your idea of "most people" based on a survey of more than one person?

    So, you're suggesting that "most people" would prefer a language that
    lets you do crazy, unsafe things for no good reason? That is, unless you prefer to fall off that cliff I keep talking about.

    The fact is, I spend a lot of time implementing this stuff, but I
    wouldn't even know how to express some of the odd things in C. My
    current C compiler uses a stack-based IL. Given this:

    #include <stdio.h>

    int F(void){}

    int main(void) {
    int a;
    a = F();
    printf("%d\n", a);
    }

    It just about works when generating native code (I'm not quite sure
    how); but it just returns whatever garbage is in the register:

    c:\cxp>cc -run t # here runs t.c as native code in memory
    1900545

    But the new compiler can also directly interpret that stack IL:

    c:\cxp>cc -runp t
    PC Exec error: RETF/SP mismatch: old=3 curr=2 seqno: 7

    The problem is that the call-function handling expects a return value to
    have been pushed. But nothing has been pushed in this case. And the
    language doesn't allow me to detect that.

(My compiler could detect some cases, but not all, and even if it could,
    it would report false positives of a missing return, for functions that
    did always return early.)

    So this is a discontinuity in the language, a schism, an exception that shouldn't be there. It's unnatural. It looked off to me, and it is off
    in practice, so it's not just an opinion.

    To fix this would require my always pushing some dummy value at the
    closing } of the function, if the operand stack is empty at that point.

    Which is sort of what you are claiming you don't want to impose on the programmer. But it looks like it's needed anyway, otherwise the function
    is out of kilter.


    Note that I have not suggested returning garbage values - I have
    suggested that a language might support handling "no value" in a
    convenient and safe manner.

But in C it is garbage. And I've shown an example of my language handling
    'no value' in a scheme from the 1990s; I decided to require an explicit
    'else' branch, which you seem to think is some kind of imposition.

    Well, it won't kill you, and it can make programs more failsafe. It is
    also friendly to compilers that aren't 100MB monsters.

    Totally independent of and orthogonal to that, I strongly believe that
    there is no point in trying to define behaviour for something that
    cannot happen,

    But it could for n==4.

    With justification. 0010 means 8 in C? Jesus.


    I think the word "neighbour" is counter-intuitive to spell.

    EVERYBODY agrees that leading zero octals in C were a terrible idea. You
can't say it's just me who thinks that!

    Once a thread here has wandered this far off-topic, it is perhaps not unreasonable to draw comparisons with your one-man language.

Suppose I'd made my own hammer. The things I'd use it for are not going
to be that different: hammering in nails, pulling them out, or generally
    bashing things about.

    As I said, the things my language does are universal. The way it does
    them are better thought out and tidier.

    The real problem with your language is that you think it is perfect

Compared with C, it's a huge improvement. Compared with most other modern languages, 95% of what people expect now is missing.

    int F() {
        F(1, 2.3, "four", F,F,F,F(),F(F()));
        F(42);

It is undefined behaviour in C. Programmers are expected to write
    sensible code.

    But it would be nice if the language stopped people writing such things,
    yes?

    Can you tell me which other current languages, other than C++ and
    assembly, allow such nonsense?

    None? So it's not just me and my language then! Mine is lower level and
    still plenty unsafe, but it has somewhat higher standards.

If I were the designer of the C language and the maintainer of the C standards, you might have a point. C is not /my/ language.

    You do like to defend it though.


We can agree that C /lets/ people write messy code. It does not
/require/ it. And I have never found a programming language that stops people writing messy code.

    I had included half a dozen points that made C's 'if' error prone and confusing, that would not occur in my syntax because it is better designed.

You seem to be incapable of drawing a line between what a language can
    enforce, and what a programmer is free to express.

    Or rather, because a programmer has so much freedom anyway, let's not
    bother with any lines at all! Just have a language that simply doesn't care.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 5 08:48:12 2024
    On 04/11/2024 20:50, Bart wrote:
    On 04/11/2024 16:35, David Brown wrote:
    On 03/11/2024 21:00, Bart wrote:

    To my mind, this is a type of "multi-way selection" :

         (const int []){ a, b, c }[n];

    I can't see any good reason to exclude it as fitting the descriptive
    phrase.


    And if "a", "b" and "c" are not constant, but require evaluation of
    some sort, it does not change things.  Of course if these required
    significant effort to evaluate,

    Or you had a hundred of them.

    or had side-effects, then you would most likely want a "multi-way
    selection" construction that did the selection first, then the
    evaluation - but that's a matter of programmer choice, and does not
    change the terms.


    You still don't get how different the concepts are.

    Yes, I do. I also understand how they are sometimes exactly the same
    thing, depending on the language, and how they can often have the same
    end result, depending on the details, and how they can often be
    different, especially in the face of side-effects or efficiency concerns.

    Look, it's really /very/ simple.

    A) You can have a construct that says "choose one of these N things to
    execute and evaluate, and return that value (if any)".

    B) You can have a construct that says "here are N things, select one of
    them to return as a value".

    Both of these can reasonably be called "multi-way selection" constructs.
    Some languages can have one as a common construct, other languages may
    have the other, and many support both in some way. Pretty much any
    language that allows the programmer to have control over execution order
    will let you do both in some way, even if there is not a clear language construct for it and you have to write it manually in code.

    Mostly type A will be most efficient if there is a lot of effort
    involved in putting together the things to select. Type B is likely to
    be most efficient if you already have the collection of things to choose
    from (it can be as simple as an array lookup), if the creation of the collection can be done in parallel (such as in some SIMD uses), or if
    the cpu can generate them all before it has established the selection index.

    Sometimes type A will be the simplest and clearest in the code,
    sometimes type B will be the simplest and clearest in the code.

    Both of these constructs are "multi-way selections".
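
    As a concrete sketch of the two constructs in C (an illustration added
    here, not from the original post; the names f1/f2/f3 are invented):

        #include <stdio.h>

        static int f1(void) { puts("f1 evaluated"); return 10; }
        static int f2(void) { puts("f2 evaluated"); return 20; }
        static int f3(void) { puts("f3 evaluated"); return 30; }

        int main(void) {
            int n = 2;

            // Type A: select first, then evaluate only the chosen branch
            // (prints "f2 evaluated" only).
            int a = (n == 1) ? f1() : (n == 2) ? f2() : f3();

            // Type B: evaluate all the candidates, then pick one by index
            // (C99 compound literal; prints all three messages, in an
            // unspecified order).
            int b = (int[]){ f1(), f2(), f3() }[n - 1];

            printf("a=%d b=%d\n", a, b);    // a=20 b=20
            return 0;
        }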


    Your mistake is in thinking that type A is all there is and all that
    matters, possibly because you feel you have a better implementation for
    it than C has. (I think that you /do/ have a nicer switch than C, but
    that does not justify limiting your thinking to it.)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 5 09:25:32 2024
    On 04/11/2024 20:50, Bart wrote:
    On 04/11/2024 16:35, David Brown wrote:
    On 03/11/2024 21:00, Bart wrote:


    Here is a summary of C vs my language.


    <snip the irrelevant stuff>


    I am very keen on keeping the concepts distinct in cases where it
    matters.

    I know, you like to mix things up. I like clear lines:

      func F:int ...              Always returns a value
      proc P  ...                 Never returns a value



    Oh, you /know/ that, do you? And how do you "know" that? Is that
    because you still think I am personally responsible for the C language,
    and that I think C is the be-all and end-all of perfect languages?

    I agree that it can make sense to divide different types of "function".
    I disagree that whether or not a value is returned has any significant relevance. I see no difference, other than minor syntactic issues,
    between "int foo(...)" and "void foo(int * result, ...)".

    A much more useful distinction would be between Malcolm-functions and Malcolm-procedures. "Malcolm-functions" are "__attribute__((const))" in
    gcc terms or "[[unsequenced]]" in C23 terms (don't blame me for the
    names here). In other words, they have no side-effects and their
    result(s) are based entirely on their inputs. "Malcolm-procedures" can
    have side-effects and interact with external data. I would possibly add
    to that "meta-functions" that deal with compile-time information -
    reflection, types, functions, etc.
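
    For illustration, a minimal sketch of that distinction using the
    attributes mentioned above (assumes gcc; the function names are made up):

        #include <stdio.h>

        // "Malcolm-function": result depends only on its inputs, no side
        // effects (C23 would spell the attribute [[unsequenced]]).
        __attribute__((const)) int square(int x) { return x * x; }

        // "Malcolm-procedure": interacts with external state.
        void log_value(int x) { printf("x = %d\n", x); }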


    and only need to be "solidified" during code generation.ÿ That might
    allow the language to track things like "uninitialised" or "no value"
    during compilation without having them part of a real type (such as
    std::optional<> or a C

    But you are always returning an actual type in agreement with the
    language. That is my point. You're not choosing to just fall off that
    cliff and return garbage or just crash.

    However, your example with std::optional did just that, despite having
    that type available.

    It doesn't return a value.  That is why it is UB to try to use that
    non-existent value.

    And why it is so easy to avoid that UB.

    I agree. I think C gets this wrong. That's why I, and pretty much all
    other C programmers, use a subset of C that disallows falling off the
    end of a function with a non-void return type. Thus we avoid that UB.

    (The only reason it is acceptable syntax in C, AFAIK, is because early versions of C had "default int" everywhere - there were no "void"
    functions.)
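
    (A sketch of how that subset can be enforced mechanically, assuming gcc
    or clang: promote the warning shown elsewhere in this thread to an error
    with "-Werror=return-type", and a definition like the one below is then
    rejected at compile time.)

        int sign(int x) {
            if (x > 0) return 1;
            if (x < 0) return -1;
        }   // error: control reaches end of non-void function [-Wreturn-type]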


    Note that I have not suggested returning garbage values - I have
    suggested that a language might support handling "no value" in a
    convenient and safe manner.

    But in C it is garbage.

    Note that /I/ have not suggested returning garbage values.

    I have not said that I think C is defined in a good way here. You are,
    as so often, mixing up what people say they like with what C does (or
    what you /think/ C does, as you are often wrong). And as usual you mix
    up people telling you what C does with what people think is a good idea
    in a language.


    Totally independent of and orthogonal to that, I strongly believe that
    there is no point in trying to define behaviour for something that
    cannot happen,

    But it could for n==4.

    Again, you /completely/ miss the point.

    If you have a function (or construct) that returns a correct value for
    inputs 1, 2 and 3, and you never pass it the value 4 (or anything else),
    then there is no undefined behaviour no matter what the code looks like
    for values other than 1, 2 and 3. If someone calls that function with
    input 4, then /their/ code has the error - not the code that doesn't
    handle an input 4.


    EVERYBODY agrees that leading zero octals in C were a terrible idea. You can't say it's just me who thinks that!

    I agree that this a terrible idea. <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60523>

    But picking one terrible idea in C does not mean /everything/ in C is a terrible idea! /That/ is what you got wrong, as you do so often.



        int F() {
            F(1, 2.3, "four", F,F,F,F(),F(F()));
            F(42);

    It is undefined behaviour in C.  Programmers are expected to write
    sensible code.

    But it would be nice if the language stopped people writing such things, yes?

    Sure. That's why sane programmers use decent tools - the language might
    not stop them writing this, but the tools do.


    Can you tell me which other current languages, other than C++ and
    assembly, allow such nonsense?

    Python.

    Of course, it is equally meaningless in Python as it is in C.



    None? So it's not just me and my language then! Mine is lower level and still plenty unsafe, but it has somewhat higher standards.

    If I were the designer of the C language and the maintainer of the C
    standards, you might have a point.  C is not /my/ language.

    You do like to defend it though.

    I defend it if that is appropriate. Mostly, I /explain/ it to you. It
    is bizarre that people need to do that for someone who claims to have
    written a C compiler, but there it is.



    We can agree that C /lets/ people write messy code.  It does not
    /require/ it.  And I have never found a programming language that
    stops people writing messy code.

    I had included half a dozen points that made C's 'if' error prone and confusing, that would not occur in my syntax because it is better designed.


    I'm glad you didn't - it would be a waste of effort.

    You seem to be incapable of drawing a line between what a language can enforce, and what a programmer is free to express.


    I can't see how you could reach that conclusion.

    Or rather, because a programmer has so much freedom anyway, let's not
    bother with any lines at all! Just have a language that simply doesn't
    care.


    You /do/ understand that I use top-quality tools with carefully chosen warnings, set to throw fatal errors, precisely because I want a language
    that has a lot more "lines" and restrictions than your little tools?
    /Every/ C programmer uses a restricted subset of C - some more
    restricted than others. I choose to use a very strict subset of C for
    my work, because it is the best language for the tasks I need to do. (I
    also use a very strict subset of C++ when it is a better choice.)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 5 10:44:34 2024
    On 04/11/2024 22:25, David Brown wrote:
    On 04/11/2024 20:50, Bart wrote:

    But it could for n==4.

    Again, you /completely/ miss the point.

    If you have a function (or construct) that returns a correct value for inputs 1, 2 and 3, and you never pass it the value 4 (or anything else), then there is no undefined behaviour no matter what the code looks like
    for values other than 1, 2 and 3.  If someone calls that function with
    input 4, then /their/ code has the error - not the code that doesn't
    handle an input 4.

    This is the wrong kind of thinking.

    If this was a library function then, sure, you can stipulate a set of
    input values, but that's at a different level, where you are writing
    code on top of a working, well-specified language.

    You don't make use of holes in the language, one that can cause a crash.
    That is, by allowing a function to run into an internal RET op with no provision for a result. That's if there even is a RET; perhaps your
    compilers are so confident that that path is not taken, or you hint it
    won't be, that they won't bother!

    It will start executing whatever random bytes follow the function.

    As I said in my last post, a missing return value caused an internal
    error in one of my C implementations because a pushed return value was missing.

    How should that be fixed, via a hack in the implementation which pushes
    some random value to avoid an immediate crash? And then what?

    Let the user - the author of the function - explicitly provide that
    value; then at least that can be documented: if N isn't in 1..3, then F
    returns so and so.

    You know that makes perfect sense, but because you've got used to that dangerous feature in C you think it's acceptable.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 5 13:11:46 2024
    On 04/11/2024 21:48, David Brown wrote:
    On 04/11/2024 20:50, Bart wrote:
    On 04/11/2024 16:35, David Brown wrote:
    On 03/11/2024 21:00, Bart wrote:

    To my mind, this is a type of "multi-way selection" :

    ÿÿÿÿÿ(const int []){ a, b, c }[n];

    I can't see any good reason to exclude it as fitting the descriptive
    phrase.


    And if "a", "b" and "c" are not constant, but require evaluation of
    some sort, it does not change things.ÿ Of course if these required
    significant effort to evaluate,

    Or you had a hundred of them.

    or had side-effects, then you would most likely want a "multi-way
    selection" construction that did the selection first, then the
    evaluation - but that's a matter of programmer choice, and does not
    change the terms.


    You still don't get how different the concepts are.

    Yes, I do.  I also understand how they are sometimes exactly the same
    thing, depending on the language, and how they can often have the same
    end result, depending on the details, and how they can often be
    different, especially in the face of side-effects or efficiency concerns.

    Look, it's really /very/ simple.

    A) You can have a construct that says "choose one of these N things to execute and evaluate, and return that value (if any)".

    B) You can have a construct that says "here are N things, select one of
    them to return as a value".

    Both of these can reasonably be called "multi-way selection" constructs.
    Some languages can have one as a common construct, other languages may have the other, and many support both in some way.  Pretty much any
    language that allows the programmer to have control over execution order will let you do both in some way, even if there is not a clear language construct for it and you have to write it manually in code.

    Mostly type A will be most efficient if there is a lot of effort
    involved in putting together the things to select.  Type B is likely to
    be most efficient if you already have the collection of things to choose from (it can be as simple as an array lookup), if the creation of the collection can be done in parallel (such as in some SIMD uses), or if
    the cpu can generate them all before it has established the selection
    index.

    Sometimes type A will be the simplest and clearest in the code,
    sometimes type B will be the simplest and clearest in the code.

    Both of these constructs are "multi-way selections".


    Your mistake is in thinking that type A is all there is and all that matters, possibly because you feel you have a better implementation for
    it than C has.  (I think that you /do/ have a nicer switch than C, but
    that does not justify limiting your thinking to it.)


    You STILL don't get it. Suppose this wasn't about returning a value, but executing one piece of code from a conditional set of statements.

    In C that might be using an if/else chain, or switch. Other languages
    might use a match statement.

    Universally only one of those pieces of code will be evaluated. Unless
    you can point me to a language where, in IF C THEN A ELSE B, *both* A
    and B statements are executed.

    Do you agree so far? If so call that Class I.

    Do you also agree that languages have data structures, and those often
    have constructors that will build a data structure element by element?
    So all elements necessarily have to be evaluated. (Put aside selecting
    one for now; that is a separate matter).

    Call that Class II.

    What my languages do, is that ALL the constructs in Class I that are
    commonly used to execute one of N branches, can also return values.
    (Which can require each branch to yield a type compatible with all the
    others; another separate matter.)

    Do you now see why it is senseless for my 'multi-way' selections to work
    any other way? It would mean that:

    x := if C then A else B fi

    really could both evaluate A and B whatever the value of C! Whatever
    that IF construct does here, has to do the same even without that 'x :='
    at the start.

    Of course, I support the sorts of indexing, of an existing or
    just-created data structure, that belong in Class II.

    Although it would not be particularly efficient to do this:

    (f1(), f2(), .... f100())[100] # (1-based)

    Since you will execute 100 functions rather than just one. But perhaps
    there is a good reason for it. If that is needed, then the construct exists.

    Another difference between Class I (when used to yield values) and Class
    II is that an out-of-bounds selector in Class II either yields a runtime
    error (or raises an exception), or may just go wrong in my lower-level language.

    But in Class I, the selector is either range-checked or falls off the
    end of a test sequence, and a default value is provided.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 5 19:26:24 2024
    On 05/11/2024 00:44, Bart wrote:
    On 04/11/2024 22:25, David Brown wrote:
    On 04/11/2024 20:50, Bart wrote:

    But it could for n==4.

    Again, you /completely/ miss the point.

    If you have a function (or construct) that returns a correct value for
    inputs 1, 2 and 3, and you never pass it the value 4 (or anything
    else), then there is no undefined behaviour no matter what the code
    looks like for values other than 1, 2 and 3.  If someone calls that
    function with input 4, then /their/ code has the error - not the code
    that doesn't handle an input 4.

    This is the wrong kind of thinking.

    If this was a library function then, sure, you can stipulate a set of
    input values, but that's at a different level, where you are writing
    code on top of a working, well-specified language.

    You don't make use of holes in the language, one that can cause a crash. That is, by allowing a function to run into an internal RET op with no provision for a result. That's if there even is a RET; perhaps your compilers are so confident that that path is not taken, or you hint it
    won't be, that they won't bother!

    It will start executing whatever random bytes follow the function.

    As I said in my last post, a missing return value caused an internal
    error in one of my C implementations because a pushed return value was missing.

    How should that be fixed, via a hack in the implementation which pushes
    some random value to avoid an immediate crash? And then what?

    Let the user - the author of the function - explicitly provide that
    value then at least that can be documented: if N isn't in 1..3, then F returns so and so.

    You know that makes perfect sense, but because you've got used to that dangerous feature in C you think it's acceptable.



    I am a serious programmer. I write code for use by serious programmers.
    I don't write code that is bigger and slower for the benefit of some half-wit coder that won't read the relevant documentation or rub a
    couple of brain cells together. I have no time for hand-holding and spoon-feeding potential users of my functions - if someone wants to use play-dough plastic knives, they should not have become a programmer.

    My programming stems from mathematics, not from C, and from an education
    in developing provably correct code. I don't try to calculate the log
    of 0, and I don't expect the mathematical log function to give me some "default" value if I try. The same applies to my code.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Tue Nov 5 23:42:34 2024
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means branching, even if notionally, on one-of-N possible code paths.

    OK.

    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is a reasonably small number of possibilities. For example, a switch on a
    char variable which covers all possible values does not need a default
    path. A default is needed only when the number of possibilities is too
    large to explicitly give all of them. And some languages allow
    ranges, so that you may be able to cover all values with a small
    number of ranges.
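
    A sketch of that last point (the case-range syntax used here is a
    gcc/clang extension, not standard C):

        int classify(unsigned char c) {
            switch (c) {
            case 0 ... 127:   return 1;   /* lower half */
            case 128 ... 255: return 2;   /* upper half */
            }
            /* All 256 possible values are covered above, so conceptually no
               default is needed; whether a given compiler can prove that and
               stay quiet about the fall-through is another matter. */
            return 0;
        }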

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From fir@3:633/280.2 to Waldek Hebisch on Wed Nov 6 00:23:04 2024
    To: Waldek Hebisch <antispam@fricas.org>

    Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is reasonably small number of possibilities. For example, switch on
    char variable which covers all possible values does not need default
    path. Default is needed only when number of possibilities is too
    large to explicitely give all of them. And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    in fact when you consider it in your mind, or see it at the assembly level, the
    implementation of switch does not necessarily need a "default"
    path (which should be named "other" btw)

    it has two natural ways
    1) ignore them
    2) signal a runtime error

    (both are kinda natural)


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: i2pn2 (i2pn.org) (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Wed Nov 6 00:29:21 2024
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    I appreciate this is what Bart means by that phrase, but I don't agree
    with it. I'm not sure if that is covered by "OK" or not!


    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is reasonably small number of possibilities. For example, switch on
    char variable which covers all possible values does not need default
    path. Default is needed only when number of possibilities is too
    large to explicitely give all of them. And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    I think this is all very dependent on what you mean by "all input values".

    Supposing I declare this function:

    // Return the integer square root of numbers between 0 and 10
    int small_int_sqrt(int x);


    To me, the range of "all input values" is integers from 0 to 10. I
    could implement it as :

    int small_int_sqrt(int x) {
        if (x == 0) return 0;
        if (x < 4) return 1;
        if (x < 9) return 2;
        if (x < 16) return 3;
        unreachable();
    }

    If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's
    /their/ fault and /their/ problem. I said nothing about what would
    happen in those cases.

    But some people seem to feel that "all input values" means every
    possible value of the input types, and thus that a function like this
    should return a value even when there is no correct value in and no
    correct value out.

    This is, IMHO, just nonsense and misunderstands the contract between
    function writers and function users.

    Further, I am confident that these people are quite happy to write code
    like :

    // Take a pointer to an array of two ints, add them, and return the sum
    int sum_two_ints(const int * p) {
        return p[0] + p[1];
    }

    Perhaps, in a mistaken belief that it makes the code "safe", they will add :

    if (!p) return 0;

    at the start of the function. But they will not check that "p" actually points to an array of two ints (how could they?), nor will they check
    for integer overflow (and what would they do if it happened?).



    A function should accept all input values - once you have made clear
    what the acceptable input values can be. A "default" case is just a
    short-cut for conveniently handling a wide range of valid input values -
    it is never a tool for handling /invalid/ input values.
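
    A small sketch of that reading of "default" (an added example, not from
    the post): every possible input byte is a valid input, and the default
    merely groups the large remaining set of valid values.

        const char *describe_byte(unsigned char c) {
            switch (c) {
            case 0:    return "NUL";
            case '\n': return "newline";
            default:   return "ordinary byte";  /* the other 254 valid values */
            }
        }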





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Wed Nov 6 00:50:34 2024
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    On 02.11.2024 19:09, Tim Rentsch wrote:

    [...] As long as
    the code is logically correct you are free to choose either
    style, and it's perfectly okay to choose the one that you find
    more appealing.

    This is certainly true for one-man-shows.

    The question asked concerned code in an individual programming
    effort. I was addressing the question that was asked.

    Hardly suited for most professional contexts I worked in.

    Note that the pronoun "you" is plural as well as singular. The
    conclusion applies to groups just as it does to individuals.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Wed Nov 6 01:11:18 2024
    Bart <bc@freeuk.com> writes:

    On 04/11/2024 04:00, Tim Rentsch wrote:

    fir <fir@grunge.pl> writes:

    Tim Rentsch wrote:

    With the understanding that I am offering [nothing] more than my
    own opinion, I can say that I might use any of the patterns
    mentioned, depending on circumstances. I don't think any one
    approach is either always right or always wrong.

    maybe, but some may heve some strong arguments (for use this and
    not that) i may overlook

    I acknowledge the point, but you haven't gotten any arguments,
    only opinions.

    Pretty much everything about PL design is somebody's opinion.

    First, the discussion is not about language design but language
    usage.

    Second, the idea that "pretty much everything" about language usage
    is just opinion is simply wrong (that holds for language design
    also). Most of what is offered in newsgroups is just opinion, but
    there are plenty of objective statements that could be made also.
    Posters in the newsgroup here rarely make such statements, mostly I
    think because they don't want to be bothered to make the effort to
    research the issues. But that doesn't mean there isn't much to say
    about such things; there is plenty to say, but for some strange
    reason the people posting in comp.lang.c think their opinions offer
    more value than statements of objective fact.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 6 02:03:54 2024
    On 05/11/2024 12:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is reasonably small number of possibilities. For example, switch on
    char variable which covers all possible values does not need default
    path. Default is needed only when number of possibilities is too
    large to explicitely give all of them. And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    What's easier to implement in a language: to have a conditional need for
    an 'else' branch, which is dependent on the compiler performing some arbitrarily complex levels of analysis on some arbitrarily complex set
    of expressions...

    ...or to just always require 'else', with a dummy value if necessary?

    Even if you went with the first, what happens if the compiler can't
    guarantee that all values of a selector are covered; should it report
    that, or say nothing?

    What happens if you do need 'else', but later change things so all bases
    are covered; will the compiler report it as being unnecessary, so that
    you remove it?


    Now, C doesn't have such a feature to test out (ie. that is a construct
    with an optional 'else' branch, the whole of which returns a value). The nearest is function return values:

    int F(int n) {
        if (n==1) return 10;
        if (n==2) return 20;
    }

    Here, neither tcc nor gcc reports that you might run into the end of the function. It will return garbage if called with anything other than 1 or 2.

    gcc will say something with enough warning levels (reaches end of
    non-void function). But it will say the same here:

    int F(unsigned char c) {
        if (c<128) return 10;
        if (c>=128) return 20;
    }
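
    (One way - a suggestion, not Bart's code - to make the full coverage
    obvious to any compiler without requiring range analysis is to make the
    last case unconditional, so that every path visibly returns:)

        int F(unsigned char c) {
            if (c < 128) return 10;
            return 20;              /* everything else, i.e. c >= 128 */
        }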




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Wed Nov 6 03:02:04 2024
    On 05/11/2024 16:03, Bart wrote:
    On 05/11/2024 12:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.
    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values.  This is possible when there
    is reasonably small number of possibilities.  For example, switch on
    char variable which covers all possible values does not need default
    path.  Default is needed only when number of possibilities is too
    large to explicitly give all of them.  And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    What's easier to implement in a language: to have a conditional need for
    an 'else' branch, which is dependent on the compiler performing some arbitrarily complex levels of analysis on some arbitrarily complex set
    of expressions...

    ...or to just always require 'else', with a dummy value if necessary?

    If this was a discussion on learning about compiler design for newbies,
    that might be a relevant point. Otherwise, what is easier to implement
    in a language tool is completely irrelevant to what is good in a language.

    A language should try to support things that are good for the
    /programmer/, not the compiler. But it does have to be limited by what is practically possible for a compiler. A fair bit of the weaknesses of C
    as a language can be attributed to the limitations of compilers from its
    early days, and thereafter existing practice was hard to change.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Wed Nov 6 06:39:21 2024
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    I appreciate this is what Bart means by that phrase, but I don't agree
    with it. I'm not sure if that is covered by "OK" or not!

    You may prefer your own definition, but Bart's is a reasonable one.


    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is reasonably small number of possibilities. For example, switch on
    char variable which covers all possible values does not need default
    path. Default is needed only when number of possibilities is too
    large to explicitely give all of them. And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    I think this is all very dependent on what you mean by "all input values".

    Supposing I declare this function:

    // Return the integer square root of numbers between 0 and 10
    int small_int_sqrt(int x);


    To me, the range of "all input values" is integers from 0 to 10. I
    could implement it as :

    int small_int_sqrt(int x) {
    if (x == 0) return 0;
    if (x < 4) return 1;
    if (x < 9) return 2;
    if (x < 16) return 3;
    unreachable();
    }

    If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's /their/ fault and /their/ problem. I said nothing about what would
    happen in those cases.

    But some people seem to feel that "all input values" means every
    possible value of the input types, and thus that a function like this
    should return a value even when there is no correct value in and no
    correct value out.

    Well, some languages treat types more seriously than C. In Pascal
    the type of your input would be 0..10 and all input values would be
    handled. Sure, when the domain is too complicated to express in a type
    then it could be a documented restriction. Still, it makes sense to
    signal an error if a value goes outside the handled range, so in a sense all
    values of the input type are handled: either you get a valid answer or a
    clear error.

    This is, IMHO, just nonsense and misunderstands the contract between function writers and function users.

    Further, I am confident that these people are quite happen to write code like :

    // Take a pointer to an array of two ints, add them, and return the sum
    int sum_two_ints(const int * p) {
    return p[0] + p[1];
    }

    I do not think that people wanting strong type checking are happy
    with C. Simply, either they use a different language or use C
    without bitching, but are aware of its limitations. I certainly would
    be quite unhappy with the code above. It is possible that I would still
    use it as a compromise (say, if it was desirable to have a single
    prototype but handle points in spaces of various dimensions),
    but my first attempt would be something like:

    typedef struct {int p[2];} two_int;
    .....
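
    One possible completion of that sketch (a guess at the intent, for
    illustration only):

        typedef struct { int p[2]; } two_int;

        /* The parameter type now says "exactly two ints"; the caller must
           construct such an object rather than pass an arbitrary pointer. */
        int sum_two_ints(two_int v) {
            return v.p[0] + v.p[1];
        }

        /* usage:  int s = sum_two_ints((two_int){ .p = { 3, 4 } });   // s == 7 */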

    Perhaps, in a mistaken belief that it makes the code "safe", they will add :

    if (!p) return 0;

    at the start of the function. But they will not check that "p" actually points to an array of two ints (how could they?), nor will they check
    for integer overflow (and what would they do if it happened?).

    I am certainly unhappy with overflow handling in current hardware
    and by extension with overflow handling in C.

    A function should accept all input values - once you have made clear
    what the acceptable input values can be. A "default" case is just a short-cut for conveniently handling a wide range of valid input values -
    it is never a tool for handling /invalid/ input values.

    Well, default can signal an error, which frequently is the right handling
    of invalid input values.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Wed Nov 6 06:53:12 2024
    Bart <bc@freeuk.com> wrote:
    On 05/11/2024 12:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is reasonably small number of possibilities. For example, switch on
    char variable which covers all possible values does not need default
    path. Default is needed only when number of possibilities is too
    large to explicitely give all of them. And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    What's easier to implement in a language: to have a conditional need for
    an 'else' branch, which is dependent on the compiler performing some arbitrarily complex levels of analysis on some arbitrarily complex set
    of expressions...

    ...or to just always require 'else', with a dummy value if necessary?

    Well, frequently it is easier to do a bad job than a good one. However, normally you do not need very complex analysis: if simple analysis
    is not enough, then the first thing to do is to simplify the program.
    And in cases where the problem to solve is really hard and the program can
    not be simplified ("irreducible complexity"), then it is time for
    kludges, for example in the form of a default case. But it should not
    be the norm.

    Even if you went with the first, what happens if the compiler can't guarantee that all values of a selector are covered; should it report
    that, or say nothing?

    Compile time error.

    What happens if you do need 'else', but later change things so all bases
    are covered; will the compiler report it as being unnecesary, so that
    you remove it?

    When practical, yes.

    Now, C doesn't have such a feature to test out (ie. that is a construct
    with an optional 'else' branch, the whole of which returns a value). The nearest is function return values:

    int F(int n) {
    if (n==1) return 10;
    if (n==2) return 20;
    }

    Here, neither tcc nor gcc reports that you might run into the end of the function. It will return garbage if called with anything other than 1 or 2.

    Hmm, using gcc-12 with your code in "foo.c":

    gcc -Wall -O3 -c foo.c
    foo.c: In function ‘F’:
    foo.c:4:1: warning: control reaches end of non-void function [-Wreturn-type]
    4 | }
    | ^


    gcc will say something with enough warning levels (reaches end of
    non-void function). But it will say the same here:

    int F(unsigned char c) {
    if (c<128) return 10;
    if (c>=128) return 20;
    }

    Indeed, it says the same. Somebody should report this as a bug.
    IIUC gcc has all machinery needed to detect that all cases are
    covered.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Wed Nov 6 07:33:55 2024
    On 05/11/2024 20:39, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    I appreciate this is what Bart means by that phrase, but I don't agree
    with it. I'm not sure if that is covered by "OK" or not!

    You may prefer your own definition, but Bart's is a reasonable one.

    The only argument I can make here is that I have not seen "multi-way
    select" as a defined phrase with a particular established meaning. So
    it simply means what the constituent words mean - selecting something
    from multiple choices. There are no words in that phrase that talk
    about "branching", or imply a specific order to events. It is a very
    general and vague phrase, and I cannot see a reason to assume it has
    such a specific meaning as Bart wants to assign to it. And as I have
    pointed out in other posts, there are constructs in many languages
    (including C) that fit the idea of a selection from one of many things,
    but which do not fit Bart's specific interpretation of the phrase.

    Bart's interpretation is "reasonable" in the sense of being definable
    and consistent, or at least close enough to that to be useable in a discussion. But neither he, I, or anyone else gets to simply pick a
    meaning for such a phrase and claim it is /the/ definition. Write a
    popular and influential book with this as a key phrase, and /then/ you
    can start calling your personal definition "the correct" definition.



    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is reasonably small number of possibilities. For example, switch on
    char variable which covers all possible values does not need default
    path. Default is needed only when number of possibilities is too
    large to explicitely give all of them. And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    I think this is all very dependent on what you mean by "all input values".
    Supposing I declare this function:

    // Return the integer square root of numbers between 0 and 10
    int small_int_sqrt(int x);


    To me, the range of "all input values" is integers from 0 to 10. I
    could implement it as :

    int small_int_sqrt(int x) {
        if (x == 0) return 0;
        if (x < 4) return 1;
        if (x < 9) return 2;
        if (x < 16) return 3;
        unreachable();
    }

    If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's
    /their/ fault and /their/ problem. I said nothing about what would
    happen in those cases.

    But some people seem to feel that "all input values" means every
    possible value of the input types, and thus that a function like this
    should return a value even when there is no correct value in and no
    correct value out.

    Well, some languages treat types more seriously than C. In Pascal
    type of your input would be 0..10 and all input values would be
    handled. Sure, when domain is too complicated to express in type
    than it could be documented restriction. Still, it makes sense to
    signal error if value goes outside handled rage, so in a sense all
    values of input type are handled: either you get valid answer or
    clear error.

    No, it does not make sense to do that. Just because the C language does
    not currently (maybe once C++ gets contracts, C will copy them) have a
    way to specify input sets other than by types, does not mean that
    functions in C always have a domain matching all possible combinations
    of bits in the underlying representation of the parameter's types.

    It might be a useful fault-finding aid temporarily to add error messages
    for inputs that are invalid but can physically be squeezed into the parameters. That won't stop people making incorrect declarations of the function and passing completely different parameter types to it, or
    finding other ways to break the requirements of the function.

    And in general there is no way to check the validity of the inputs - you usually have no choice but to trust the caller. It's only in simple
    cases, like the example above, that it would be feasible at all.


    There are, of course, situations where the person calling the function
    is likely to be incompetent, malicious, or both, and where there can be serious consequences for what you might prefer to consider as invalid
    input values. You have that for things like OS system calls - it's no different than dealing with user inputs or data from external sources.
    But you handle that by extending the function - increase the range of
    valid inputs and appropriate outputs. You no longer have a function
    that takes a number between 0 and 10 and returns the integer square root
    - you now have a function that takes a number between -(2 ^ 31 + 1) and
    (2 ^ 31) and returns the integer square root if the input is in the
    range 0 to 10 or halts the program with an error message for other
    inputs in the wider range. It's a different function, with a wider set
    of inputs - and again, it is specified to give particular results for particular inputs.



    This is, IMHO, just nonsense and misunderstands the contract between
    function writers and function users.

    Further, I am confident that these people are quite happen to write code
    like :

    // Take a pointer to an array of two ints, add them, and return the sum
    int sum_two_ints(const int * p) {
    return p[0] + p[1];
    }

    I do not think that people wanting strong type checking are happy
    with C. Simply, either they use different language or use C
    without bitching, but aware of its limitations.

    Sure. C doesn't give as much help to writing correct programs as some
    other languages. That doesn't mean the programmer can't do the right thing.

    I certainly would
    be quite unhappy with code above. It is possible that I would still
    use it as a compromise (say if it was desirable to have single
    prototype but handle points in spaces of various dimensions),
    but my first attempt would be something like:

    typedef struct {int p[2];} two_int;
    ....


    I think you'd quickly find that limiting and awkward in C (but it might
    be appropriate in other languages). But don't misunderstand me - I am
    all in favour of finding ways in code that make input requirements
    clearer or enforceable within the language - never put anything in
    comments if you can do it in code. You could reasonably do this in C
    for the first example :


    // Do not use this directly
    extern int small_int_sqrt_implementation(int x);


    // Return the integer square root of numbers between 0 and 10
    static inline int small_int_sqrt(int x) {
        assert(x >= 0 && x <= 10);
        return small_int_sqrt_implementation(x);
    }


    There is no way to check the validity of pointers in C, but you might at
    least be able to use implementation-specific extensions to declare the function with the requirement that the pointer not be null.


    Perhaps, in a mistaken belief that it makes the code "safe", they will add :
    if (!p) return 0;

    at the start of the function. But they will not check that "p" actually
    points to an array of two ints (how could they?), nor will they check
    for integer overflow (and what would they do if it happened?).

    I am certainly unhappy with overflow handling in current hardware
    an by extention with overflow handling in C.

    A function should accept all input values - once you have made clear
    what the acceptable input values can be. A "default" case is just a
    short-cut for conveniently handling a wide range of valid input values -
    it is never a tool for handling /invalid/ input values.

    Well, default can signal error which frequently is right handling
    of invalid input values.


    Will that somehow fix the bug in the code that calls the function?

    It can be a useful debugging and testing aid, certainly, but it does not
    make the code "correct" or "safe" in any sense.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 6 09:48:28 2024
    On 05/11/2024 13:29, David Brown wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:


    Supposing I declare this function:

    // Return the integer square root of numbers between 0 and 10
    int small_int_sqrt(int x);


    To me, the range of "all input values" is integers from 0 to 10.  I
    could implement it as :

    int small_int_sqrt(int x) {
        if (x == 0) return 0;
        if (x < 4) return 1;
        if (x < 9) return 2;
        if (x < 16) return 3;
        unreachable();
    }


    If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's /their/ fault and /their/ problem.  I said nothing about what would
    happen in those cases.

    But some people seem to feel that "all input values" means every
    possible value of the input types, and thus that a function like this
    should return a value even when there is no correct value in and no
    correct value out.

    Your example is an improvement on your previous ones. At least it
    attempts to deal with out-of-range conditions!

    However there is still the question of providing that return type. If 'unreachable' is not a special language feature, then this can fail
    either if the language requires the 'return' keyword, or 'unreachable'
    doesn't yield a compatible type (even if it never returns because it's
    an error handler).

    Getting that right will satisfy both the language (if it cared more
    about such matters than C apparently does), and the casual reader
    curious about how the function contract is met (that is, supplying that promised return int type if or when it returns).

    // Take a pointer to an array of two ints, add them, and return the sum
    int sum_two_ints(const int * p) {
        return p[0] + p[1];
    }

    Perhaps, in a mistaken belief that it makes the code "safe", they will
    add :

        if (!p) return 0;

    at the start of the function.  But they will not check that "p" actually points to an array of two ints (how could they?), nor will they check
    for integer overflow (and what would they do if it happened?).

    This is a different category of error.

    Here's a related example of what I'd class as a language error:

    int a;
    a = (exit(0), &a);

    A type mismatch error is usually reported. However, the assignment is
    never done because it never returns from that exit() call.

    I expect you wouldn't think much of a compiler that didn't report such
    an error because that code is never executed.

    But to me that is little different from running into the end of a function without the proper provisions for a valid return value.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 6 10:01:44 2024
    On 05/11/2024 20:33, David Brown wrote:
    On 05/11/2024 20:39, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    I appreciate this is what Bart means by that phrase, but I don't agree
    with it.  I'm not sure if that is covered by "OK" or not!

    You may prefer your own definition, but Bart's is a reasonable one.

    The only argument I can make here is that I have not seen "multi-way
    select" as a defined phrase with a particular established meaning.

    Well, it started off as 2-way select, meaning constructs like this:

    x = c ? a : b;
    x := (c | a | b)

    Where one of two branches is evaluated. I extended the latter to N-way
    select:

    x := (n | a, b, c, ... | z)

    Where again one of these elements is evaluated, selected by n (here
    having the values of 1, 2, 3, ... compared with true, false above, but
    there need to be at least 2 elements inside |...| to distinguish them).

    I applied it also to other statements that can provide values, such
    as if-elsif chains and switch, but there the selection might be
    different (eg. a series of tests are done sequentially until a true one).

    I don't know how it got turned into 'multi-way'.

    Notice that each starts with an assignment (or the value is used in
    other ways like passing to a function), so provision has to be made for
    some value always to be returned.

    Such N-way selections can be emulated, for example:

    if (c)
        x = a;
    else
        x = b;

    But because the assignment has been brought inside (a dedicated one for
    each branch), the issue of a default path doesn't arise. You can leave
    out the 'else' for example; x is just left unchanged.

    This doesn't work however when the result is passed to a function:

    f(if (c) a);

    what is passed when c is false?
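
    The nearest C equivalent makes the same point: the conditional expression
    has no one-armed form, so a value for the false case must be supplied even
    if it is only a dummy (a fragment for illustration):

        f(c ? a : 0);      /* fine: 0 is the explicit "else" value     */
        /* f(c ? a);          not C: there is no one-armed ?: operator */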



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 6 10:15:35 2024
    On 05/11/2024 19:53, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 05/11/2024 12:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    The whole construct may or may not return a value. If it does, then one
    of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is reasonably small number of possibilities. For example, switch on
    char variable which covers all possible values does not need default
    path. Default is needed only when number of possibilities is too
    large to explicitely give all of them. And some languages allow
    ranges, so that you may be able to cover all values with small
    number of ranges.


    What's easier to implement in a language: to have a conditional need for
    an 'else' branch, which is dependent on the compiler performing some
    arbitrarily complex levels of analysis on some arbitrarily complex set
    of expressions...

    ...or to just always require 'else', with a dummy value if necessary?

    Well, frequently it is easier to do bad job, than a good one.

    I assume that you consider the simple solution the 'bad' one?

    I would consider a much more elaborate one, putting the onus on external
    tools, and still having an unpredictable result, to be the poorer of the two.

    You want to create a language that is easily compilable, no matter how
    complex the input.

    With the simple solution, the worst that can happen is that you have to
    write a dummy 'else' branch, perhaps with a dummy zero value.

    If control never reaches that point, it will never be executed (at
    worst, it may need to skip an instruction).

    But if the compiler is clever enough (optionally clever, it is not a requirement!), then it could eliminate that code.

    A bonus is that when debugging, you can comment out all or part of the previous lines, but the 'else' now catches those untested cases.

    normally you do not need very complex analysis:

    I don't want to do any analysis at all! I just want a mechanical
    translation as effortlessly as possible.

    I don't like unbalanced code within a function because it's wrong and
    can cause problems.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Wed Nov 6 18:26:25 2024
    On 2024-11-05, Bart <bc@freeuk.com> wrote:
    On 05/11/2024 20:33, David Brown wrote:
    On 05/11/2024 20:39, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    I appreciate this is what Bart means by that phrase, but I don't agree
    with it.  I'm not sure if that is covered by "OK" or not!

    You may prefer your own definition, but Bart's is a reasonable one.

    The only argument I can make here is that I have not seen "multi-way
    select" as a defined phrase with a particular established meaning.

    Well, it started off as 2-way select, meaning constructs like this:

    x = c ? a : b;
    x := (c | a | b)

    Where one of two branches is evaluated. I extended the latter to N-way select:

    x := (n | a, b, c, ... | z)

    This looks quite error-prone. You have to count carefully that
    the cases match the intended values. If an entry is
    inserted, all the remaining ones shift to a higher value.

    You've basically taken a case construct and auto-generated
    the labels starting from 1.

    If that was someone's Lisp macro, I would prefer they confine
    it to their own program. :)

    (defmacro nsel (expr . clauses)
      ^(caseql ,expr ,*[mapcar list 1 clauses]))
    nsel
    (nsel 1 (prinl "one") (prinl "two") (prinl "three"))
    "one"
    "one"
    (nsel (+ 1 1) (prinl "one") (prinl "two") (prinl "three"))
    "two"
    "two"
    (nsel (+ 1 3) (prinl "one") (prinl "two") (prinl "three"))
    nil
    (nsel (+ 1 2) (prinl "one") (prinl "two") (prinl "three"))
    "three"
    "three"
    nil
    (macroexpand-1 '(nsel x a b c d))
    (caseql x (1 a)
            (2 b) (3 c)
            (4 d))

    Yawn ...

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Wed Nov 6 18:38:47 2024
    On 06/11/2024 00:01, Bart wrote:
    On 05/11/2024 20:33, David Brown wrote:
    On 05/11/2024 20:39, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means
    branching, even if notionally, on one-of-N possible code paths.

    OK.

    I appreciate this is what Bart means by that phrase, but I don't agree
    with it. I'm not sure if that is covered by "OK" or not!

    You may prefer your own definition, but Bart's is a reasonable one.

    The only argument I can make here is that I have not seen "multi-way
    select" as a defined phrase with a particular established meaning.

    Well, it started off as 2-way select, meaning constructs like this:

       x = c ? a : b;
       x := (c | a | b)

    Where one of two branches is evaluated. I extended the latter to N-way select:

       x := (n | a, b, c, ... | z)


    I appreciate that this is what you have in your language as a "multi-way select". I can see it being a potentially useful construct (though
    personally I don't like the syntax at all).

    The only thing I have disagreed with is your assertions that what you
    have there is somehow the only "true" or "correct" concept of a
    "multi-way selection".



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 6 21:01:16 2024
    On 06/11/2024 07:26, Kaz Kylheku wrote:
    On 2024-11-05, Bart <bc@freeuk.com> wrote:

    Well, it started off as 2-way select, meaning constructs like this:

    x = c ? a : b;
    x := (c | a | b)

    Where one of two branches is evaluated. I extended the latter to N-way
    select:

    x := (n | a, b, c, ... | z)

    This looks quite error-prone. You have to count carefully that
    the cases match the intended values. If an entry is
    inserted, all the remaining ones shift to a higher value.

    You've basically taken a case construct and auto-generated
    the labels starting from 1.

    It's a version of Algol68's case construct:

    x := CASE n IN a, b, c OUT z ESAC

    which also has the same compact form I use. I only use the compact
    version because n is usually small, and it is intended to be used within
    an expression: print (n | "One", "Two", "Three" | "Other").
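
    For readers more used to C, a hedged sketch of what the compact select
    corresponds to (the function and names are made up for illustration; note
    that in real C every argument would be evaluated before the call, whereas
    the construct evaluates only the chosen branch):

        const char *pick3(int n, const char *a, const char *b,
                          const char *c, const char *z) {
            switch (n) {
                case 1:  return a;
                case 2:  return b;
                case 3:  return c;
                default: return z;   /* the '| z' catch-all */
            }
        }

        /* print (n | "One", "Two", "Three" | "Other") becomes roughly: */
        /*   puts(pick3(n, "One", "Two", "Three", "Other"));            */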

    This is an actual example (from my first scripting language; not written by
    me):

    Crd[i].z := (BendAssen |P.x, P.y, P.z)

    An out-of-bounds index yields 'void' (via a '| void' part inserted by
    the compiler). This is one of my examples from that era:

    xt := (messa | 1,1,1, 2,2,2, 3,3,3)
    yt := (messa | 3,2,1, 3,2,1, 3,2,1)

    Algol68 didn't have 'switch', but I do, as well as a separate
    case...esac statement that is more general. Those are better for
    multi-line constructs.

    As for being error prone because values can get out of step, so is a
    function call like this:

    f(a, b, c, d, e)

    But I also have keyword arguments.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Thu Nov 7 01:40:52 2024
    On 04/11/2024 22:25, David Brown wrote:
    On 04/11/2024 20:50, Bart wrote:
    On 04/11/2024 16:35, David Brown wrote:
    On 03/11/2024 21:00, Bart wrote:


    Here is a summary of C vs my language.


    <snip the irrelevant stuff>


    I am very keen on keeping the concepts distinct in cases where it
    matters.

    I know, you like to mix things up. I like clear lines:

       func F:int ...             Always returns a value
       proc P  ...                Never returns a value



    Oh, you /know/ that, do you? And how do you "know" that? Is that
    because you still think I am personally responsible for the C language,
    and that I think C is the be-all and end-all of perfect languages?

    I agree that it can make sense to divide different types of "function".
    I disagree that whether or not a value is returned has any significant relevance. I see no difference, other than minor syntactic issues,
    between "int foo(...)" and "void foo(int * result, ...)".

    I don't use functional concepts; my functions may or may not be pure.

    But the difference between value-returning and non-value returning
    functions to me is significant:

                       Func  Proc
    return x;          Y     N
    return;            N     Y
    hit final }        N     Y
    Pure               ?     Unlikely
    Side-effects       ?     Likely
    Call within expr   Y     N
    Call standalone    ?     Y

    Having a clear distinction helps me focus more precisely on how a
    routine has to work.

    In C, the syntax is dreadful: not only can you barely distinguish a
    function from a procedure (even without attributes, user types and
    macros add in), but you can hardly tell them apart from variable
    declarations.

    In fact, function declarations can even be declared in the middle of a
    set of variable declarations.

    You can learn a lot about the underlying structure of a language by implementing it. So when I generate IL from C for example, I found the
    need to have separate instructions to call functions and procedures, and separate return instructions too.

    If you have a function (or construct) that returns a correct value for inputs 1, 2 and 3, and you never pass it the value 4 (or anything else), then there is no undefined behaviour no matter what the code looks like
    for values other than 1, 2 and 3. If someone calls that function with
    input 4, then /their/ code has the error - not the code that doesn't
    handle an input 4.

    No. The function they are calling is badly formed. There should never be
    any circumstance where a value-returning function terminates (hopefully
    by running into RET) without an explicit set return value.


    I agree that this is a terrible idea. <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60523>

    But picking one terrible idea in C does not mean /everything/ in C is a terrible idea! /That/ is what you got wrong, as you do so often.

    What the language does is generally fine. /How/ it does it is generally
    terrible. (Type syntax; no 'fun' keyword; = vs ==; operator precedence;
    format codes; 'break' in switch; export by default; struct T vs typedef
    T; dangling 'else'; optional braces; ... there's reams of this stuff!)

    So actually, I'm not wrong. There have been discussions about all of
    these and a lot more.

    Can you tell me which other current languages, other than C++ and
    assembly, allow such nonsense?

    Python.

    Of course, it is equally meaningless in Python as it is in C.

    Python at least can trap the errors. Once you fix the unlimited
    recursion, it will detect the wrong number of arguments. In C, before
    C23 anyway, any number and types of arguments is legal in that example.


    I defend it if that is appropriate. Mostly, I /explain/ it to you. It
    is bizarre that people need to do that for someone who claims to have written a C compiler, but there it is.

    It is bizarre that the ins and outs of C, a supposedly simple language,
    are so hard to understand. Like the rules for how many {} you can leave
    out when initialising a nested data structure. Or how many extra ones
    you can have; this is OK:

    int a = {0};

    but not {{0}} (tcc accepts it though, so which set of rules is it using?).
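
    A hedged illustration of the brace rules in question (the struct types
    are invented for the example):

        struct point { int x, y; };
        struct rect  { struct point tl, br; };

        struct rect r1 = { { 0, 0 }, { 10, 20 } };  /* fully braced          */
        struct rect r2 = { 0, 0, 10, 20 };          /* inner braces elided   */
        int a = { 0 };                              /* braced scalar - legal */
        /* int b = { { 0 } };   extra braces around a scalar: a constraint
           violation, although some compilers let it pass. */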

    Or whether it is a static followed by a non-static declaration that is
    OK, or whether it's the other way around.
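
    For reference, a hedged sketch of that ordering rule (the function names
    are invented):

        static int f(void);          /* internal linkage                    */
        int f(void) { return 1; }    /* OK: the later declaration keeps the
                                        earlier internal linkage            */

        int g(void);                 /* external linkage                    */
        /* static int g(void);         not OK: static after non-static      */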

    I'm glad you didn't - it would be a waste of effort.

    I guessed that. You seemingly don't care that C is a messy language with
    many quirks; you just work around it by using a subset, with some help
    from your compiler in enforcing that subset.

    So you're using a strict dialect. The trouble is that everyone else
    using C will either be using their own dialect incompatible with yours,
    or are stuck using the messy language and laid-back compilers operating
    in lax mode by default.

    I'm interested in fixing things at source - within a language.

    You /do/ understand that I use top-quality tools with carefully chosen warnings, set to throw fatal errors, precisely because I want a language that has a lot more "lines" and restrictions than your little tools?
    /Every/ C programmer uses a restricted subset of C - some more
    restricted than others. I choose to use a very strict subset of C for
    my work, because it is the best language for the tasks I need to do. (I also use a very strict subset of C++ when it is a better choice.)

    I'd guess only 1% of your work with C involves the actual language, and
    99% using additional tooling.

    With me it's mostly about the language.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Thu Nov 7 01:50:21 2024
    On 05/11/2024 23:48, Bart wrote:
    On 05/11/2024 13:29, David Brown wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:


    Supposing I declare this function:

    // Return the integer square root of numbers between 0 and 10
    int small_int_sqrt(int x);


    To me, the range of "all input values" is integers from 0 to 10. I
    could implement it as :

    int small_int_sqrt(int x) {
         if (x == 0) return 0;
         if (x < 4) return 1;
         if (x < 9) return 2;
         if (x < 16) return 3;
         unreachable();
    }


    If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's
    /their/ fault and /their/ problem. I said nothing about what would
    happen in those cases.

    But some people seem to feel that "all input values" means every
    possible value of the input types, and thus that a function like this
    should return a value even when there is no correct value in and no
    correct value out.

    Your example is an improvement on your previous ones. At least it
    attempts to deal with out-of-range conditions!

    No, it does not. The fact that some invalid inputs also give
    deterministic results is a coincidence of the implementation, not an indication that the function is specified for those additional inputs or
    that it does any checking. I intentionally structured the example this
    way to show this - sometimes undefined behaviour gives you results you
    might like, but it is still undefined behaviour. This function has no
    defined behaviour for inputs outside the range 0 to 10, because I gave
    no definition of its behaviour - the effect of particular
    implementations of the function is irrelevant to that.

    As I suspected it might, this apparently confused you.


    However there is still the question of providing that return type. If 'unreachable' is not a special language feature, then this can fail
    either if the language requires the 'return' keyword, or 'unreachable' doesn't yield a compatible type (even if it never returns because it's
    an error handler).

    "unreachable()" is a C23 standardisation of a feature found in most
    high-end compilers. For gcc and clang, there is
    __builtin_unreachable(), and MSVC has its version. The functions are
    handled by the compilers as "undefined behaviour". (I mean that quite literally - gcc and clang turn it into an "UB" instruction in their
    internal representations.)

    Clearly, "unreachable()" has no return type - it does not in any sense "return". And since the compiler knows it will never be "executed", it
    knows control will never fall off the end of that function. You don't
    need a type for something that can never happen (it's like if I say
    "this is a length of 0" and you ask "was that 0 metres, or 0 inches?" -
    the question is meaningless).
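
    (A hedged illustration of how the hint gets used - the function below is
    invented, and the exact optimisation is compiler-dependent. In C23 the
    macro comes from <stddef.h>; with gcc/clang pre-C23 you would write
    __builtin_unreachable() instead:

        #include <stddef.h>

        int wrap4(int n) {
            if (n < 0 || n > 3)
                unreachable();   /* promise to the compiler: n is 0..3 */
            return n % 4;        /* given that promise, a compiler may
                                    simplify this to just 'return n;'  */
        }
    )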



    Getting that right will satisfy both the language (if it cared more
    about such matters than C apparently does), and the casual reader
    curious about how the function contract is met (that is, supplying that promised return int type if or when it returns).

    C gets it right here. There is no need for a return type when there is
    no return - indeed, trying to force some sort of type or "default" value
    would be counterproductive. It would be confusing to the reader, add untestable and unexecutable source code, make code flow more
    complicated, break invariants, cripple correctness analysis of the rest
    of the code, and make the generated object code inefficient.

    Remember how the function is specified. All you have to do is use it correctly - go outside the specifications, and I make no promises or guarantees about what will happen. If you step outside /your/ side of
    the bargain by giving it an input outside 0 to 10, then I give you no
    more guarantees that it will return an int of any sort than I give you a guarantee that it would be a great sales gimmick if printed on a t-shirt.

    But what I /can/ give you is something that can be very useful in being
    sure the rest of your code is correct, and which is impossible for a
    function with "default" values or other such irrelevant parts. I can guarantee you that:

    int y = small_int_sqrt(x);

    assert(y * y <= x);
    assert ((y + 1) * (y + 1) > x);


    That is to say - I can guarantee that the function works and gives you
    the correct results.

    But supposing I had replaced the "unreachable();" with a return of a
    default value - let's say 42, since that's the right answer even if you
    don't know the question. What does the user of small_int_sqrt() know now?

    Now you know that "y" is an int. You have no idea if it is a correct or useful result, unless you have first checked that x is in the specified
    range of 0 to 10.

    If you /have/ checked (in some way) that x is valid, then why would you
    bother calling the function when x is invalid? And thus why would you
    care what the function does or does not do when x is invalid?

    And if you haven't checked that x is valid, why would you bother calling
    the function if you have no idea whether or not it results in something
    useful and correct?


    So we have now established that returning a default int value is worse
    than useless - there are no circumstances in which it can be helpful,
    and it ruins the guarantees you want in order to be sure that the
    calling code is correct.


    Let's now look at another alternative - have the function check for
    validity, and return some kind of error signal if the input is invalid.
    There are two ways to do this - we can have a value of the main return
    type acting as an error signal, or we can have an additional return value.

    If we pick the first one - say, return -1 on error - then we have a
    compact solution that is easy to check for the calling function. But
    now we have a check for validity of the input whether we need it or not
    (since the callee function does the checking, even if the caller
    function knows the values are valid), and the caller function has to add
    a check for error return values. The return may still be an
    "int", but it is no longer representative of an integer value - it
    multiplexes two different concepts. We have lost the critical
    correctness equations that we previously had. And it won't work at all
    if there is no choice of an error indicator.

    If we pick the second one, we need to return two values. The checking
    is all the same, but at least the concepts of validity and value are separated. Now we have either a struct return with its added efficiency costs, or a monstrosity from the dark ages where the function must take
    a pointer parameter for where to store the results. (And how is the
    function going to check the validity of that pointer? Or is it somehow
    okay to skip that check while insisting that a check of the other input
    is vital?) This has most of the disadvantages of the first choice, plus
    extra efficiency costs.
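
    To make the two alternatives above concrete, a hedged sketch (the names,
    the choice of -1, and the struct layout are invented for illustration,
    not a recommendation):

        #include <stdbool.h>

        /* Option 1: in-band error value (-1 signals "invalid input") */
        int small_int_sqrt_e(int x) {
            if (x < 0 || x > 10) return -1;
            if (x == 0) return 0;
            if (x < 4)  return 1;
            if (x < 9)  return 2;
            return 3;
        }

        /* Option 2: separate validity flag and value */
        typedef struct { bool ok; int value; } isqrt_result;

        isqrt_result small_int_sqrt_r(int x) {
            if (x < 0 || x > 10) return (isqrt_result){ false, 0 };
            return (isqrt_result){ true, small_int_sqrt_e(x) };
        }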


    All in all, we have significant costs in various aspects, with no real benefit, all in the name of a mistaken belief that we are avoiding
    undefined behaviour.



    // Take a pointer to an array of two ints, add them, and return the sum
    int sum_two_ints(const int * p) {
         return p[0] + p[1];
    }

    Perhaps, in a mistaken belief that it makes the code "safe", they will
    add :

         if (!p) return 0;

    at the start of the function. But they will not check that "p"
    actually points to an array of two ints (how could they?), nor will
    they check for integer overflow (and what would they do if it happened?).

    This is a different category of error.


    No, it is not. It is just another case of a function having
    preconditions on the input, and whether or not the called function
    should check those preconditions. You can say you think it is vital for functions to do these checks themselves, or you can accept that it is the responsibility of the calling code to provide valid inputs. But you
    don't get to say it is vital to check /some/ types of inputs, but other
    types are fine to take on trust.

    Here's a related example of what I'd class as a language error:

       int a;
       a = (exit(0), &a);

    A type mismatch error is usually reported. However, the assignment is
    never done because it never returns from that exit() call.

    I expect you wouldn't think much of a compiler that didn't report such
    an error because that code is never executed.

    I would expect the compiler to know that "exit()" can't return, so the
    value of "a" is never used and it can be optimised away. But I do also
    expect that the compiler will enforce the rules of the language - syntax
    and grammar rules, along with constraints and anything else it is able
    to check. And even if I said it was reasonable for a language to say
    this "assignment" is not an error since it can't be executed, I think
    trying to put that level of detail into a language definition (and corresponding compilers) would quickly be a major complexity for no
    real-world gain.


    But to me that is little different from running into the end of a function without the proper provisions for a valid return value.


    Yes, I think so too.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Thu Nov 7 02:47:50 2024
    On 06/11/2024 15:40, Bart wrote:
    On 04/11/2024 22:25, David Brown wrote:
    On 04/11/2024 20:50, Bart wrote:
    On 04/11/2024 16:35, David Brown wrote:
    On 03/11/2024 21:00, Bart wrote:


    Here is a summary of C vs my language.


    <snip the irrelevant stuff>


    I am very keen on keeping the concepts distinct in cases where it
    matters.

    I know, you like to mix things up. I like clear lines:

       func F:int ...             Always returns a value
       proc P  ...                Never returns a value



    Oh, you /know/ that, do you? And how do you "know" that? Is that
    because you still think I am personally responsible for the C
    language, and that I think C is the be-all and end-all of perfect
    languages?

    I agree that it can make sense to divide different types of
    "function". I disagree that whether or not a value is returned has any
    significant relevance. I see no difference, other than minor
    syntactic issues, between "int foo(...)" and "void foo(int * result,
    ...)".

    I don't use functional concepts; my functions may or may not be pure.


    OK. You are not alone in that. (Standard C didn't support a difference
    there until C23.)

    But the difference between value-returning and non-value returning
    functions to me is significant:

                       Func  Proc
    return x;          Y     N
    return;            N     Y
    hit final }        N     Y
    Pure               ?     Unlikely
    Side-effects       ?     Likely
    Call within expr   Y     N
    Call standalone    ?     Y


    There are irrelevant differences in syntax, which could easily disappear entirely if a language supported a default initialisation value when a
    return gives no explicit value. (i.e., "T foo() { return; }; T x =
    foo();" could be treated in the same way as "T x;" in a static
    initialisation context.) /Your/ language does not support that, but
    other languages could.

    Then you list some things that may or may not happen, which are of
    course totally irrelevant. If you list the differences between bikes
    and cars, you don't include "some cars are red" and "bikes are unlikely
    to be blue".


    Having a clear distinction helps me focus more precisely on how a
    routine has to work.

    It's a pointless distinction. Any function or procedure can be morphed
    into the other form without any difference in the semantic meaning of
    the code, requiring just a bit of re-arrangement at the caller site:

    int foo(int x) { int y = ...; return y; }

    void foo(int * res, int x) { int y = ...; *res = y; }


    void foo(int x) { ... ; return; }

    int foo(int x) { ... ; return 0; }


    There is no relevance in the division here, which is why most languages
    don't make a distinction unless they do so simply for syntactic reasons.



    In C, the syntax is dreadful: not only can you barely distinguish a
    function from a procedure (even without attributes, user types and
    macros add in), but you can hardly tell them apart from variable declarations.

    As always, you are trying to make your limited ideas of programming
    languages appear to be correct, universal, obvious or "natural" by
    saying things that you think are flaws in C. That's not how a
    discussion works, and it is not a way to convince anyone of anything.
    The fact that C does not have a keyword used in the declaration or
    definition of a function does not in any way mean that there is the
    slightest point in your artificial split between "func" and "proc"
    functions.


    (It doesn't matter that I too prefer a clear keyword for defining
    functions in a language.)


    In fact, function declarations can even be declared in the middle of a
    set of variable declarations.

    You can learn a lot about the underlying structure of a language by implementing it. So when I generate IL from C for example, I found the
    need to have separate instructions to call functions and procedures, and separate return instructions too.


    That is solely from your choice of an IL.

    If you have a function (or construct) that returns a correct value for
    inputs 1, 2 and 3, and you never pass it the value 4 (or anything
    else), then there is no undefined behaviour no matter what the code
    looks like for values other than 1, 2 and 3. If someone calls that
    function with input 4, then /their/ code has the error - not the code
    that doesn't handle an input 4.

    No. The function they are calling is badly formed. There should never be
    any circumstance where a value-returning function terminates (hopefully
    by running into RET) without an explicit set return value.


    There are no circumstances where you can use the function correctly and
    it does not return the correct answer. If you want to consider when
    people to use a function /incorrectly/, then there are no limits to how
    wrong they can be.


    I agree that this is a terrible idea.
    <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60523>

    But picking one terrible idea in C does not mean /everything/ in C is
    a terrible idea! /That/ is what you got wrong, as you do so often.

    What the language does is generally fine. /How/ it does it is generally terrible. (Type syntax; no 'fun' keyword; = vs ==; operator precedence; format codes; 'break' in switch; export by default; struct T vs typedef
    T; dangling 'else'; optional braces; ... there's reams of this stuff!)


    Making the same mistake again does not help your argument.

    So actually, I'm not wrong. There have been discussions about all of
    these and a lot more.


    Of course you are wrong!

    You have failed to grasp the key concept of programming - it is based on contracts and agreements. Tasks are broken down into subtasks, and for
    each subtask there is a requirement for what gets put into the subtask
    and a requirement for what comes out of it. The calling task is
    responsible for fulfilling the input requirements, the callee subtask is responsible for fulfilling the output requirements. The caller does not
    need to check that the outputs are correct, and the callee does not need
    to check that the inputs are correct. That is the division of responsibilities - and doing anything else is, at best, wasted duplicate effort.

    You are right that C has its flaws - every language does. I agree with
    you in many cases where you think C has poor design choices.

    But can you not understand that repeating things that you dislike about
    C - things we have all heard countless times - does not excuse your
    tunnel vision about programming concepts or change your misunderstandings?


    Can you tell me which other current languages, other than C++ and
    assembly, allow such nonsense?

    Python.

    Of course, it is equally meaningless in Python as it is in C.

    Python at least can trap the errors. Once you fix the unlimited
    recursion, it will detect the wrong number of arguments. In C, before
    C23 anyway, any number and types of arguments is legal in that example.


    It is syntactically legal, but semantically undefined behaviour (look it
    up in the C standards). That means it is wrong, but the language
    standards don't insist that compilers diagnose it as an error.


    I defend it if that is appropriate. Mostly, I /explain/ it to you.
    It is bizarre that people need to do that for someone who claims to
    have written a C compiler, but there it is.

    It is bizarre that the ins and outs of C, a supposedly simple language,
    are so hard to understand.

    Have you ever played Go? It is a game with very simple rules, and extraordinarily complicated gameplay.

    Compared to most general purpose languages, C /is/ small and simple.
    But that is a relative rating, not an absolute rating.


    I'm glad you didn't - it would be a waste of effort.

    I guessed that. You seemingly don't care that C is a messy language with many quirks; you just work around it by using a subset, with some help
    from your compiler in enforcing that subset.

    Yes.

    If there was an alternative language that I thought would be better for
    the tasks I have, I'd use that. (Actually, a subset of C++ is often
    better, so I use that when I can.)

    What do you think I should do instead? Whine in newsgroups to people
    that don't write language standards (for C or anything else) and don't
    make compilers? Make my own personal language that is useless to
    everyone else and holds my customers to ransom by being the only person
    that can work with their code? Perhaps that is fine for the type of
    customers you have, but not for my customers.

    I /do/ understand that C has its flaws (from /my/ viewpoint, for /my/
    needs). So I work around those.


    So you're using a strict dialect. The trouble is that everyone else
    using C will either be using their own dialect incompatible with yours,
    or are stuck using the messy language and laid-back compilers operating
    in lax mode by default.

    I'm interested in fixing things at source - within a language.

    You haven't fixed a thing.

    (I'm not claiming /I/ have fixed anything either.)


    You /do/ understand that I use top-quality tools with carefully chosen
    warnings, set to throw fatal errors, precisely because I want a
    language that has a lot more "lines" and restrictions than your little
    tools? /Every/ C programmer uses a restricted subset of C - some more
    restricted than others. I choose to use a very strict subset of C for
    my work, because it is the best language for the tasks I need to do.
    (I also use a very strict subset of C++ when it is a better choice.)

    I'd guess only 1% of your work with C involves the actual language, and
    99% using additional tooling.


    What a weird thing to guess.

    With me it's mostly about the language.


    An even weirder thing to say from someone who made his own tools.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Thu Nov 7 06:38:09 2024
    On 06/11/2024 15:47, David Brown wrote:
    On 06/11/2024 15:40, Bart wrote:

    There are irrelevant differences in syntax, which could easily disappear entirely if a language supported a default initialisation value when a return gives no explicit value. (i.e., "T foo() { return; }; T x =
    foo();" could be treated in the same way as "T x;" in a static initialisation context.)

    You wrote:

    T foo () {return;} # definition?

    T x = foo(); # call?

    I'm not quite sure what you're saying here. That a missing return value
    in non-void function would default to all-zeros?

    Maybe. A rather pointless feature just to avoid writing '0', and which
    now introduces a new opportunity for a silent error (accidentally
    forgetting a return value).

    It's not quite the same as a static initialisation, which is zeroed
    when a program starts.


    Then you list some things that may or may not happen, which are of
    course totally irrelevant. If you list the differences between bikes
    and cars, you don't include "some cars are red" and "bikes are unlikely
    to be blue".

    Yes; if you're using a vehicle, or planning a journey or any related
    thing, it helps to remember if it's a bike or a car! At least here you acknowledge the difference.

    But I guess you find those likely/unlikely macros of gcc pointless too.
    If I know something is a procedure, then I also know it is likely to
    change global state, that I might need to deal with a return value, and
    a bunch of other stuff.

    Boldly separating the two with either FUNC or PROC denotations I find
    helps tremendously. YM-obviously-V, but you can't have a go at me for my
    view.

    If I really found it a waste of time, the distinction would have been
    dropped decades ago.

    It's a pointless distinction. Any function or procedure can be morphed
    into the other form without any difference in the semantic meaning of
    the code, requiring just a bit of re-arrangement at the caller site:

        int foo(int x) { int y = ...; return y; }

        void foo(int * res, int x) { int y = ...; *res = y; }


        void foo(int x) { ... ; return; }

        int foo(int x) { ... ; return 0; }


    There is no relevance in the division here, which is why most languages don't make a distinction unless they do so simply for syntactic reasons.

    As I said, you like to mix things up. You disagreed. I'm not surprised.

    Here you've demonstrated how a function that returns results by value
    can be turned into a procedure that returns a result by reference.

    So now, by-value and by-reference are the same thing?

    I listed seven practical points of difference between functions and procedures, and above is an eighth point, but you just dismiss them.
    Is there any point in this?

    I do like taking what some think as a single feature and having
    dedicated versions, because I find it helpful.

    That includes functions, loops, control flow and selections.


    In C, the syntax is dreadful: not only can you barely distinguish a
    function from a procedure (even without attributes, user types and
    macros add in), but you can hardly tell them apart from variable
    declarations.

    As always, you are trying to make your limited ideas of programming languages appear to be correct, universal, obvious or "natural" by
    saying things that you think are flaws in C. That's not how a
    discussion works, and it is not a way to convince anyone of anything.
    The fact that C does not have a keyword used in the declaration or definition of a function does not in any way mean that there is the slightest point in your artificial split between "func" and "proc" functions.


    void F();
    void (*G);
    void *H();
    void (*I)();

    OK, 4 things declared here. Are they procedures, functions, variables,
    or pointers to functions? (I avoided using a typedef in place of 'void'
    to make things easier.)

    I /think/ they are as follows: procedure, pointer variable, function (returning void*), and pointer to a procedure. But I had to work at it,
    even though the examples are very simple.

    I don't know about you, but I prefer syntax like this:

    proc F
    ref void G
    ref proc H
    func I -> ref void

    Now come on, scream at me again for preferring a nice syntax for
    programming, one which just tells me at a glance what it means without
    having to work it out.



    (It doesn't matter that I too prefer a clear keyword for defining
    functions in a language.)

    Why? Don't your smart tools tell you all that anyway?


    That is solely from your choice of an IL.

    The IL design also falls into place from the natural way these things
    have to work.

    Of course you are wrong!

    You keep saying that. But then you also keep saying, from time to time,
    that you agree that something in C was a bad idea. So I'm still wrong
    when calling out the same thing?



    If there was an alternative language that I thought would be better for
    the tasks I have, I'd use that. (Actually, a subset of C++ is often
    better, so I use that when I can.)

    What do you think I should do instead? Whine in newsgroups to people
    that don't write language standards (for C or anything else) and don't
    make compilers?

    What makes you think I'm whining? The thread opened up a discussion
    about multi-way selections, and it got into how it could be done with
    features from other languages.

    I gave some examples from mine, as I'm very familiar with that, and it
    uses simple features that are easy to grasp and appreciate. You could
    have done the same from ones you know.

    But you just hate the idea that I have my own language to draw on, whose syntax is very sweet ('serious' languages hate such syntax for some
    reason, and it is usually relegated to scripting languages).

    I guess then you just have to belittle and insult me, my languages and
    my views at every opportunity.

    Make my own personal language that is useless to
    everyone else and holds my customers to ransom by being the only person
    that can work with their code?

    Plenty of companies use DSLs. But isn't that sort of what you do? That
    is, using 'C' with a particular interpretation or enforcement of the
    rules, which needs to go in hand with a particular compiler, version,
    sets of options and assorted makefiles.

    I for one would never be able to build one of your programs. It might as
    well be written in your in-house language with proprietary tools.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Thu Nov 7 23:23:04 2024
    On 06/11/2024 14:50, David Brown wrote:
    On 05/11/2024 23:48, Bart wrote:
    On 05/11/2024 13:29, David Brown wrote:

    int small_int_sqrt(int x) {
         if (x == 0) return 0;
         if (x < 4) return 1;
         if (x < 9) return 2;
         if (x < 16) return 3;
         unreachable();
    }

    "unreachable()" is a C23 standardisation of a feature found in most
    high-end compilers. For gcc and clang, there is
    __builtin_unreachable(), and MSVC has its version.

    So it's a kludge. Cool, I can create one of those too:

    func smallsqrt(int x)int =
        if
        elsif x=0 then 0
        elsif x<4 then 1
        elsif x<9 then 2
        elsif x<16 then 3
        dummyelse int.min
        fi
    end

    'dummyelse' is a special version of 'else' that tells the compiler that control will (should) never arrive there. ATM it does nothing but inform
    the reader of that and to remind the author. But later stages of the
    compiler can choose not to generate code for it, or to generate error-reporting code.

    (A couple of things about this: the first 'if' condition and branch can
    be omitted; it starts at elsif. This removes the special-casing for the
    first of an if-elsif chain, so as to allow easier maintenance and better alignment.

    Second is that, unlike your C, the whole if-fi construct is a single expression term that yields the function return value. Hence the need
    for all branches to be present and balanced regarding their common type.

    This could have been handled internally (compiler adds 'dummyelse <empty
    value for type>'), but I think it's better that it is explicit (user
    might forget to add that branch).

    That int.min is something I sometimes use for in-band signalling. Here
    that is the value -9223372036854775808, so it's quite a wide band!
    Actually it is out-of-band if the user expects only results within an i32
    range.

    BTW your example lets through negative values; I haven't fixed that.)

    Getting that right will satisfy both the language (if it cared more
    about such matters than C apparently does), and the casual reader
    curious about how the function contract is met (that is, supplying
    that promised return int type if or when it returns).

    C gets it right here. There is no need for a return type when there is
    no return

    There is no return for only half the function! A function with a return
    type is a function that CAN return. If it can't ever return, then make
    it a procedure.

    Take this function where N can never be zero; is this the right way to
    write it in C:

    int F(int N) {
        if (N==0) unreachable();
        return abc/N;   // abc is a global with value 100
    }

    It doesn't look right. If I compile it with gcc (using
    __builtin_unreachable), and call F(0), then it crashes. So it doesn't do
    much, does it?!

    indeed, trying to force some sort of type or "default" value
    would be counterproductive. It would be confusing to the reader, add untestable and unexecutable source code,

    But it IS confusing, since it quite clearly IS reachable. There's a
    difference between covering all possible values of N, so that is
    genuinely is unreachable, and having code that COULD be reachable.

    Let's now look at another alternative - have the function check for validity, and return some kind of error signal if the input is invalid. There are two ways to do this - we can have a value of the main return
    type acting as an error signal, or we can have an additional return value.
    ....
    All in all, we have significant costs in various aspects, with no real benefit, all in the name of a mistaken belief that we are avoiding
    undefined behaviour.

    This is all a large and complex subject. But it's not really the point
    of the discussion.

    I'm not talking about what happens when running a program, but what
    happens at compilation, and satisfying the needs of the language.

    C here is less strict in being happy to have parts of a function body as
    no-go areas where various requirements can be ignored, like a function
    with a designed return type T, being allowed to return without
    satisfying that need.

    Here, you demonstrated bolted-on hacks that are not part of the language,
    like the snappy __builtin_unreachable (the () are apparently optional).
    I can't see however that it does much.

    It is a fact C as a language allows this:

    T F() {} // T is not void

    (I've had to qualify T - point number 9 in procedures vs. function.)

    All that C says is that control flow running into that closing },
    without encountering a 'return x', is UB.

    IMV, sloppy. My language simply doesn't allow it.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 8 02:08:34 2024
    On 07/11/2024 12:23, Bart wrote:
    On 06/11/2024 14:50, David Brown wrote:

    C gets it right here. There is no need for a return type when there
    is no return

    There is no return for only half the function! A function with a return
    type is a function that CAN return. If it can't ever return, then make
    it a procedure.

    Take this function where N can never be zero; is this the right way to
    write it in C:

       int F(int N) {
           if (N==0) unreachable();
           return abc/N;             // abc is a global with value 100
       }

    It doesn't look right. If I compile it with gcc (using __builtin_unreachable), and call F(0), then it crashes. So it doesn't do much, does it?!

    It looks like it needs 'else' here. If I put that in, then F(0) returns
    either 0 or 1, so it returns garbage, whether or not 'unreachable' is
    used in the branch.

    So I'm struggling to see the point of it. Is it just to quieten the
    'reaches end of non-void function' warning when used before the final '}'?

    In any case, 'unreachable' is a misnomer. 'shouldnt_be_reachable' is
    more accurate.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Fri Nov 8 03:23:54 2024
    On 06/11/2024 20:38, Bart wrote:
    On 06/11/2024 15:47, David Brown wrote:
    On 06/11/2024 15:40, Bart wrote:

    There are irrelevant differences in syntax, which could easily
    disappear entirely if a language supported a default initialisation
    value when a return gives no explicit value. (i.e., "T foo() {
    return; }; T x = foo();" could be treated in the same way as "T x;" in
    a static initialisation context.)

    You wrote:

      T foo () {return;}        # definition?

      T x = foo();              # call?

    I'm not quite sure what you're saying here. That a missing return value
    in non-void function would default to all-zeros?


    It would not necessarily mean all zeros, but yes, that's the idea. You
    could easily say that returning from a non-void function without an
    explicit value, or falling off the end of it, returned the default value
    for the type in the same sense as you can have a default initialisation
    of non-stack objects in a language. (In C, this pretty much always
    means all zeros - in a more advanced language with object support, it
    would typically mean default construction.)
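
    (As a reminder of the default values C already defines for objects with
    static storage duration - a hedged aside, the declarations below are just
    examples:

        static int    counter;    /* starts at 0               */
        static double table[4];   /* all elements start at 0.0  */
        static char  *name;       /* starts as a null pointer   */
    )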

    Equally, you could say that in a void function, "return x;" simply casts
    "x" to void - just like writing "x;" as a statement does.

    I'm not suggesting that either of these things are a particularly good
    idea - I am merely saying that with a minor syntactic change to the
    language (your language, C, or anything similar) most of the rest of the differences between your "proc" and your "func" disappear.

    All you are left with is that "func" can be used in an expression, and
    "proc" cannot. For me, that is not sufficient reason to distinguish
    them as concepts.

    Maybe. A rather pointless feature just to avoid writing '0', and which
    now introduces a new opportunity for a silent error (accidentally
    forgetting a return value).


    Sure. As I say, I don't think it is a particularly good idea - at
    least, not as an addition to C (or, presumably, your language).

    It's not quite the same as a static initialisation, which is zeroed
    when a program starts.


    Of course. (Theoretically in C, pointers are initialised to null
    pointers which don't have to be all zeros. But I don't know of any implementation which has something different.) I was just using that to
    show how some languages - like C - have a default value available.


    Then you list some things that may or may not happen, which are of
    course totally irrelevant. If you list the differences between bikes
    and cars, you don't include "some cars are red" and "bikes are
    unlikely to be blue".

    Yes; if you're using a vehicle, or planning a journey or any related
    thing, it helps to remember if it's a bike or a car! At least here you acknowledge the difference.


    There's a difference between cars and bikes - not between procs and funcs.

    Remember, if you are going to make such a distinction between two
    concepts, it has to be absolute - "likely" or "unlikely" does not help.
    You can't distinguish between your procs and funcs by looking at the
    existence of side-effects, since a code block that has side-effects
    might return a value or might not. It's like looking at a vehicle and
    seeing that it is red - it won't tell you if it is a bike or a car.

    This is why I say distinguishing between "func" and "proc" by your
    criteria - the existence or absence of a return type - gives no useful information to the programmer or the compiler that can't be equally well
    given by writing a return type of "void".

    But I guess you find those likely/unlikely macros of gcc pointless too.

    How is that even remotely relevant to the discussion? (Not that gcc has macros by those names.)
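
    (For reference - a hedged aside - "likely"/"unlikely" are conventionally
    user-defined macros wrapping gcc's __builtin_expect, for example as in
    the Linux kernel:

        #define likely(x)   __builtin_expect(!!(x), 1)
        #define unlikely(x) __builtin_expect(!!(x), 0)
    )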

    If I know something is a procedure, then I also know it is likely to
    change global state, that I might need to deal with a return value, and
    a bunch of other stuff.

    That's useless information - both to the programmer, and to the
    compiler. (I am never sure which viewpoint you are taking - it would be helpful if you were clear there.) If the compiler /knows/ global state
    cannot be changed, and the function only uses data from its input
    values, then it can do a lot with that information - shuffling around
    calls, removing duplicates, pre-calculating constant data at compile
    time, or whatever. Similarly, if the programmer /knows/ global state
    cannot be changed in a function, then that can make it easier to
    understand what is going on in the code, or what is going wrong in it.

    But if you only know that it is /likely/ to be one thing or the other,
    you know nothing of use.


    Boldly separating the two with either FUNC or PROC denotations I find
    helps tremendously. YM-obviously-V, but you can't have a go at me for my view.


    I can have a go at you for not thinking! I believe that if you think
    more carefully about this, you will understand how little your
    distinction helps anyone. You might find the distinction I made -
    between being allowed to interact with global state (a "procedure") and
    having no possibility of interacting with global state (a "function") -
    to be useful. In my distinction, there is no grey area of "likely" or "unlikely" - it is absolute, and therefore gives potentially useful information. Of course it is then up to you to decide if it is worth
    the effort or not.

    Let me tempt you with this - whatever syntax or terms you use here,
    you'll be able to brag that it is nicer than C23's "[[unsequenced]]"
    attribute for pure functions!

    If I really found it a waste of time, the distinction would have been dropped decades ago.


    Why? Once you've put it in the language, there is no motivation to drop
    it. Pascal has the same procedure / function distinction you do. Just because it adds little of use to the language, does not mean that you'd want
    to drop it and make your tools incompatible between language versions.

    It's a pointless distinction. Any function or procedure can be
    morphed into the other form without any difference in the semantic
    meaning of the code, requiring just a bit of re-arrangement at the
    caller site:

         int foo(int x) { int y = ...; return y; }

         void foo(int * res, int x) { int y = ...; *res = y; }


         void foo(int x) { ... ; return; }

         int foo(int x) { ... ; return 0; }


    There is no relevance in the division here, which is why most
    languages don't make a distinction unless they do so simply for
    syntactic reasons.

    As I said, you like to mix things up. You disagreed. I'm not surprised.

    Here you've demonstrated how a function that returns results by value
    can be turned into a procedure that returns a result by reference.

    So now, by-value and by-reference are the same thing?

    Returning something from a function by returning a value, or by having
    the caller pass a pointer (or mutable reference, if you prefer that
    term) and having the function pass its results via that pointer are not
    really very different. Sure, there are details of the syntax and the
    ABI that will differ, but not the meaning of the code.

    Remember that this is precisely what C compilers do when returning a
    struct that is too big to fit neatly in a register or two - the caller
    makes space for the return struct on the stack and passes a pointer to
    it as a hidden parameter to the function. The function has no normal
    return value. And yet the struct return is syntactically and
    semantically identical whether it is returned in registers or via a
    hidden pointer.
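
    (A hedged sketch of that point - the type and names are invented, and
    the exact ABI behaviour varies between platforms:

        typedef struct { double m[4][4]; } Matrix;

        Matrix identity(void);          /* what the programmer writes        */

        /* what many ABIs effectively compile the call into:                 */
        /*   void identity(Matrix *hidden_result);                           */
        /* with the caller providing the space for the result.               */
    )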


    I listed seven practical points of difference between functions and procedures, and above is an eighth point, but you just dismiss them.
    Is there any point in this?


    Maybe not, if you can't understand /why/ I am dismissing them. The only difference you listed that is real and has potential consequences for
    people using the language is that functions returning a value can be
    used in expressions - all the rest is minor detail or wishy-washy "maybes".

    I do like taking what some think as a single feature and having
    dedicated versions, because I find it helpful.

    That includes functions, loops, control flow and selections.


    If it ultimately comes down to just the word you want to use, then I
    guess that's fair enough. It is the /reasoning/ you gave that I am
    arguing with.

    If your language has "do ... until" and "do ... while" loops, and you
    justify it by saying you simply find it nicer to write some tests as
    positives and some tests as negatives, then I think that is reasonable.
    If you claim it is because they are fundamentally distinct and do
    different things because one is likely to loop more than three times and
    the other is unlikely to do so, then I'd argue against that claim.


    In C, the syntax is dreadful: not only can you barely distinguish a
    function from a procedure (even without attributes, user types and
    macros add in), but you can hardly tell them apart from variable
    declarations.

    As always, you are trying to make your limited ideas of programming
    languages appear to be correct, universal, obvious or "natural" by
    saying things that you think are flaws in C. That's not how a
    discussion works, and it is not a way to convince anyone of anything.
    The fact that C does not have a keyword used in the declaration or
    definition of a function does not in any way mean that there is the
    slightest point in your artificial split between "func" and "proc"
    functions.


      void F();
      void (*G);
      void *H();
      void (*I)();

    OK, 4 things declared here. Are they procedures, functions, variables,
    or pointers to functions? (I avoided using a typedef in place of 'void'
    to make things easier.)

    I /think/ they are as follows: procedure, pointer variable, function (returning void*), and pointer to a procedure. But I had to work at it,
    even though the examples are very simple.

    I don't know about you, but I prefer syntax like this:

       proc F
       ref void G
       ref proc H
       func I -> ref void

    Now come on, scream at me again for preferring a nice syntax for
    programming, one which just tells me at a glance what it means without having to work it out.


    I quite agree that your syntax is clearer that the example in C for this
    kind of thing. I rarely see the C syntax as complicated - for my own
    code - because I use typedefs and spacing that makes it clear. But I
    fully agree that it is clearer in a language if it distinguishes better between declarations of variables and declarations of functions.
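
    (A hedged sketch of the typedef approach mentioned, applied to the
    earlier F/G/H/I declarations - the typedef names are invented:

        typedef void  proc_t(void);     /* a "procedure" type               */
        typedef void *vfunc_t(void);    /* a function returning void *      */

        proc_t   F;      /* declares:  void F(void);                        */
        void    *G;      /* a pointer variable                              */
        vfunc_t  H;      /* declares:  void *H(void);                       */
        proc_t  *I;      /* a pointer to a procedure                        */
    )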

    However, I don't think it would make a huge difference to the clarity of
    your syntax if you had written :

    func F -> void
    ref void G
    ref func H -> void
    func I -> ref void

    or

    func F
    ref void G
    ref func H
    func I -> ref void


    It is not the use of a keyword for functions that I disagree with, nor
    am I arguing for C's syntax or against your use of "ref" or ordering. I simply don't think there is much to be gained by using "proc F" instead
    of "func F -> void" (assuming that's the right syntax) - or just "func F".

    But I think there is quite a bit to be gained if the func/proc
    distinction told us something useful and new, rather than just the
    existence or lack of a return type.



    (It doesn't matter that I too prefer a clear keyword for defining
    functions in a language.)

    Why? Don't your smart tools tell you all that anyway?


    Yes, they can. But it would be nicer with a keyword. Where possible, I prefer clear language constructs /and/ nice syntax highlighting and
    indexing from tools. Call me greedy if you like!


    That is solely from your choice of an IL.

    The IL design also falls into place from the natural way these things
    have to work.

    Of course you are wrong!

    You keep saying that. But then you also keep saying, from time to time,
    that you agree that something in C was a bad idea. So I'm still wrong
    when calling out the same thing?


    I can agree with you about some of the things you say about C, while
    still disagreeing with other things (about C or programming in general).



    If there was an alternative language that I thought would be better
    for the tasks I have, I'd use that. (Actually, a subset of C++ is
    often better, so I use that when I can.)

    What do you think I should do instead? Whine in newsgroups to people
    that don't write language standards (for C or anything else) and don't
    make compilers?

    What makes you think I'm whining? The thread opened up a discussion
    about multi-way selections, and it got into how it could be done with features from other languages.

    You /do/ whine a lot. But here I was asking, rhetorically, if you
    thought that was a good alternative to finding ways to make C work well
    for me.


    I gave some examples from mine, as I'm very familiar with that, and it
    uses simple features that are easy to grasp and appreciate. You could
    have done the same from ones you know.

    But you just hate the idea that I have my own language to draw on, whose syntax is very sweet ('serious' languages hate such syntax for some
    reason, and it is usually relegated to scripting languages).

    I guess then you just have to belittle and insult me, my languages and
    my views at every opportunity.

    I haven't been berating or belittling your language here - I have been
    arguing against some of the justification you have for some design
    decisions, and suggesting something that I think would be better.


    Make my own personal language that is useless to everyone else and
    holds my customers to ransom by being the only person that can work
    with their code?

    Plenty of companies use DSLs. But isn't that sort of what you do? That
    is, using 'C' with a particular interpretation or enforcement of the
    rules, which needs to go in hand with a particular compiler, version,
    sets of options and assorted makefiles.


    No.

    I for one would never be able to build one of your programs. It might as well be written in your in-house language with proprietary tools.


    Pretty much every professional in my field could manage it. But
    software development is a wide discipline, with many niche areas.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 8 04:51:09 2024
    On 07/11/2024 16:23, David Brown wrote:
    On 06/11/2024 20:38, Bart wrote:

    [Functions vs. procedures]

        void F();
        void (*G);
        void *H();
        void (*I)();

    OK, 4 things declared here. Are they procedures, functions, variables,
    or pointers to functions? (I avoided using a typedef in place of
    'void' to make things easier.)

    I /think/ they are as follows: procedure, pointer variable, function
    (returning void*), and pointer to a procedure. But I had to work at
    it, even though the examples are very simple.

    I don't know about you, but I prefer syntax like this:

        proc F
        ref void G
        ref proc H
        func I -> ref void

    (The last two might be wrong interpretations of the C. I've stared at
    the C code for a minute and I'm still not sure.

    If I put it through my C compiler and examine the ST listing, it seems
    I'd just swapped the last two:

    func H -> ref void
    ref proc I

    But you shouldn't need to employ a tool to figure out if a declaration
    is even a function, let alone whether it is also a procedure. That
    syntax is not fit for purpose. This is an HLL, so let's have some HL syntax, not gobbledygook.)
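
    For the record, here are the four declarations again, each with its
    reading spelled out in a comment:

        void F();      /* function (unspecified parameters) returning void  */
        void (*G);     /* an object of type void* -- same as:  void *G;     */
        void *H();     /* function returning void*                          */
        void (*I)();   /* pointer to a function returning void              */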

    It is not the use of a keyword for functions that I disagree with, nor
    am I arguing for C's syntax or against your use of "ref" or ordering. I simply don't think there is much to be gained by using "proc F" instead
    of "func F -> void" (assuming that's the right syntax) - or just "func F".

    But I think there is quite a bit to be gained if the func/proc
    distinction told us something useful and new, rather than just the
    existence or lack of a return type.

    I use the same syntax for my dynamic language where type annotations are
    not used, including indicating a return type for a function. That means
    that without distinct keywords here:

    func F =
    end

    proc G =
    end

    I can't tell whether each returns a value or not. So 'func'/'proc' is
    useful to me, to readers, and makes it possible to detect errors and omissions:

    - 'return' without a value in functions
    - 'return x' used in procedures
    - A missing return or missing return value in functions (since this
    is also expression-based and the "return" keyword is optional in the
    last statement/expression)
    - A missing 'else' clause of multi-way constructs within functions
    - Trying to use the value of a function call when that is not a
    function.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Fri Nov 8 19:45:28 2024
    On 07/11/2024 13:23, Bart wrote:
    On 06/11/2024 14:50, David Brown wrote:
    On 05/11/2024 23:48, Bart wrote:
    On 05/11/2024 13:29, David Brown wrote:

    int small_int_sqrt(int x) {
        if (x == 0) return 0;
        if (x < 4) return 1;
        if (x < 9) return 2;
        if (x < 16) return 3;
        unreachable();
    }

    "unreachable()" is a C23 standardisation of a feature found in most
    high-end compilers. For gcc and clang, there is
    __builtin_unreachable(), and MSVC has its version.

    So it's a kludge.

    You mean it is something you don't understand? Think of this as an opportunity to learn something new.


    Cool, I can create one of those too:

    func smallsqrt(int x)int =
        if
        elsif x=0 then  0
        elsif x<4 then  1
        elsif x<9 then  2
        elsif x<16 then 3
        dummyelse       int.min
        fi
    end

    'dummyelse' is a special version of 'else' that tells the compiler that control will (should) never arrive there. ATM it does nothing but inform
    the reader of that and to remind the author. But later stages of the compiler can choose not to generate code for it, or to generate error-reporting code.


    You are missing the point - that is shown clearly by the "int.min".

    Do you /really/ not understand when and why it can be useful to tell the compiler that something cannot happen?


    (BTW your example lets through negative values; I haven't fixed that.)


    Again, you are missing the point.

    This is all a large and complex subject. But it's not really the point
    of the discussion.


    You haven't followed the discussion or considered it to have a point.
    To you, the "point" of /all/ discussions here is that you hate
    everything about C, think that everyone else loves everything about C,
    and see it as your job to prove them "wrong".

    You have your way of doing things, and have no interest in learning
    anything else or even bothering to listen or think. Your bizarre hatred
    of C is overpowering for you - it doesn't matter what anyone writes.
    All that matters to you is how you can use it as an excuse to fit it
    into your world-view that everything about C, and everything written in
    C, is terrible. You don't even appear to care about your own languages
    beyond the fact that they are not C.

    It is time to give up for now.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 9 04:37:20 2024
    On 08/11/2024 08:45, David Brown wrote:
    On 07/11/2024 13:23, Bart wrote:
    On 06/11/2024 14:50, David Brown wrote:
    On 05/11/2024 23:48, Bart wrote:
    On 05/11/2024 13:29, David Brown wrote:

    int small_int_sqrt(int x) {
        if (x == 0) return 0;
        if (x < 4) return 1;
        if (x < 9) return 2;
        if (x < 16) return 3;
        unreachable();
    }

    "unreachable()" is a C23 standardisation of a feature found in most
    high-end compilers. For gcc and clang, there is
    __builtin_unreachable(), and MSVC has its version.

    So it's a kludge.

    You mean it is something you don't understand? Think of this as an opportunity to learn something new.

    You don't seem to understand what a 'kludge' is. Think of it as a 'hack',
    something bolted-on to a language.

    This is from Hacker News about 'unreachable':

    "Note that gcc and clang's __builtin_unreachable() are optimization
    pragmas, not assertions. If control actually reaches a __builtin_unreachable(), your program doesn't necessarily abort.

    Terrible things can happen such as switch statements jumping into random addresses or functions running off the end without returning:"

    "Sure, these aren't for defensive programming—they're for places where
    you know a location is unreachable, but your compiler can't prove it for
    you."

    'dummyelse' is a special version of 'else' that tells the compiler
    that control will (should) never arrive there. ATM it does nothing but
    inform the reader of that and to remind the author. But later stages
    of the compiler can choose not to generate code for it, or to generate
    error-reporting code.


    You are missing the point - that is shown clearly by the "int.min".

    At least my code will never 'run off the end of a function'.

    But, it looks like you're happy with ensuring C programs don't do that,
    by the proven expedient of keeping your fingers crossed.



    You have your way of doing things, and have no interest in learning
    anything else or even bothering to listen or think.

    Ditto for you.

    Your bizarre hatred
    of C is overpowering for you

    Ditto for your hatred of my stuff.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 9 04:37:26 2024
    On 03.11.2024 18:00, David Brown wrote:
    On 02/11/2024 21:44, Bart wrote:

    (Note that the '|' in my example is not 'or'; it means 'then':

    ( c | a ) # these are exactly equivalent
    if c then a fi

    ( c | a | b ) # so are these
    if c then a else b fi

    There is no restriction on what a and b are, statements or
    expressions, unless the whole returns some value.)

    Ah, so your language has a disastrous choice of syntax here so that
    sometimes "a | b" means "or", and sometimes it means "then" or
    "implies", and sometimes it means "else".

    (I can't comment on the "other use" of the same syntax in the
    "poster's language" since it's not quoted here.)

    But it's not uncommon in programming languages that operators
    are context specific, and may mean different things depending
    on context.

    You are saying "disastrous choice of syntax". - Wow! Hard stuff.
    I suggest to cool down before continuing reading further. :-)

    Incidentally above syntax is what Algol 68 supports; you have
    the choice to write conditionals with 'if' or with parenthesis.
    As opposed to "C", where you have also *two* conditionals, one
    for statements (if-then-else) and one for expressions ( ? : ),
    in Algol 68 you can use both forms (sort of) "anywhere", e.g.
    IF a THEN b ELSE c FI
    x := IF a THEN b ELSE c FI
    IF a THEN b ELSE c FI := x
    or using the respective alternative forms with ( a | b | c) ,
    or ( a | b ) where no 'ELSE' is required. (And there's also
    the 'ELIF' and the '|:' as alternative form available.)
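
    For comparison, a small C sketch: the "x := IF a THEN b ELSE c FI" form
    maps directly onto C's ?:, while the form with the conditional on the
    left of := can only be approximated in C through pointers (assuming b
    and c are lvalues of the same type):

        #include <stdio.h>

        int main(void) {
            int a = 1, b = 2, c = 3, x = 42;

            x = a ? b : c;          /* conditional as an expression              */
            *(a ? &b : &c) = x;     /* approximates "IF a THEN b ELSE c FI := x" */

            printf("b=%d c=%d x=%d\n", b, c, x);    /* prints: b=2 c=3 x=2 */
            return 0;
        }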

    BTW, the same symbols can also be used as an alternative form
    of the 'case' statement; the semantic distinction is made by
    context, e.g. the types involved in the construct.
    I can understand if this sounds strange and feels unfamiliar.

    Why have a second syntax with
    a confusing choice of operators when you have a perfectly good "if /
    then / else" syntax?

    Because, depending on the program context, that may not be as
    legible as the other, simpler construct.

    Personally I use both forms depending on application context.
    In some cases one syntax is better legible, in other cases the
    other one.[*]

    In complex expressions it may even be worthwhile to mix(!) both
    forms; use 'if' on outer levels and parenthesis on inner levels.
    (Try an example and see, before dismissing the thought too quickly.)

    Or if you feel an operator adds a lot to the
    language here, why not choose one that would make sense to people, such
    as "=>" - the common mathematical symbol for "implies".

    This is, as an opinion, of course arguable. It's certainly also
    influenced where one is coming from (i.e. personal expertise
    from other languages). The detail of what symbols are used is
    not that important to me, if it fits to the overall language
    design.

    From the high-level languages I used in my life I was almost
    always "missing" something with conditional expressions. I
    don't want separate and restricted syntaxes (plural!) in "C"
    (for statements and expressions respectively), for example.
    Some are lacking conditional expressions completely. Others
    support the syntax with a 'fi' end-terminator and simplify
    structures (and add to maintainability) supporting 'else-if'.
    And few allow 'if' expressions on the left-hand side of an
    assignment. (Algol 68 happens to support everything I need.
    Unfortunately it's a language I never used professionally.)

    I'm positive that folks who use languages that support those
    syntactic forms wouldn't like to miss them. (Me for sure.)

    ("disastrous syntax" - I'm still laughing... :-)

    Bart, out of interest; have you invented that syntax for your
    language yourself of borrowed it from another language (like
    Algol 68)?

    Janis

    [*] BTW, in Unix shell I also use the '||' and '&&' syntax
    shortcuts occasionally, in addition to the if/then/else/fi
    constructs, depending on the application context.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 9 04:52:01 2024
    On 06.11.2024 00:01, Bart wrote:

    Well, it started off as 2-way select, meaning constructs like this:

    x = c ? a : b;
    x := (c | a | b)

    Where one of two branches is evaluated. I extended the latter to N-way select:

    x := (n | a, b, c, ... | z)

    Where again one of these elements is evaluated, selected by n (here
    having the values of 1, 2, 3, ... compared with true, false above, but
    there need to be at least 2 elements inside |...| to distinguish them).

    I suppose you borrowed that syntax from Algol 68, or is that just
    coincidence?

    Algol 68's 'CASE' statement has the abbreviated form you depicted
    above. (There's also some nesting supported with the '|:' operator,
    similar to the 'IF' syntax [in Algol 68].) - Personally, though,
    I use that only very rarely because of the restriction to support
    only integral numbers as branch selector.


    I applied it also to other statements that can provide values, such
    as if-elsif chains and switch, but there the selection might be
    different (eg. a series of tests are done sequentially until a true one).

    I don't know how it got turned into 'multi-way'.

    [...]

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 9 04:53:52 2024
    On 06.11.2024 11:01, Bart wrote:

    x := (n | a, b, c, ... | z)

    It's a version of Algol68's case construct:

    x := CASE n IN a, b, c OUT z ESAC

    which also has the same compact form I use. I only use the compact
    version because n is usually small, and it is intended to be used within
    an expression: print (n | "One", "Two", "Three" | "Other").

    Which answers my upthread raised questions. :-)

    Thanks.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 9 05:03:54 2024
    On 03/11/2024 17:00, David Brown wrote:
    On 02/11/2024 21:44, Bart wrote:

    (Note that the '|' in my example is not 'or'; it means 'then':

        (  c |    a )            # these are exactly equivalent
        if c then a fi

        (  c |    a |    b )     # so are these [fixed]
        if c then a else b fi

    There is no restriction on what a and b are, statements or
    expressions, unless the whole returns some value.)

    Ah, so your language has a disastrous choice of syntax here so that sometimes "a | b" means "or", and sometimes it means "then" or
    "implies", and sometimes it means "else".

    I missed this part of a very long post until JP commented on it.

    As I mentioned above, "|" here doesn't mean 'or' at all. In "( ... | ...
    | ... )", the first means "then" and the second "else". (It also wasn't
    my idea, it was taken from Algol 68.)


    Why have a second syntax with
    a confusing choice of operators when you have a perfectly good "if /
    then / else" syntax?

    if/then/else suits multi-line statements. (||) suits terms that are part
    of a larger one-line expression.

    I might as well ask why C uses ?: when it has if-else, or why it needs
    P->m when it has (*P).m.




    Or if you feel an operator adds a lot to the
    language here, why not choose one that would make sense to people, such
    as "=>" - the common mathematical symbol for "implies".

    It is not an operator, it is part of '(x | x,x,x | x)' syntax.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sat Nov 9 05:18:26 2024
    On 08/11/2024 18:37, Janis Papanagnou wrote:
    On 03.11.2024 18:00, David Brown wrote:
    On 02/11/2024 21:44, Bart wrote:

    (Note that the '|' in my example is not 'or'; it means 'then':

    ( c | a ) # these are exactly equivalent
    if c then a fi

    ( c | a | b ) # so are these
    if c then a else b fi

    There is no restriction on what a and b are, statements or
    expressions, unless the whole returns some value.)

    Ah, so your language has a disastrous choice of syntax here so that
    sometimes "a | b" means "or", and sometimes it means "then" or
    "implies", and sometimes it means "else".

    (I can't comment on the "other use" of the same syntax in the
    "poster's language" since it's not quoted here.)

    But it's not uncommon in programming languages that operators
    are context specific, and may mean different things depending
    on context.


    Sure. Just look at the comma for an overloaded syntax in many languages.

    You are saying "disastrous choice of syntax". - Wow! Hard stuff.
    I suggest to cool down before continuing reading further. :-)


    The | operator means "or" in the OP's language (AFAIK - only he actually
    knows the language). So "(a | b | c)" in that language will sometimes
    mean the same as "(a | b | c)" in C, and sometimes it will mean the same
    as "(a ? b : c)" in C.

    There may be some clear distinguishing feature that disambiguates these
    uses. But this is a one-man language - there is no need for a clear
    syntax or grammar, documentation, consistency in the language, or a consideration for corner cases or unusual uses.

    Incidentally above syntax is what Algol 68 supports;

    Yes, he said later that Algol 68 was the inspiration for it. Algol 68
    was very successful in its day - but there are good reasons why many of
    its design choices were left behind long ago in newer languages.


    Or if you feel an operator adds a lot to the
    language here, why not choose one that would make sense to people, such
    as "=>" - the common mathematical symbol for "implies".

    This is as opinion of course arguable. It's certainly also
    influenced where one is coming from (i.e. personal expertise
    from other languages).

    The language here is "mathematics". I would not expect anyone who even considers designing a programming language to be unfamiliar with that
    symbol.

    The detail of what symbols are used is
    not that important to me, if it fits to the overall language
    design.

    I am quite happy with the same symbol being used for very different
    meanings in different contexts. C's use of "*" for indirection and for multiplication is rarely confusing. Using | for "bitwise or" and also
    using it for a "pipe" operator would probably be fine - only one
    operation makes sense for the types involved. But here the two
    operations - "bitwise or" (or logical or) and "choice" can apply to to
    the same types of operands. That's what makes it a very poor choice of syntax.

    (For comparison, Algol 68 uses "OR", "∨" or "\/" for the "or" operator,
    thus it does not have this confusion.)


    From the high-level languages I used in my life I was almost
    always "missing" something with conditional expressions. I
    don't want separate and restricted syntaxes (plural!) in "C"
    (for statements and expressions respectively), for example.
    Some are lacking conditional expressions completely. Others
    support the syntax with a 'fi' end-terminator and simplify
    structures (and add to maintainability) supporting 'else-if'.
    And few allow 'if' expressions on the left-hand side of an
    assignment. (Algol 68 happens to support everything I need.
    Unfortunately it's a language I never used professionally.)

    I'm positive that folks who use languages that support those
    syntactic forms wouldn't like to miss them. (Me for sure.)

    I've nothing (much) against the operation - it's the choice of operator
    that is wrong.


    ("disastrous syntax" - I'm still laughing... :-)




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 9 09:24:44 2024
    On 08/11/2024 17:37, Janis Papanagnou wrote:
    On 03.11.2024 18:00, David Brown wrote:

    or using the respective alternative forms with ( a | b | c) ,
    or ( a | b ) where no 'ELSE' is required. (And there's also
    the 'ELIF' and the '|:' as alternative form available.)



    BTW, the same symbols can also be used as an alternative form
    of the 'case' statement; the semantic distinction is made by
    context, e.g. the types involved in the construct.

    You mean whether the 'a' in '(a | b... | c)' has type Bool rather than Int?

    I've always discriminated on the number of terms between the two |s:
    either 1, or more than 1.

    It would be uncommon to select one-of-N when N is only 1! It does make
    for an untidy exception in the language, but which has never bothered me
    (I don't think I've even thought about it until now.)

    Bart, out of interest; have you invented that syntax for your
    language yourself or borrowed it from another language (like
    Algol 68)?

    It was heavily inspired by the syntax (not the semantics) of Algol68,
    even though I'd never used it at that point.

    I like that it solved the annoying begin-end aspect of Algol60/Pascal
    syntax where you have to write the clunky:

    if cond then begin s1; s2 end else begin s3; s4 end;

    You see it also with braces:

    if (cond) {s1; s2; } else { s3; s4; }

    With Algol68 it became:

    IF cond THEN s1; s2 ELSE s3; s4 FI;

    I enhanced it by not needing stropping (and so not allowing embedded
    spaces within names); allowing redundant semicolons while at the same
    time, turning newlines into semicolons when a line obviously didn't
    continue; plus allowing ordinary 'end' or 'end if' to be used as well as
    'fi'.

    My version then can look like this, a bit less forbidding than Algol68:

    if cond then
    s1
    s2
    else
    s3
    s4
    end



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 9 14:57:05 2024
    On 08.11.2024 23:24, Bart wrote:
    On 08/11/2024 17:37, Janis Papanagnou wrote:

    BTW, the same symbols can also be used as an alternative form
    of the 'case' statement; the semantic distinction is made by
    context, e.g. the types involved in the construct.

    You mean whether the 'a' in '(a | b... | c)' has type Bool rather than Int?

    I've always discriminated on the number of terms between the two |s:
    either 1, or more than 1.

    I suppose in a [historic] "C" like language it's impossible to
    distinguish on type here (given that there was no 'bool' type
    [in former times] in "C"). - But I'm not quite sure whether
    you're speaking here about your "C"-like language or some other
    language you implemented.


    Bart, out of interest; have you invented that syntax for your
    language yourself or borrowed it from another language (like
    Algol 68)?

    It was heavily inspired by the syntax (not the semantics) of Algol68,

    (Sure.)

    even though I'd never used it at that point.

    I like that it solved the annoying begin-end aspect of Algol60/Pascal
    syntax where you have to write the clunky:
    [snip examples]

    Well, annoying would be a strong word [for me] here, but yes,
    that's what I also find suboptimal. Quite some languages have
    adopted the if/fi or if/end forms (and for good reasons, IMO).


    I enhanced it by not needing stropping (and so not allowing embedded
    spaces within names); allowing redundant semicolons while at the same
    time, turning newlines into semicolons when a line obviously didn't
    continue; plus allowing ordinary 'end' or 'end if' to be used as well as 'fi'.

    My version then can look like this, a bit less forbidding than Algol68:

    if cond then
    s1
    s2
    else
    s3
    s4
    end

    (Looks a lot more like a scripting language without semicolons.)

    Not sure what you mean by "less forbidding", though. - Algol 68
    never appeared to me to restrict me. And it allows more flexible
    and coherent application of its concepts (and in a safe way) than
    in a lot other common languages.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 9 15:51:54 2024
    On 08.11.2024 19:18, David Brown wrote:
    On 08/11/2024 18:37, Janis Papanagnou wrote:

    The | operator means "or" in the OP's language (AFAIK - only he actually knows the language). So "(a | b | c)" in that language will sometimes
    mean the same as "(a | b | c)" in C, and sometimes it will mean the same
    as "(a ? b : c)" in C.

    As said ("I can't comment on the "other use" of the same syntax"),
    I don't know Bart's language, so cannot comment on that.

    And, frankly, some personal language projects are not of interest
    to me, apart from experiences the implementer (Bart) has gotten
    from his projects that might be worthwhile to consider for other
    languages' evolution or design. This is why I got interested in
    the thread and his posts.


    There may be some clear distinguishing feature that disambiguates these
    uses. But this is a one-man language - there is no need for a clear
    syntax or grammar, documentation, consistency in the language, or a consideration for corner cases or unusual uses.

    Incidentally above syntax is what Algol 68 supports;

    Yes, he said later that Algol 68 was the inspiration for it. Algol 68
    was very successful in its day - but there are good reasons why many of
    its design choices were left behind long ago in newer languages.

    Myself I've never seen Algol 68 code outside of education and
    specification. (But that's normal due my naturally restricted
    view on what happens all over the world. So if you have some
    examples for practical successes of Algol 68 I'd be interested
    to hear about.)

    Some design decisions of Algol 68 are arguable, indeed, and we
    can observe that from the reports those days. (But that's not
    surprising given that there have been a lot of different (and
    strong) characters, university professors and scientists from
    all over the world, in the committees and working group.) It's
    obvious that quite some members left and introduced their own
    languages; those languages were of course also not unopposed.

    I don't think, though, that this natural segregation process or
    any design decisions of some later developed languages would
    give evidence for a clear negative valuation of any specific
    language details (or for the language as a whole). Contrary,
    a lot of later languages even ignored outstanding and important
    concepts of languages these days. (The market and politics have
    their own logic and dynamics.)


    This is as opinion of course arguable. It's certainly also
    influenced where one is coming from (i.e. personal expertise
    from other languages).

    The language here is "mathematics". I would not expect anyone who even considers designing a programming language to be unfamiliar with that
    symbol.

    Mathematics, unfortunately, [too] often has several symbols for
    the same thing. (It's in that respect not very different from
    programming languages, where you can [somewhat] rely on + - * /
    but beyond that it's getting more tight.)

    Programming languages have the additional problem that you don't
    have all necessary symbols available, so language designers have
    to map them onto existing symbols. (Also Unicode in modern times
    do not solve that fact, since languages typically rely on ASCII,
    or some 8-bit extension, at most; full Unicode support, I think,
    is rare, especially on the lexical language level. Some allow
    them in strings, some in identifiers; but in language keywords?)

    BTW, in Algol 68 you can define operators, so you can define
    "OP V" or "OP ^" (for 'or' and 'and', respectively, but we cannot
    define (e.g.) "OP ú" (a middle dot, e.g. for multiplication).[*]


    The detail of what symbols are used is
    not that important to me, if it fits to the overall language
    design.

    I am quite happy with the same symbol being used for very different
    meanings in different contexts. C's use of "*" for indirection and for multiplication is rarely confusing. Using | for "bitwise or" and also
    using it for a "pipe" operator would probably be fine - only one
    operation makes sense for the types involved. But here the two
    operations - "bitwise or" (or logical or) and "choice" can apply to to
    the same types of operands. That's what makes it a very poor choice of syntax.

    Well, I'm more used (from mathematics) to 'v' and '^' than to '|'
    and '&', respectively. But that doesn't prevent me from accepting
    other symbols like '|' to have some [mathematical] meaning, or
    even different meanings depending on context. In mathematics it's
    not different; same symbols are used in different contexts with
    different semantics. (And there's also the mentioned problem of
    non-coherent literature WRT used mathematics' symbols.)


    (For comparison, Algol 68 uses "OR", "∨" or "\/" for the "or" operator, thus it does not have this confusion.)

    Actually, while I like Algol 68's flexibility, there's in some
    cases (to my liking) too many variants. This had partly been
    necessary, of course, due to the (even more) restricted character
    sets (e.g. 6-bit characters) available in the 1960's.

    The two options for conditionals I consider very useful, though,
    and it also produces very legible and easily understandable code.

    [...]

    I've nothing (much) against the operation - it's the choice of operator
    that is wrong.

    Well, on opinions there's nothing more to discuss, I suppose.

    Janis

    [*] Note: I'm using the "Genie" compiler for tests.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 9 17:00:07 2024
    On 03.11.2024 21:00, Bart wrote:

    This was the first part of your example:

    const char * flag_to_text_A(bool b) {
        if (b == true) {
            return "It's true!";
        } else if (b == false) {
            return "It's false!";

    /I/ would question why you'd want to make the second branch conditional
    in the first place.

    You might want to read about Dijkstra's Guards; it might provide
    some answers, rationales, and insights for this question. (Don't
    get repelled or confused by the "calculate all conditions" aspect
    or the non-determinism; think more about, e.g., the safety of full specification, automated optimization runs, and other [positive]
    implications.)

    (Though if you're only focused on programmer-optimized structures
    Dijkstra's concept and ideas probably won't help you.)

    Incidentally, Dijkstra's Guards cover also an aspect of the OP's
    original question.

    Janis

    Write an 'else' there, and the issue doesn't arise.

    Because I can't see the point of deliberately writing code that usually
    takes two paths, when either:

    (1) you know that one will never be taken, or
    (2) you're not sure, but don't make any provision in case it is

    Fix that first rather than relying on compiler writers to take care of your
    badly written code.
    [...]



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 9 17:54:44 2024
    On 04.11.2024 23:25, David Brown wrote:

    If you have a function (or construct) that returns a correct value for
    inputs 1, 2 and 3, and you never pass it the value 4 (or anything else),
    then there is no undefined behaviour no matter what the code looks like
    for values other than 1, 2 and 3. If someone calls that function with
    input 4, then /their/ code has the error - not the code that doesn't
    handle an input 4.

    Well, it's a software system design decision whether you want to
    make the caller test the preconditions for every function call,
    or let the callee take care of unexpected input, or both.

    We had always followed the convention to avoid all undefined
    situations and always define every 'else' case by some sensible
    behavior, at least writing a notice into a log-file, but also
    to "fix" the runtime situation to be able to continue operating.
    (Note, I was mainly writing server-side software where this was
    especially important.)

    That's one reason why (as elsethread mentioned) I dislike 'else'
    to handle a defined value; I prefer an explicit 'if' and use the
    else for reporting unexpected situations (that practically never
    appear, or, with the diagnostics QA-evaluated, asymptotically
    disappearing).

    (For pure binary predicates there's no errors branch, of course.)

    Janis

    PS: One of my favorite IT-gotchas is the plane crash where the
    code specified landing procedure functions for height < 50.0 ft
    and for height > 50.0 ft conditions, which mostly worked since
    the height got polled only every couple seconds, and the case
    height = 50.0 ft happened only very rarely due to the typical
    descent characteristics during landing.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From fir@3:633/280.2 to Bart on Sat Nov 9 21:08:05 2024
    To: Bart <bc@freeuk.com>

    Bart wrote:
    On 06/11/2024 07:26, Kaz Kylheku wrote:
    On 2024-11-05, Bart <bc@freeuk.com> wrote:

    Well, it started off as 2-way select, meaning constructs like this:

    x = c ? a : b;
    x := (c | a | b)

    Where one of two branches is evaluated. I extended the latter to N-way
    select:

    x := (n | a, b, c, ... | z)

    This looks quite error-prone. You have to count carefully that
    the cases match the intended values. If an entry is
    inserted, all the remaining ones shift to a higher value.

    You've basically taken a case construct and auto-generated
    the labels starting from 1.

    It's a version of Algol68's case construct:

    x := CASE n IN a, b, c OUT z ESAC

    which also has the same compact form I use. I only use the compact
    version because n is usually small, and it is intended to be used within
    an expression: print (n | "One", "Two", "Three" | "Other").

    This is an actual example (from my first scripting language; not written by
    me):

    Crd[i].z := (BendAssen |P.x, P.y, P.z)

    An out-of-bounds index yields 'void' (via a '| void' part inserted by
    the compiler). This is one of my examples from that era:

    xt := (messa | 1,1,1, 2,2,2, 3,3,3)
    yt := (messa | 3,2,1, 3,2,1, 3,2,1)


    still the more C-compatible version would look better imo

    xt = {1,1,1, 2,2,2, 3,3,3}[messa];
    yt = {3,2,1, 3,2,1, 3,2,1}[messa];

    especially if it were also allowed to use [] on the left side

    and

    t = {1,3, 1,2, 1,1, 2,3, 2,2, 2,1, 3,3, 3,2, 3,1} [messa]

    where t is struct {x,y}

    could be maybe faster
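
    (As an aside: something quite close to this is already legal C99 on the
    right-hand side, via compound literals - a minimal sketch, 0-based and
    without a bounds check, so messa is assumed to be in range:)

        #include <stdio.h>

        int main(void) {
            int messa = 4;      /* assumed to be 0..8 */

            int xt = (int[]){1,1,1, 2,2,2, 3,3,3}[messa];
            int yt = (int[]){3,2,1, 3,2,1, 3,2,1}[messa];

            printf("xt=%d yt=%d\n", xt, yt);    /* prints: xt=2 yt=2 */
            return 0;
        }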


    Algol68 didn't have 'switch', but I do, as well as a separate
    case...esac statement that is more general. Those are better for
    multi-line constructs.

    As for being error prone because values can get out of step, so is a
    function call like this:

    f(a, b, c, d, e)

    But I also have keyword arguments.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: i2pn2 (i2pn.org) (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sat Nov 9 22:06:21 2024
    On 09/11/2024 07:54, Janis Papanagnou wrote:
    On 04.11.2024 23:25, David Brown wrote:

    If you have a function (or construct) that returns a correct value for
    inputs 1, 2 and 3, and you never pass it the value 4 (or anything else),
    then there is no undefined behaviour no matter what the code looks like
    for values other than 1, 2 and 3. If someone calls that function with
    input 4, then /their/ code has the error - not the code that doesn't
    handle an input 4.

    Well, it's a software system design decision whether you want to
    make the caller test the preconditions for every function call,
    or let the callee take care of unexpected input, or both.


    Well, I suppose it is their decision - they can do the right thing, or
    the wrong thing, or both.

    I believe I explained in previous posts why it is the /caller's/ responsibility to ensure pre-conditions are fulfilled, and why anything
    else is simply guaranteeing extra overheads while giving you less
    information for checking code correctness. But I realise that could
    have been lost in the mass of posts, so I can go through it again if you
    want.


    (On security boundaries, system call interfaces, etc., where the caller
    could be malicious or incompetent in a way that damages something other
    than their own program, you have to treat all inputs as dangerous and
    sanitize them, just like data from external sources. That's a different matter, and not the real focus here.)



    We had always followed the convention to avoid all undefined
    situations and always define every 'else' case by some sensible
    behavior, at least writing a notice into a log-file, but also
    to "fix" the runtime situation to be able to continue operating.
    (Note, I was mainly writing server-side software where this was
    especially important.)

    You can't "fix" bugs in the caller code by writing to a log file.
    Sometimes you can limit the damage, however.

    If you can't trust the people writing the calling code, then that should
    be the focus of your development process - find a way to be sure that
    the caller code is right. That's where you want your conventions, or to
    focus code reviews, training, automatic test systems - whatever is
    appropriate for your team and project. Make sure callers pass correct
    data to the function, and the function can do its job properly.

    Sometimes it makes sense to specify functions differently, and accept a
    wider input. Maybe instead of saying "this function will return the
    integer square root of numbers between 0 and 10", you say "this function
    will return the integer square root if given a number between 0 and 10,
    and will log a message and return -1 for other int values". Fair enough
    - now you've got a new function where it is very easy for the caller to
    ensure the preconditions are satisfied. But be very aware of the costs
    - you have now destroyed the "purity" of the function, and lost the key mathematical relation between the input and output. (You have also made everything much less efficient.)
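
    A minimal sketch of that "wide" variant (hypothetical name, logging to
    stderr just for illustration):

        #include <stdio.h>

        /* Defined for every int: 0..10 give the integer square root,
           anything else is reported and mapped to -1. */
        int small_int_sqrt_wide(int x) {
            if (x < 0 || x > 10) {
                fprintf(stderr, "small_int_sqrt_wide: %d out of range\n", x);
                return -1;
            }
            if (x == 0) return 0;
            if (x < 4)  return 1;
            if (x < 9)  return 2;
            return 3;               /* 9 and 10 */
        }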

    In terms of development practices, for large code bases you should
    divide things up into modules with clear boundaries. And then you might
    say that the teams working on other modules that call yours are muppets
    that can't read a function specification and can't get their code right.
    So these boundary functions have to accept as wide a range of inputs
    as possible, and check them as well as possible. But you only do that
    for these externally accessible interfaces, not your internal code.


    That's one reason why (as elsethread mentioned) I dislike 'else'
    to handle a defined value; I prefer an explicit 'if' and use the
    else for reporting unexpected situations (that practically never
    appear, or, with the diagnostics QA-evaluated, asymptotically
    disappearing).

    (For pure binary predicates there's no errors branch, of course.)

    Janis

    PS: One of my favorite IT-gotchas is the plane crash where the
    code specified landing procedure functions for height < 50.0 ft
    and for height > 50.0 ft conditions, which mostly worked since
    the height got polled only every couple seconds, and the case
    height = 50.0 ft happened only very rarely due to the typical
    descent characteristics during landing.





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 9 23:21:47 2024
    On 09/11/2024 03:57, Janis Papanagnou wrote:
    On 08.11.2024 23:24, Bart wrote:
    On 08/11/2024 17:37, Janis Papanagnou wrote:

    BTW, the same symbols can also be used as an alternative form
    of the 'case' statement; the semantic distinction is made by
    context, e.g. the types involved in the construct.

    You mean whether the 'a' in '(a | b... | c)' has type Bool rather than Int?
    I've always discriminated on the number of terms between the two |s:
    either 1, or more than 1.

    I suppose in a [historic] "C" like language it's impossible to
    distinguish on type here (given that there was no 'bool' type
    [in former times] in "C"). - But I'm not quite sure whether
    you're speaking here about your "C"-like language or some other
    language you implemented.

    I currently have three HLL implementations:

    * For my C subset language (originally I had some enhancements, now
    dropped)

    * For my 'M' systems language inspired by A68 syntax

    * For my 'Q' scripting language, with the same syntax, more or less

    The remark was about those last two.

    if cond then
    s1
    s2
    else
    s3
    s4
    end

    (Looks a lot more like a scripting language without semicolons.)

    This is what I've long suspected: that people associate clear, pseudo-code-like syntax with scripting languages.

    'Serious' ones apparently need to look the business with a lot of extra punctuation. The more clutter the better!

    By that criteria, C++ is obviously more advanced than C:

    C: #include <stdio.h>
    printf("A=%d B=%d\n", a, b);

    C++ #include <iostream>
    std::cout << "A=" << a << " " << "B=" << b << std::endl;

    Maybe Zig even more so (normally you'd create a shorter alias to that
    print):

    Zig: @import("std").debug.print("A={d} B={d}\n", .{a, b});

    By that measure, mine probably looks like a toy:

    M: println =a, =b





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sun Nov 10 03:27:13 2024
    On 09/11/2024 05:51, Janis Papanagnou wrote:
    On 08.11.2024 19:18, David Brown wrote:
    On 08/11/2024 18:37, Janis Papanagnou wrote:


    The language here is "mathematics". I would not expect anyone who even
    considers designing a programming language to be unfamiliar with that
    symbol.

    Mathematics, unfortunately, [too] often has several symbols for
    the same thing. (It's in that respect not very different from
    programming languages, where you can [somewhat] rely on + - * /
    but beyond that it's getting more tight.)

    Programming languages have the additional problem that you don't
    have all necessary symbols available, so language designers have
    to map them onto existing symbols. (Also Unicode in modern times
    do not solve that fact, since languages typically rely on ASCII,
    or some 8-bit extension, at most; full Unicode support, I think,
    is rare, especially on the lexical language level. Some allow
    them in strings, some in identifiers; but in language keywords?)


    Sure, I appreciate all this. We must do the best we can - I am simply
    saying that using | for this operation is far from the best choice.

    BTW, in Algol 68 you can define operators, so you can define
    "OP V" or "OP ^" (for 'or' and 'and', respectively, but we cannot
    define (e.g.) "OP ú" (a middle dot, e.g. for multiplication).[*]


    The detail of what symbols are used is
    not that important to me, if it fits to the overall language
    design.

    I am quite happy with the same symbol being used for very different
    meanings in different contexts. C's use of "*" for indirection and for
    multiplication is rarely confusing. Using | for "bitwise or" and also
    using it for a "pipe" operator would probably be fine - only one
    operation makes sense for the types involved. But here the two
    operations - "bitwise or" (or logical or) and "choice" can apply to to
    the same types of operands. That's what makes it a very poor choice of
    syntax.

    Well, I'm more used (from mathematics) to 'v' and '^' than to '|'
    and '&', respectively. But that doesn't prevent me from accepting
    other symbols like '|' to have some [mathematical] meaning, or
    even different meanings depending on context. In mathematics it's
    not different; same symbols are used in different contexts with
    different semantics. (And there's also the mentioned problem of
    non-coherent literature WRT used mathematics' symbols.)


    We are - unfortunately, perhaps - constrained by common keyboards and
    ASCII (for the most part). "v" and "^" are poor choices for "or" and
    "and" - "∨" and "∧" would be much nicer, but are hard to type. For
    better or worse, the programming world has settled on "|" and "&" as
    practical alternatives. ("+" and "." are often used in boolean logic,
    and can be typed on normal keyboards, but would quickly be confused with
    other uses of those symbols.)


    (For comparison, Algol 68 uses "OR", "∨" or "\/" for the "or" operator,
    thus it does not have this confusion.)

    Actually, while I like Algol 68's flexibility, there's in some
    cases (to my liking) too many variants. This had partly been
    necessary, of course, due to the (even more) restricted character
    sets (e.g. 6-bit characters) available in the 1960's.

    The two options for conditionals I consider very useful, though,
    and it also produces very legible and easily understandable code.

    [...]

    I've nothing (much) against the operation - it's the choice of operator
    that is wrong.

    Well, on opinions there's nothing more to discuss, I suppose.


    Opinions can be justified, and that discussion can be interesting.
    Purely subjective opinion is less interesting.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sun Nov 10 15:01:44 2024
    On 09.11.2024 11:08, fir wrote:
    Bart wrote:
    On 06/11/2024 07:26, Kaz Kylheku wrote:
    On 2024-11-05, Bart <bc@freeuk.com> wrote:

    [...] I extended the latter to N-way select:

    x := (n | a, b, c, ... | z)

    This looks quite error-prone. You have to count carefully that
    the cases match the intended values. If an entry is
    inserted, all the remaining ones shift to a higher value.

    You've basically taken a case construct and auto-generated
    the labels starting from 1.

    It's a version of Algol68's case construct:

    x := CASE n IN a, b, c OUT z ESAC

    which also has the same compact form I use. I only use the compact
    version because n is usually small, and it is intended to be used within
    an expression: print (n | "One", "Two", "Three" | "Other").

    [...]

    An out-of-bounds index yields 'void' (via a '| void' part inserted by
    the compiler). This is one of my examples from that era:

    xt := (messa | 1,1,1, 2,2,2, 3,3,3)
    yt := (messa | 3,2,1, 3,2,1, 3,2,1)


    still the more C-compatible version would look better imo

    xt = {1,1,1, 2,2,2, 3,3,3}[messa];
    yt = {3,2,1, 3,2,1, 3,2,1}[messa];

    [...]

    It might look better - which of course lies in the eyes of the
    beholder - but this would actually need more guaranteed context
    or explicit tests (whether "messa" is within defined bounds) to
    become a safe construct; which then again makes it more clumsy.

    Above you also write about the syntax (which included the 'else'
    case) that "This looks quite error-prone." and that you have to
    "count carefully". Why do you think the "C-like" syntax is less
    error prone and that you wouldn't have to count?

    The biggest problem with such old switch semantics is, IMO, that
    you have to map them on sequence numbers [1..N], or use them just
    in contexts where you naturally have such selectors given. (Not
    that the "C-like" suggestion would address that inherent issue.)

    In "C" I occasionally used a {...}[...] or "..."[...] syntax,
    but rather in this form: {...}[... % n] , where 'n' is the
    determined (constant) number of elements.
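
    For instance (a small illustration of that idiom - the % keeps the
    index inside the string, whatever the selector is):

        #include <stdio.h>

        int main(void) {
            for (int i = 0; i < 20; i++)
                putchar("|/-\\"[i % 4]);    /* "..."[ ... % n ] lookup */
            putchar('\n');
            return 0;
        }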

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sun Nov 10 15:22:21 2024
    On 09.11.2024 12:06, David Brown wrote:
    On 09/11/2024 07:54, Janis Papanagnou wrote:

    Well, it's a software system design decision whether you want to
    make the caller test the preconditions for every function call,
    or let the callee take care of unexpected input, or both.


    Well, I suppose it is their decision - they can do the right thing, or
    the wrong thing, or both.

    I believe I explained in previous posts why it is the /caller's/ responsibility to ensure pre-conditions are fulfilled, and why anything
    else is simply guaranteeing extra overheads while giving you less
    information for checking code correctness. But I realise that could
    have been lost in the mass of posts, so I can go through it again if you want.

    I haven't read all the posts, or rather, I just skipped most posts;
    it's too time consuming.

    Since you explicitly elaborated - thanks! - I will read this one...

    [...]

    (On security boundaries, system call interfaces, etc., where the caller
    could be malicious or incompetent in a way that damages something other
    than their own program, you have to treat all inputs as dangerous and sanitize them, just like data from external sources. That's a different matter, and not the real focus here.)

    We had always followed the convention to avoid all undefined
    situations and always define every 'else' case by some sensible
    behavior, at least writing a notice into a log-file, but also
    to "fix" the runtime situation to be able to continue operating.
    (Note, I was mainly writing server-side software where this was
    especially important.)

    You can't "fix" bugs in the caller code by writing to a log file.
    Sometimes you can limit the damage, however.

    I spoke more generally of fixing situations (not only bugs).


    If you can't trust the people writing the calling code, then that should
    be the focus of your development process - find a way to be sure that
    the caller code is right. That's where you want your conventions, or to focus code reviews, training, automatic test systems - whatever is appropriate for your team and project. Make sure callers pass correct
    data to the function, and the function can do its job properly.

    Yes.


    Sometimes it makes sense to specify functions differently, and accept a
    wider input. Maybe instead of saying "this function will return the
    integer square root of numbers between 0 and 10", you say "this function
    will return the integer square root if given a number between 0 and 10,
    and will log a message and return -1 for other int values". Fair enough
    - now you've got a new function where it is very easy for the caller to ensure the preconditions are satisfied. But be very aware of the costs
    - you have now destroyed the "purity" of the function, and lost the key mathematical relation between the input and output. (You have also made everything much less efficient.)

    I disagree with the "much less" generalization. I also think that when
    weighing performance versus safety my preferences might be different;
    I'm only speaking about a "rule of thumb", not about the actual (IMO) necessity(!) to make these decisions depending on the project context.

    [...]

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sun Nov 10 16:05:02 2024
    On 09.11.2024 17:27, David Brown wrote:
    On 09/11/2024 05:51, Janis Papanagnou wrote:
    [...]

    Sure, I appreciate all this. We must do the best we can - I am simply
    saying that using | for this operation is far from the best choice.

    That's also what I understood. - My point is that preferences (and
    opinions) differ. (And I haven't seen any convincing rationale.)

    Frankly, we're confronted with so much rubbish syntax (in various
    languages, even in the ones we have to or even like to use) that
    I'm at least astonished about your [strong appearing] opinion here.


    Well, I'm more used (from mathematics) to 'v' and '^' than to '|'
    and '&', respectively. But that doesn't prevent me from accepting
    other symbols like '|' to have some [mathematical] meaning, or
    even different meanings depending on context. In mathematics it's
    not different; same symbols are used in different contexts with
    different semantics. (And there's also the mentioned problem of
    non-coherent literature WRT used mathematics' symbols.)


    We are - unfortunately, perhaps - constrained by common keyboards and
    ASCII (for the most part). "v" and "^" are poor choices for "or" and
    "and" - "∨" and "∧" would be much nicer, but are hard to type.

    That was the key what I wanted to express. (I used the approximated
    symbols only for convenience.) - But, as a fact, the symbols I used
    (an alpha-letter and a punctuation character) can [in Algol 68] be
    effectively used as valid operators but the more appropriate Unicode
    characters can't. (In the Genie compiler the 'v' must be used as 'V',
    though.)

    (Yes, it's a pity that we are constrained by keyboards, but not only
    by that. And international use and cooperation don't make sensible, generally applicable solutions any easier.)

    For
    better or worse, the programming world has settled on "|" and "&" as practical alternatives.

    Only a subset of the languages; nowadays vastly those that took "C" -
    to my very astonishment! - as a design paragon.

    Personally I prefer 'and' and 'or' to '&&' and '||', or '&' and '|'.
    (And the others, "∧" and "∨", are out for said reasons.)

    The symbol '|' I associate more with alternatives (BNF, shell syntax,
    etc.). But in Unix shell also with pipes (in former Unixes '^', BTW).
    And I have no problem with it if used as a separator in a conditional,
    where "separator" is of course not the formally appropriate term.

    ("+" and "." are often used in boolean logic,
    and can be typed on normal keyboards, but would quickly be confused with other uses of those symbols.)

    [...]

    Well, on opinions there's nothing more to discuss, I suppose.

    Opinions can be justified, and that discussion can be interesting.
    Purely subjective opinion is less interesting.

    Sure. Yours is appreciated as well.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 10 17:00:19 2024
    Bart <bc@freeuk.com> wrote:
    On 05/11/2024 19:53, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 05/11/2024 12:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means branching, even if notionally, on one-of-N possible code paths.

    OK.

    The whole construct may or may not return a value. If it does, then one of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is a reasonably small number of possibilities. For example, a switch on
    a char variable which covers all possible values does not need a default
    path. A default is needed only when the number of possibilities is too
    large to explicitly give all of them. And some languages allow
    ranges, so that you may be able to cover all values with a small
    number of ranges.
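
    A common way to get that guarantee in practice - using an enum rather
    than a char for brevity - is a switch that lists every enumerator; with
    gcc or clang, -Wswitch then flags any value left uncovered, so no
    default is needed for exhaustiveness. A small sketch:

        enum colour { RED, GREEN, BLUE };

        const char *colour_name(enum colour c) {
            switch (c) {
            case RED:   return "red";
            case GREEN: return "green";
            case BLUE:  return "blue";
            }
            return "?";    /* not reached for valid enum values */
        }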


    What's easier to implement in a language: to have a conditional need for an 'else' branch, which is dependent on the compiler performing some
    arbitrarily complex levels of analysis on some arbitrarily complex set
    of expressions...

    ...or to just always require 'else', with a dummy value if necessary?

    Well, frequently it is easier to do a bad job than a good one.

    I assume that you consider the simple solution the 'bad' one?

    You wrote about _always_ requiring 'else' regardless of whether it is
    needed or not. Yes, I consider this bad.

    I would consider a much more elaborate one, putting the onus on external
    tools and still having an unpredictable result, to be the poorer of the two.

    You want to create a language that is easily compilable, no matter how complex the input.

    Normally the time spent _using_ a compiler should be bigger than the time
    spent writing the compiler. If a compiler gets enough use, it
    justifies some complexity.

    With the simple solution, the worst that can happen is that you have to write a dummy 'else' branch, perhaps with a dummy zero value.

    If control never reaches that point, it will never be executed (at
    worst, it may need to skip an instruction).

    But if the compiler is clever enough (optionally clever, it is not a requirement!), then it could eliminate that code.

    A bonus is that when debugging, you can comment out all or part of the previous lines, but the 'else' now catches those untested cases.
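
    As a minimal C-style sketch of that pattern (invented example): every
    path returns a value, and the final 'else' both satisfies the rule and
    catches anything commented out while debugging:

    int category(int n)
    {
        if (n == 1)      return 10;
        else if (n == 2) return 20;
        else             return 0;   /* dummy value: never reached for valid n */
    }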

    I am mainly concerned with clarity and correctness of source code.
    Dummy 'else' doing something may hide errors. Dummy 'else' signaling
    error means that something which could be compile time error is
    only detected at runtime.

    A compiler that detects most errors of this sort is IMO better than a
    compiler which makes no effort to detect them. And clearly, once the
    problem is formulated in a sufficiently general way, it becomes
    unsolvable. So I do not expect a general solution, but I do expect a
    reasonable effort.

    normally you do not need very complex analysis:

    I don't want to do any analysis at all! I just want a mechanical
    translation as effortlessly as possible.

    I don't like unbalanced code within a function because it's wrong and
    can cause problems.

    Well, I demand more from compiler than you do...

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 10 17:57:26 2024
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 20:39, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Then we disagree on what 'multi-way' select might mean. I think it means >>>>> branching, even if notionally, on one-of-N possible code paths.

    OK.

    I appreciate this is what Bart means by that phrase, but I don't agree
    with it. I'm not sure if that is covered by "OK" or not!

    You may prefer your own definition, but Bart's is a reasonable one.

    The only argument I can make here is that I have not seen "multi-way
    select" as a defined phrase with a particular established meaning.

    There is a well-defined concept that appears when studying control structures.
    I am not sure if "multi-way select" is the usual name for it, but with
    Bart's explanation it is very clear that he meant this concept. And
    even without his explanation I would assume that he meant this concept.

    The whole construct may or may not return a value. If it does, then one of the N paths must be a default path.


    You need to cover all input values. This is possible when there
    is a reasonably small number of possibilities. For example, a switch on
    a char variable which covers all possible values does not need a default
    path. A default is needed only when the number of possibilities is too
    large to give all of them explicitly. And some languages allow
    ranges, so that you may be able to cover all values with a small
    number of ranges.


    I think this is all very dependent on what you mean by "all input values".
    Supposing I declare this function:

    // Return the integer square root of numbers between 0 and 10
    int small_int_sqrt(int x);


    To me, the range of "all input values" is integers from 0 to 10. I
    could implement it as :

    int small_int_sqrt(int x) {
        if (x == 0) return 0;
        if (x < 4) return 1;
        if (x < 9) return 2;
        if (x < 16) return 3;
        unreachable();
    }

    If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's
    /their/ fault and /their/ problem. I said nothing about what would
    happen in those cases.

    But some people seem to feel that "all input values" means every
    possible value of the input types, and thus that a function like this
    should return a value even when there is no correct value in and no
    correct value out.

    Well, some languages treat types more seriously than C. In Pascal the
    type of your input would be 0..10 and all input values would be
    handled. Sure, when the domain is too complicated to express in a type
    then it could be a documented restriction. Still, it makes sense to
    signal an error if a value goes outside the handled range, so in a sense all
    values of the input type are handled: either you get a valid answer or
    a clear error.
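
    C has no subrange types, but a rough sketch of that "valid answer or
    clear error" behaviour (invented function name, not a prescription)
    might be:

    #include <stdio.h>
    #include <stdlib.h>

    /* Analogue of a Pascal 0..10 parameter: out-of-range input is
       reported as a clear error instead of producing a bogus result. */
    int small_int_sqrt_checked(int x)
    {
        if (x < 0 || x > 10) {
            fprintf(stderr, "small_int_sqrt_checked: %d out of range 0..10\n", x);
            abort();
        }
        if (x == 0) return 0;
        if (x < 4) return 1;
        if (x < 9) return 2;
        return 3;
    }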

    No, it does not make sense to do that. Just because the C language does
    not currently (maybe once C++ gets contracts, C will copy them) have a
    way to specify input sets other than by types, does not mean that
    functions in C always have a domain matching all possible combinations
    of bits in the underlying representation of the parameter's types.

    It might be a useful fault-finding aid temporarily to add error messages
    for inputs that are invalid but can physically be squeezed into the parameters. That won't stop people making incorrect declarations of the function and passing completely different parameter types to it, or
    finding other ways to break the requirements of the function.

    And in general there is no way to check the validity of the inputs - you usually have no choice but to trust the caller. It's only in simple
    cases, like the example above, that it would be feasible at all.


    There are, of course, situations where the person calling the function
    is likely to be incompetent, malicious, or both, and where there can be serious consequences for what you might prefer to consider as invalid
    input values.

    You apparently exclude the possibility of competent persons making a
    mistake. AFAIK industry statistics show that code developed by
    good developers using a rigorous process still contains a substantial
    number of bugs. So, it makes sense to have as much as possible
    verified mechanically. Which in common practice means depending on
    type checks. In less common practice you may have some theorem
    proving framework checking assertions about input arguments;
    then the assertions take the role of types.

    You have that for things like OS system calls - it's no
    different than dealing with user inputs or data from external sources.
    But you handle that by extending the function - increase the range of
    valid inputs and appropriate outputs. You no longer have a function
    that takes a number between 0 and 10 and returns the integer square root
    - you now have a function that takes a number between -(2^31) and
    (2^31 - 1) and returns the integer square root if the input is in the
    range 0 to 10 or halts the program with an error message for other
    inputs in the wider range. It's a different function, with a wider set
    of inputs - and again, it is specified to give particular results for particular inputs.

    It makes sense to extend a definition when such an extension converts a
    function whose use can be verified only by an informal process into
    one with formally verified use.

    I certainly would
    be quite unhappy with code above. It is possible that I would still
    use it as a compromise (say if it was desirable to have single
    prototype but handle points in spaces of various dimensions),
    but my first attempt would be something like:

    typedef struct {int p[2];} two_int;
    ....
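
    A small sketch of how that typedef might be used (invented example):
    the parameter type itself now says "exactly two ints", which a bare
    int* could not express.

    typedef struct { int p[2]; } two_int;

    static int dot2(two_int a, two_int b)
    {
        return a.p[0] * b.p[0] + a.p[1] * b.p[1];
    }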


    I think you'd quickly find that limiting and awkward in C (but it might
    be appropriate in other languages).

    Your snippet handled only two-element arrays. If that is the right assumption
    for the problem, then the typedef above expresses it in an IMO reasonable
    way. Yes, it is more characters to write than the usual C idioms.
    My main "trouble" is that usually I want to handle variable-sized
    arrays. In such a case, beside the pointer there would be a size argument.
    I would probably use a variably modified type in such a case.
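
    For example (a sketch only), a size parameter plus a variably modified
    parameter type lets the prototype itself express the relationship:

    /* The size travels with the pointer; the variably modified type
       documents (and, with some compilers/sanitizers, checks) the bound. */
    double sum(int n, const double a[n])
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }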

    But don't misunderstand me - I am
    all in favour of finding ways in code that make input requirements
    clearer or enforceable within the language - never put anything in
    comments if you can do it in code. You could reasonably do this in C
    for the first example :


    // Do not use this directly
    extern int small_int_sqrt_implementation(int x);


    // Return the integer square root of numbers between 0 and 10
    static inline int small_int_sqrt(int x) {
        assert(x >= 0 && x <= 10);
        return small_int_sqrt_implementation(x);
    }

    Hmm, why extern implementation and static wrapper? I would do
    the opposite.

    A function should accept all input values - once you have made clear
    what the acceptable input values can be. A "default" case is just a
    short-cut for conveniently handling a wide range of valid input values - it is never a tool for handling /invalid/ input values.

    Well, a default can signal an error, which frequently is the right handling
    of invalid input values.


    Will that somehow fix the bug in the code that calls the function?

    It can be a useful debugging and testing aid, certainly, but it does not make the code "correct" or "safe" in any sense.

    There is a concept of "partial correctness": if the code finishes, it returns
    a correct value. A variation of this is: if the code finishes without
    signaling an error, it returns correct values. Such a condition may be
    much easier to verify than "full correctness" and in many cases
    is almost as useful. In particular, mathematicians are _very_
    unhappy when a program returns incorrect results. But they are used
    to programs which can not deliver results, either because of
    lack of resources or because the needed case was not implemented.

    When dealing with math formulas there are frequently various
    restrictions on parameters, like we can only divide by a nonzero
    quantity. By signaling an error when the restrictions are not
    satisfied we ensure that successful completion means that the
    restrictions were satisfied. Of course that alone does not
    mean that the result is correct, but correctness of the "general"
    case is usually _much_ easier to ensure. In other words,
    failing restrictions are a major source of errors, and signaling
    errors effectively eliminates that source.
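
    A small sketch of that style (invented name, not a prescription):
    successful completion implies the restriction held.

    #include <stdio.h>
    #include <stdlib.h>

    double safe_div(double num, double den)
    {
        if (den == 0.0) {
            fprintf(stderr, "safe_div: division by zero\n");
            exit(EXIT_FAILURE);
        }
        return num / den;
    }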

    In a world of perfect programmers, they would check restrictions
    before calling any function depending on them, or prove that the
    restrictions on the arguments to a function imply the correctness of
    calls made by the function. But the world is imperfect and in the
    real world extra runtime checks are quite useful.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sun Nov 10 18:16:22 2024
    On 09.11.2024 13:21, Bart wrote:
    On 09/11/2024 03:57, Janis Papanagnou wrote:

    [...] - But I'm not quite sure whether
    you're speaking here about your "C"-like language or some other
    language you implemented.

    I currently have three HLL implementations:

    * For my C subset language (originally I had some enhancements, now
    dropped)

    * For my 'M' systems language inspired by A68 syntax

    * For my 'Q' scripting language, with the same syntax, more or less

    The remark was about those last two.

    if cond then
        s1
        s2
    else
        s3
        s4
    end

    (Looks a lot more like a scripting language without semicolons.)

    This is what I've long suspected: that people associate clear, pseudo-code-like syntax with scripting languages.

    Most posts from you that I saw were addressing your "C"-like
    language, so I was confused about the actual focus of your post.

    It's helpful to give some hint if posted code is intended as
    pseudo-code. That wasn't clear to me. So thanks for clarifying.

    BTW, I don't consider scripting languages as "bad" - I'm actually
    doing quite a lot of scripting. - My comment doesn't contain any
    valuation and also didn't intend to insinuate one.

    Janis

    [...]


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Nov 11 02:13:21 2024
    On 10/11/2024 05:22, Janis Papanagnou wrote:
    On 09.11.2024 12:06, David Brown wrote:
    On 09/11/2024 07:54, Janis Papanagnou wrote:

    Well, it's a software system design decision whether you want to
    make the caller test the preconditions for every function call,
    or let the callee take care of unexpected input, or both.


    Well, I suppose it is their decision - they can do the right thing, or
    the wrong thing, or both.

    I believe I explained in previous posts why it is the /caller's/
    responsibility to ensure pre-conditions are fulfilled, and why anything
    else is simply guaranteeing extra overheads while giving you less
    information for checking code correctness. But I realise that could
    have been lost in the mass of posts, so I can go through it again if you
    want.

    I haven't read all the posts, or rather, I just skipped most posts;
    it's too time consuming.

    I should probably have skipped /writing/ the posts - it was too time
    consuming :-)


    Since you explicitly elaborated - thanks! - I will read this one...

    [...]

    (On security boundaries, system call interfaces, etc., where the caller
    could be malicious or incompetent in a way that damages something other
    than their own program, you have to treat all inputs as dangerous and
    sanitize them, just like data from external sources. That's a different
    matter, and not the real focus here.)

    We had always followed the convention to avoid all undefined
    situations and always define every 'else' case by some sensible
    behavior, at least writing a notice into a log-file, but also
    to "fix" the runtime situation to be able to continue operating.
    (Note, I was mainly writing server-side software where this was
    especially important.)

    You can't "fix" bugs in the caller code by writing to a log file.
    Sometimes you can limit the damage, however.

    I spoke more generally of fixing situations (not only bugs).

    OK. It can certainly help with /finding/ bugs, that can then be fixed
    later.



    If you can't trust the people writing the calling code, then that should
    be the focus of your development process - find a way to be sure that
    the caller code is right. That's where you want your conventions, or to
    focus code reviews, training, automatic test systems - whatever is
    appropriate for your team and project. Make sure callers pass correct
    data to the function, and the function can do its job properly.

    Yes.


    Sometimes it makes sense to specify functions differently, and accept a
    wider input. Maybe instead of saying "this function will return the
    integer square root of numbers between 0 and 10", you say "this function
    will return the integer square root if given a number between 0 and 10,
    and will log a message and return -1 for other int values". Fair enough
    - now you've got a new function where it is very easy for the caller to
    ensure the preconditions are satisfied. But be very aware of the costs
    - you have now destroyed the "purity" of the function, and lost the key
    mathematical relation between the input and output. (You have also made
    everything much less efficient.)

    I disagree with the "much less" generalization. I also think that when
    weighing performance versus safety my preferences might be different;
    I'm only speaking about a "rule of thumb", not about the actual (IMO) necessity(!) to make these decisions depending on the project context.


    My preferences are very much weighted towards correctness, not
    efficiency. That includes /knowing/ that things are correct, not just
    passing some tests. And key to that is knowing facts about the code
    that can be used to reason about it. If you have a function that has
    clear and specific pre-conditions, you know what you have to do in order
    to use it correctly. It can then give clear and specific
    post-conditions, and you can use these to reason further about your
    code. On the other hand, if the function can, in practice, take any
    input then you have learned little. And if it can do all sorts of
    different things - log a message, return an arbitrary "default" value,
    etc., - then you have nothing to work with for proving or verifying the
    rest of your code.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Nov 11 02:38:25 2024
    On 10/11/2024 07:57, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 20:39, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:


    It might be a useful fault-finding aid temporarily to add error messages
    for inputs that are invalid but can physically be squeezed into the
    parameters. That won't stop people making incorrect declarations of the
    function and passing completely different parameter types to it, or
    finding other ways to break the requirements of the function.

    And in general there is no way to check the validity of the inputs - you
    usually have no choice but to trust the caller. It's only in simple
    cases, like the example above, that it would be feasible at all.


    There are, of course, situations where the person calling the function
    is likely to be incompetent, malicious, or both, and where there can be
    serious consequences for what you might prefer to consider as invalid
    input values.

    You apparently exclude possibility of competent persons making a
    mistake.

    I didn't do so intentionally. I wasn't trying to be exhaustive here. I
    have several times mentioned that extra checks can be very helpful in fault-finding and debugging - good programmers also make mistakes and
    need to debug their code.

    AFAIK industry statistic shows that code develeped by
    good developers using rigorous process still contains substantial
    number of bugs. So, it makes sense to have as much as possible
    verified mechanically. Which in common practice means depending on
    type checks. In less common practice you may have some theorem
    proving framework checking assertions about input arguments,
    then the assertions take role of types.

    Type checks can be extremely helpful, and strong typing greatly reduces
    the errors in released code by catching them early (at compile time).
    And temporary run-time checks are also helpful during development or debugging.

    But extra run-time checks are costly (and I don't mean just in run-time performance, which is only an issue in a minority of situations). They
    mean more code - which means more scope for errors, and more code that
    must be checked and maintained. Usually this code can't be tested well
    in final products - precisely because it is there to handle a situation
    that never occurs.


    But don't misunderstand me - I am
    all in favour of finding ways in code that make input requirements
    clearer or enforceable within the language - never put anything in
    comments if you can do it in code. You could reasonably do this in C
    for the first example :


    // Do not use this directly
    extern int small_int_sqrt_implementation(int x);


    // Return the integer square root of numbers between 0 and 10
    static inline int small_int_sqrt(int x) {
    assert(x >= 0 && x <= 10);
    return small_int_sqrt_implementation(x);
    }

    Hmm, why extern implementation and static wrapper? I would do
    the opposite.

    I wrote it the way you might have it in a header - the run-time check disappears when it is disabled (or if the compiler can see that the
    check always passes). The real function implementation is hidden away
    in an implementation module.


    A function should accept all input values - once you have made clear
    what the acceptable input values can be. A "default" case is just a
    short-cut for conveniently handling a wide range of valid input values - it is never a tool for handling /invalid/ input values.

    Well, default can signal error which frequently is right handling
    of invalid input values.


    Will that somehow fix the bug in the code that calls the function?

    It can be a useful debugging and testing aid, certainly, but it does not
    make the code "correct" or "safe" in any sense.

    There is concept of "partial correctness": code if it finishes returns correct value. A variation of this is: code if it finishes without
    signaling error returns correct values. Such condition may be
    much easier to verify than "full correctness" and in many case
    is almost as useful. In particular, mathematicians are _very_
    unhappy when program return incorrect results. But they are used
    to programs which can not deliver results, either because of
    lack or resources or because needed case was not implemented.

    When dealing with math formulas there are frequently various
    restrictions on parameters, like we can only divide by nonzero
    quantity. By signaling error when restrictions are not
    satisfied we ensure that sucessful completition means that
    restrictions were satisfied. Of course that alone does not
    mean that result is correct, but correctness of "general"
    case is usually _much_ easier to ensure. In other words,
    failing restrictions are major source of errors, and signaling
    errors effectively eliminates it.


    Yes, out-of-band signalling in some way is a useful way to indicate a
    problem, and can allow parameter checking without losing the useful
    results of a function. This is the principle behind exceptions in many languages - then functions either return normally with correct results,
    or you have a clearly abnormal situation.
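
    In C the closest common idiom is an out-of-band status alongside the
    result - a sketch (names invented) using the earlier square-root
    example:

    #include <stdbool.h>

    /* The status is out of band, so a "false" return is clearly abnormal
       and cannot be confused with a legitimate square root. */
    bool small_int_sqrt2(int x, int *root)
    {
        if (x < 0 || x > 10)
            return false;           /* restriction violated: no result */
        int r = 0;
        while ((r + 1) * (r + 1) <= x)
            r++;
        *root = r;
        return true;
    }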

    In world of prefect programmers, they would check restrictions
    before calling any function depending on them, or prove that
    restrictions on arguments to a function imply correctness of
    calls made by the function. But world is imperfect and in
    real world extra runtime checks are quite useful.


    Runtime checks in a function can be useful if you know the calling code
    might not be perfect and the function is going to take responsibility
    for identifying that situation. Programmers will often be writing both
    the caller and callee code, and put temporary debugging and test checks wherever it is most convenient.

    But I think being too enthusiastic about putting checks in the wrong
    place - the callee function - can hide the real problems, or make the
    callee code writer less careful about getting their part of the code
    correct.

    Real-world programmers are imperfect - that does not mean their code has
    to be.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Tue Nov 12 06:09:08 2024
    David Brown <david.brown@hesbynett.no> wrote:
    On 10/11/2024 07:57, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 20:39, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 05/11/2024 13:42, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:


    Type checks can be extremely helpful, and strong typing greatly reduces
    the errors in released code by catching them early (at compile time).
    And temporary run-time checks are also helpful during development or debugging.

    But extra run-time checks are costly (and I don't mean just in run-time performance, which is only an issue in a minority of situations). They
    mean more code - which means more scope for errors, and more code that
    must be checked and maintained. Usually this code can't be tested well
    in final products - precisely because it is there to handle a situation
    that never occurs.

    It depends. gcc used to have several accessor macros which could
    perform checks. They were turned off during "production use" (mainly
    because the checks increased runtime), but were "always" present in the
    source code. The "source cost" was moderate: the checking code took hundreds,
    maybe low thousands, of lines in the headers defining the macros.
    Actual use of the macros was the same as if the macros did no checking,
    so there was minimal increase in source complexity.
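
    A sketch of the general pattern (not gcc's actual macros; the names
    here are invented): the accessor checks its argument in a checking
    build and compiles to a plain access otherwise, so call sites look
    the same either way.

    #include <assert.h>

    struct node { int kind; int nkids; struct node *kids[4]; };

    #ifdef ENABLE_CHECKING
      #define NODE_KID(n, i) \
          (assert((n) != 0 && (i) >= 0 && (i) < (n)->nkids), (n)->kids[(i)])
    #else
      #define NODE_KID(n, i) ((n)->kids[(i)])
    #endif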

    Concerning testing, things exposed in an exported interface frequently
    can be tested with reasonable effort. The main issue is generating
    appropriate arguments and possibly replicating global state (but
    I normally have global state only when strictly necessary).

    A function should accept all input values - once you have made clear what the acceptable input values can be. A "default" case is just a short-cut for conveniently handling a wide range of valid input values - it is never a tool for handling /invalid/ input values.

    Well, default can signal error which frequently is right handling
    of invalid input values.


    Will that somehow fix the bug in the code that calls the function?

    It can be a useful debugging and testing aid, certainly, but it does not make the code "correct" or "safe" in any sense.

    There is concept of "partial correctness": code if it finishes returns
    correct value. A variation of this is: code if it finishes without
    signaling error returns correct values. Such condition may be
    much easier to verify than "full correctness" and in many case
    is almost as useful. In particular, mathematicians are _very_
    unhappy when program return incorrect results. But they are used
    to programs which can not deliver results, either because of
    lack or resources or because needed case was not implemented.

    When dealing with math formulas there are frequently various
    restrictions on parameters, like we can only divide by nonzero
    quantity. By signaling error when restrictions are not
    satisfied we ensure that sucessful completition means that
    restrictions were satisfied. Of course that alone does not
    mean that result is correct, but correctness of "general"
    case is usually _much_ easier to ensure. In other words,
    failing restrictions are major source of errors, and signaling
    errors effectively eliminates it.


    Yes, out-of-band signalling in some way is a useful way to indicate a problem, and can allow parameter checking without losing the useful
    results of a function. This is the principle behind exceptions in many languages - then functions either return normally with correct results,
    or you have a clearly abnormal situation.

    In world of prefect programmers, they would check restrictions
    before calling any function depending on them, or prove that
    restrictions on arguments to a function imply correctness of
    calls made by the function. But world is imperfect and in
    real world extra runtime checks are quite useful.


    Runtime checks in a function can be useful if you know the calling code might not be perfect and the function is going to take responsibility
    for identifying that situation. Programmers will often be writing both
    the caller and callee code, and put temporary debugging and test checks wherever it is most convenient.

    But I think being too enthusiastic about putting checks in the wrong
    place - the callee function - can hide the real problems, or make the
    callee code writer less careful about getting their part of the code correct.

    IME the opposite: not having checks in the called function simply delays
    the moment when the error is detected. Getting errors early helps focus on
    tricky problems or misconceptions. And it motivates programmers to
    be more careful.

    Concerning the correct place for checks: one could argue that a check
    should be close to the place where the result of the check matters, which
    frequently is in the called function. And frequently a check requires
    computation that is done by the called function as part of normal
    processing, but would be extra code in the caller.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 12 08:24:02 2024
    On 10/11/2024 06:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    I assume that you consider the simple solution the 'bad' one?

    You wrote about _always_ requiring 'else' regardless if it is
    needed or not. Yes, I consider this bad.

    It is 'needed' by the language because of its rules. It might not be
    needed by a particular function because the author knows that all
    expected values of the 2**64 range of most scalar parameters have been covered.

    The language doesn't know.

    But the rule only applies to value-returning statements; you can choose
    not to use such statements, but more conventional ones like those in C.

    However, the language will still consider the last statement of a value-returning function to be such a statement. So either that one
    needs 'else' (perhaps in multiple branches), or you instead need a dummy 'return x' at the end of the function, one which is never executed.

    I don't think that's too onerous, and it is safer than somehow asking
    the language to disable the requirement. (How would that be done, by
    some special keyword? Then you'd just be writing that keyword instead of 'return'!)


    I'd would consider a much elaborate one putting the onus on external
    tools, and still having an unpredictable result to be the poor of the two.
    You want to create a language that is easily compilable, no matter how
    complex the input.

    Normally time spent _using_ compiler should be bigger than time
    spending writing compiler. If compiler gets enough use, it
    justifies some complexity.

    That doesn't add up: the more the compiler gets used, the slower it
    should get?!

    The sort of analysis you're implying I don't think belongs in the kind
    of compiler I prefer. Even if it did, it would be later on in the
    process than the point where the above restriction is checked, so
    wouldn't exist in one of my compilers anyway.

    I don't like open-ended tasks like this where compilation time could end
    up being anything. If you need to keep recompiling the same module, then
    you don't want to repeat that work each time.


    I am mainly concerned with clarity and correctness of source code.

    So am I. I try to keep my syntax clean and uncluttered.

    Dummy 'else' doing something may hide errors.

    So can 'unreachable'.

    Dummy 'else' signaling
    error means that something which could be compile time error is
    only detected at runtime.

    Compiler that detects most errors of this sort is IMO better than
    compiler which makes no effort to detect them. And clearly, once
    problem is formulated in sufficiently general way, it becomes
    unsolvable. So I do not expect general solution, but expect
    resonable effort.

    So how would David Brown's example work:

    int F(int n) {
        if (n==1) return 10;
        if (n==2) return 20;
    }

    /You/ know that values -2**31 to 0 and 3 to 2**31-1 are impossible; the compiler doesn't. It's likely to tell you that you may run into the end
    of the function.

    So what do you want the compiler to do here? If I try it:

    func F(int n)int =
        if n=1 then return 10 fi
        if n=2 then return 20 fi
    end

    It says 'else needed' (in that last statement). I can also shut it up
    like this:

    func F(int n)int =          # int is i64 here
        if n=1 then return 10 fi
        if n=2 then return 20 fi
        0
    end

    Since now that last statement is the '0' value (any int value will do).
    What should my compiler report instead? What analysis should it be
    doing? What would that save me from typing?


    normally you do not need very complex analysis:

    I don't want to do any analysis at all! I just want a mechanical
    translation as effortlessly as possible.

    I don't like unbalanced code within a function because it's wrong and
    can cause problems.

    Well, I demand more from compiler than you do...

    Perhaps you're happy for it to be bigger and slower too. Most of my
    projects build more or less instantly. Here 'ms' is a version that runs programs directly from source (the first 'ms' is 'ms.exe' and subsequent
    ones are 'ms.m' the lead module):

    c:\bx>ms ms ms ms ms ms ms ms ms ms ms ms ms ms ms ms hello
    Hello World! 21:00:45

    This builds and runs 15 successive generations of itself in memory
    before building and running hello.m; it took 1 second in all. (Now try
    that with gcc!)

    Here:

    c:\cx>tm \bx\mm -runp cc sql
    Compiling cc.m to <pcl>
    Compiling sql.c to sql.exe

    This compiles my C compiler from source but then it /interprets/ the IR produced. This interpreted compiler took 6 seconds to build the 250Kloc
    test file, and it's a very slow interpreter (it's used for testing and debugging).

    (gcc -O0 took a bit longer to build sql.c! About 7 seconds but it is
    using a heftier windows.h.)

    If I run the C compiler from source as native code (\bx\ms cc sql) then building the compiler *and* sql.c takes 1/3 of a second.

    You can't do this stuff with the compilers David Brown uses; I'm
    guessing you can't do it with your preferred ones either.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 12 20:43:54 2024
    On 11/11/2024 20:09, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:


    Runtime checks in a function can be useful if you know the calling code
    might not be perfect and the function is going to take responsibility
    for identifying that situation. Programmers will often be writing both
    the caller and callee code, and put temporary debugging and test checks
    wherever it is most convenient.

    But I think being too enthusiastic about putting checks in the wrong
    place - the callee function - can hide the real problems, or make the
    callee code writer less careful about getting their part of the code
    correct.

    IME the opposite: not having checks in called function simply delays
    moment when error is detected. Getting errors early helps focus on
    tricky problems or misconceptions. And motivates programmers to
    be more careful

    I am always in favour of finding errors at the earliest opportunity -
    suitable compiler (and even editor/IDE) warnings, strong types, static assertions, etc., are vital tools. Having temporary extra checks at appropriate points in the code is often useful for debugging.

    I don't share your feeling about what motivates programmers to be more
    careful - however, I have no evidence to back that up.


    Concerning correct place for checks: one could argue that check
    should be close to place where the result of check matters, which
    frequently is in called function.

    No, there I disagree. The correct place for the checks should be close
    to where the error is, and that is in the /calling/ code. If the called function is correctly written, reviewed, tested, documented and
    considered "finished", why would it be appropriate to add extra code to
    that in order to test and debug some completely different part of the code?

    The place where the result of the check /really/ matters, is the calling
    code. And that is also the place where you can most easily find the
    error, since the error is in the calling code, not the called function.
    And it is most likely to be the code that you are working on at the time
    - the called function is already written and tested.

    And frequently check requires
    computation that is done by called function as part of normal
    processing, but would be extra code in the caller.


    It is more likely to be the opposite in practice.

    And for much of the time, the called function has no real practical way
    to check the parameters anyway. A function that takes a pointer
    parameter - not an uncommon situation - generally has no way to check
    the validity of the pointer. It can't check that the pointer actually
    points to useful source data or an appropriate place to store data.

    All it can do is check for a null pointer, which is usually a fairly
    useless thing to do (unless the specifications for the function make the pointer optional). After all, on most (but not all) systems you already
    have a "free" null pointer check - if the caller code has screwed up and passed a null pointer when it should not have done, the program will
    quickly crash when the pointer is used for access. Many compilers
    provide a way to annotate function declarations to say that a pointer
    must not be null, and can then spot at least some such errors at compile
    time. And of course the calling code will very often be passing the
    address of an object in the call - since that can't be null, a check in
    the function is pointless.
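
    For example, gcc and clang accept a declaration attribute for this (a
    sketch; the attribute is a real extension, the function is invented):

    /* The compiler can warn at compile time if a caller passes a literal
       NULL for a parameter declared nonnull; it is not a run-time check. */
    __attribute__((nonnull(1, 2)))
    void copy_point(double *dst, const double *src);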

    Once you get to more complex data structures, the possibility for the
    caller to check the parameters gets steadily less realistic.

    So now your practice of having functions "always" check their parameters leaves the people writing calling code with a false sense of security - usually you /don't/ check the parameters, you only ever do simple checks
    that the caller could (and should!) do if they were realistic. You've
    got the maintenance and cognitive overload of extra source code for your various "asserts" and other checks, regardless of any run-time costs
    (which are often irrelevant, but occasionally very important).


    You will note that much of this - for both sides of the argument - uses
    words like "often", "generally" or "frequently". It is important to appreciate that programming spans a very wide range of situations, and I
    don't want to be too categorical about things. I have already said
    there are situations when parameter checking in called functions can
    make sense. I've no doubt that for some people and some types of
    coding, such cases are a lot more common than what I see in my coding.

    Note also that when you can use tools to automate checks, such as
    "sanitize" options in compilers or different languages that have more
    in-built checks, the balance differs. You will generally pay a run-time
    cost for those checks, but you don't have the same kind of source-level
    costs - your code is still clean, clear, and amenable to correctness
    checking, without hiding the functionality of the code in a mass of unnecessary explicit checks. This is particularly good for debugging,
    and the run-time costs might not be important. (But if run-time costs
    are not important, there's a good chance that C is not the best language
    to be using in the first place.)






    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sat Nov 16 05:50:43 2024
    David Brown <david.brown@hesbynett.no> wrote:
    On 11/11/2024 20:09, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:

    Concerning correct place for checks: one could argue that check
    should be close to place where the result of check matters, which
    frequently is in called function.

    No, there I disagree. The correct place for the checks should be close
    to where the error is, and that is in the /calling/ code. If the called function is correctly written, reviewed, tested, documented and
    considered "finished", why would it be appropriate to add extra code to
    that in order to test and debug some completely different part of the code?

    The place where the result of the check /really/ matters, is the calling code. And that is also the place where you can most easily find the
    error, since the error is in the calling code, not the called function.
    And it is most likely to be the code that you are working on at the time
    - the called function is already written and tested.

    And frequently check requires
    computation that is done by called function as part of normal
    processing, but would be extra code in the caller.


    It is more likely to be the opposite in practice.

    And for much of the time, the called function has no real practical way
    to check the parameters anyway. A function that takes a pointer
    parameter - not an uncommon situation - generally has no way to check
    the validity of the pointer. It can't check that the pointer actually points to useful source data or an appropriate place to store data.

    All it can do is check for a null pointer, which is usually a fairly
    useless thing to do (unless the specifications for the function make the pointer optional). After all, on most (but not all) systems you already have a "free" null pointer check - if the caller code has screwed up and passed a null pointer when it should not have done, the program will
    quickly crash when the pointer is used for access. Many compilers
    provide a way to annotate function declarations to say that a pointer
    must not be null, and can then spot at least some such errors at compile time. And of course the calling code will very often be passing the
    address of an object in the call - since that can't be null, a check in
    the function is pointless.

    Well, in a sense pointers are easy: if you do not play nasty tricks
    with casts then type checks do a significant part of the checking. Of
    course, a pointer may be uninitialized (but compiler warnings help a lot
    here), memory may be overwritten, etc. But overwritten memory is
    rather special: if you checked that the content of memory is correct,
    but it is overwritten after the check, then the earlier check does not
    help. Anyway, the main point is ensuring that the pointed-to data satisfies
    the expected conditions.

    Once you get to more complex data structures, the possibility for the
    caller to check the parameters gets steadily less realistic.

    So now your practice of having functions "always" check their parameters leaves the people writing calling code with a false sense of security - usually you /don't/ check the parameters, you only ever do simple checks that that called could (and should!) do if they were realistic. You've
    got the maintenance and cognitive overload of extra source code for your various "asserts" and other check, regardless of any run-time costs
    (which are often irrelevant, but occasionally very important).


    You will note that much of this - for both sides of the argument - uses words like "often", "generally" or "frequently". It is important to appreciate that programming spans a very wide range of situations, and I don't want to be too categorical about things. I have already said
    there are situations when parameter checking in called functions can
    make sense. I've no doubt that for some people and some types of
    coding, such cases are a lot more common than what I see in my coding.

    Note also that when you can use tools to automate checks, such as
    "sanitize" options in compilers or different languages that have more in-built checks, the balance differs. You will generally pay a run-time cost for those checks, but you don't have the same kind of source-level costs - your code is still clean, clear, and amenable to correctness checking, without hiding the functionality of the code in a mass of unnecessary explicit checks. This is particularly good for debugging,
    and the run-time costs might not be important. (But if run-time costs
    are not important, there's a good chance that C is not the best language
    to be using in the first place.)

    Our experience differs. As a silly example consider a parser
    which produces a parse tree. The caller is supposed to pass a syntactically
    correct string as an argument. However, checking syntactic correctness requires almost the same effort as producing the parse tree, so it
    is usual that the parser both checks correctness and produces the result.
    I have computations that are quite different from parsing but
    in some cases share the same characteristic: checking the correctness of
    the arguments requires complex computation similar to producing the
    actual result. More frequently, the called routine can check various
    invariants which with high probability can detect errors. Doing
    the same check in the caller is impractical.

    Most of my coding is in languages other than C. One of the languages
    that I use essentially forces the programmer to insert checks in
    some places. For example, unions are tagged and one can use a
    specific variant only after checking that this is the current
    variant. Similarly, fall-through control structures may lead
    to a type error at compile time. But signalling an error is considered
    type safe. So code which checks for an unhandled case and signals an
    error is accepted as type correct. Unhandled cases frequently
    lead to type errors. There is some overhead, but IMO it is acceptable.
    The language in question is garbage collected, so many memory-related
    problems go away.
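
    A rough C sketch of that discipline (C itself does not enforce it; the
    names are invented): the tag is checked before a variant is used, and
    the default case signals an error for an unhandled tag instead of
    silently reading the wrong member.

    #include <stdio.h>
    #include <stdlib.h>

    enum tag { T_INT, T_REAL };
    struct value { enum tag tag; union { int i; double r; } u; };

    double as_real(const struct value *v)
    {
        switch (v->tag) {
        case T_INT:  return (double)v->u.i;
        case T_REAL: return v->u.r;
        default:
            fprintf(stderr, "as_real: unhandled tag %d\n", (int)v->tag);
            abort();
        }
    }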

    Frequently checks come as a natural byproduct of computations. When
    handling tree-like structures in C, IME the simplest code
    is recursive with the base case being the null pointer. When the base
    case should not occur we get a check instead of a computation.
    Skipping such checks also puts a cognitive load on the reader:
    the normal pattern has a corresponding case, so the reader does not know
    if the case was omitted by accident or cannot occur. A comment
    may clarify this, but an error check is equally clear.
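
    For example (a sketch with invented names): the natural recursive
    shape has a null-pointer base case, and when that case "cannot happen"
    it becomes an error check rather than a computation.

    #include <stdio.h>
    #include <stdlib.h>

    struct tree { int value; struct tree *left, *right; };

    /* Natural pattern: null is an ordinary base case. */
    int tree_sum(const struct tree *t)
    {
        if (t == 0)
            return 0;
        return t->value + tree_sum(t->left) + tree_sum(t->right);
    }

    /* When null "cannot happen", the base case becomes an error check. */
    int tree_depth(const struct tree *t)
    {
        if (t == 0) {
            fprintf(stderr, "tree_depth: unexpected null node\n");
            abort();
        }
        int dl = t->left  ? tree_depth(t->left)  : 0;
        int dr = t->right ? tree_depth(t->right) : 0;
        return 1 + (dl > dr ? dl : dr);
    }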

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Stefan Ram@3:633/280.2 to All on Sat Nov 16 20:42:49 2024
    Dan Purgert <dan@djph.net> wrote or quoted:
    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Stefan Ram (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sun Nov 17 01:51:34 2024
    On 16/11/2024 09:42, Stefan Ram wrote:
    Dan Purgert <dan@djph.net> wrote or quoted:
    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    FGS please turn the 'hip lingo' generator down a few notches!



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From James Kuyper@3:633/280.2 to All on Sun Nov 17 02:14:07 2024
    On 11/16/24 04:42, Stefan Ram wrote:
    ....
    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }

    Nice. It did take a little while for me to figure out what was wrong,
    but since I knew that something was wrong, I did eventually find it -
    without first running the program.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Lew Pitcher@3:633/280.2 to All on Sun Nov 17 02:37:24 2024
    On Sat, 16 Nov 2024 09:42:49 +0000, Stefan Ram wrote:

    Dan Purgert <dan@djph.net> wrote or quoted:
    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }

    If I read your code correctly, you have actually included not one,
    but TWO curveballs. Well done!

    --
    Lew Pitcher
    "In Skills We Trust"

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sun Nov 17 03:29:17 2024
    On 15/11/2024 19:50, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 11/11/2024 20:09, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:

    Concerning correct place for checks: one could argue that check
    should be close to place where the result of check matters, which
    frequently is in called function.

    No, there I disagree. The correct place for the checks should be close
    to where the error is, and that is in the /calling/ code. If the called
    function is correctly written, reviewed, tested, documented and
    considered "finished", why would it be appropriate to add extra code to
    that in order to test and debug some completely different part of the code?
    The place where the result of the check /really/ matters, is the calling
    code. And that is also the place where you can most easily find the
    error, since the error is in the calling code, not the called function.
    And it is most likely to be the code that you are working on at the time
    - the called function is already written and tested.

    And frequently check requires
    computation that is done by called function as part of normal
    processing, but would be extra code in the caller.


    It is more likely to be the opposite in practice.

    And for much of the time, the called function has no real practical way
    to check the parameters anyway. A function that takes a pointer
    parameter - not an uncommon situation - generally has no way to check
    the validity of the pointer. It can't check that the pointer actually
    points to useful source data or an appropriate place to store data.

    All it can do is check for a null pointer, which is usually a fairly
    useless thing to do (unless the specifications for the function make the
    pointer optional). After all, on most (but not all) systems you already
    have a "free" null pointer check - if the caller code has screwed up and
    passed a null pointer when it should not have done, the program will
    quickly crash when the pointer is used for access. Many compilers
    provide a way to annotate function declarations to say that a pointer
    must not be null, and can then spot at least some such errors at compile
    time. And of course the calling code will very often be passing the
    address of an object in the call - since that can't be null, a check in
    the function is pointless.

    Well, in a sense pointers are easy: if you do not play nasty tricks
    with casts then type checks do significant part of checking. Of
    course, pointer may be uninitialized (but compiler warnings help a lot
    here), memory may be overwritten, etc. But overwritten memory is
    rather special, if you checked that content of memory is correct,
    but it is overwritten after the check, then earlier check does not
    help. Anyway, main point is ensuring that pointed to data satisfies
    expected conditions.


    That does not match reality. Pointers are far and away the biggest
    source of errors in C code. Use after free, buffer overflows, mixups of
    who "owns" the pointer - the scope for errors is boundless. You are
    correct that type systems can catch many potential types of errors - unfortunately, people /do/ play nasty tricks with type checks.
    Conversions of pointer types are found all over the place in C
    programming, especially conversions back and forth with void* pointers.

    All this means that invalid pointer parameters are very much a real
    issue - but are typically impossible to check in the called function.

    The way you avoid getting errors in your pointers is being careful about having the right data in the first place, so you only call functions
    with valid parameters. You do this by having careful control about the ownership and lifetime of pointers, and what they point to, keeping conventions in the names of your pointers and functions to indicate who
    owns what, and so on. And you use sanitizers and similar tools during
    testing and debugging to distinguish between tests that worked by luck,
    and ones that worked reliably. (And of course you may consider other languages than C that help you express your requirements in a clearer
    manner or with better automatic checking.)
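
    For example, a use-after-free that happens to "work" in a plain build
    is reported immediately under AddressSanitizer - a minimal sketch,
    with the build command shown as a comment (assumes GCC or Clang built
    with ASan support):

        /* Build and run with e.g.: gcc -g -fsanitize=address uaf.c && ./a.out */
        #include <stdlib.h>

        int main(void)
        {
            int *p = malloc(sizeof *p);
            *p = 42;
            free(p);
            return *p;   /* use after free: ASan aborts with a report here */
        }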

    Put the same effort and due diligence into the rest of your code, and
    suddenly you find your checks for other kinds of parameters in functions
    are irrelevant as you are now making sure you call functions with
    appropriate valid inputs.


    Once you get to more complex data structures, the possibility for the
    caller to check the parameters gets steadily less realistic.

    So now your practice of having functions "always" check their parameters
    leaves the people writing calling code with a false sense of security -
    usually you /don't/ check the parameters, you only ever do the simple
    checks that the caller could (and should!) do where such checks are
    realistic. You've got the maintenance and cognitive overhead of extra
    source code for your various "asserts" and other checks, regardless of
    any run-time costs (which are often irrelevant, but occasionally very
    important).


    You will note that much of this - for both sides of the argument - uses
    words like "often", "generally" or "frequently". It is important to
    appreciate that programming spans a very wide range of situations, and I
    don't want to be too categorical about things. I have already said
    there are situations when parameter checking in called functions can
    make sense. I've no doubt that for some people and some types of
    coding, such cases are a lot more common than what I see in my coding.

    Note also that when you can use tools to automate checks, such as
    "sanitize" options in compilers or different languages that have more
    in-built checks, the balance differs. You will generally pay a run-time
    cost for those checks, but you don't have the same kind of source-level
    costs - your code is still clean, clear, and amenable to correctness
    checking, without hiding the functionality of the code in a mass of
    unnecessary explicit checks. This is particularly good for debugging,
    and the run-time costs might not be important. (But if run-time costs
    are not important, there's a good chance that C is not the best language
    to be using in the first place.)

    Our experience differs. As a silly example, consider a parser which
    produces a parse tree. The caller is supposed to pass a syntactically
    correct string as an argument. However, checking syntactic correctness
    requires almost the same effort as producing the parse tree, so it is
    usual that the parser both checks correctness and produces the result.

    The trick here is to avoid producing a syntactically invalid string in
    the first place. Solve the issue at the point where there is a mistake
    in the code!

    (If you are talking about a string that comes from outside the code in
    some way, then of course you need to check it - and if that is most conveniently done during the rest of parsing, then that is fair enough.)
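
    As a toy illustration of the parser point above (a sketch with
    invented names, not anyone's quoted code): the syntax check and the
    tree construction are the same traversal, so a separate up-front
    validity check by the caller would simply duplicate the parser's work.

        #include <stdio.h>
        #include <stdlib.h>
        #include <ctype.h>

        typedef struct Node { int value; struct Node *left, *right; } Node;

        static Node *new_node(int v, Node *l, Node *r)
        {
            Node *n = malloc(sizeof *n);   /* allocation failure ignored in sketch */
            n->value = v; n->left = l; n->right = r;
            return n;
        }

        /* Grammar: expr := digit { '+' digit } .  Returns NULL on a syntax
           error, otherwise the parse tree - the check *is* the parse.
           (Nodes leaked on the error path are ignored in this sketch.) */
        static Node *parse_expr(const char **s)
        {
            if (!isdigit((unsigned char)**s)) return NULL;
            Node *left = new_node(*(*s)++ - '0', NULL, NULL);
            while (**s == '+') {
                ++*s;
                if (!isdigit((unsigned char)**s)) return NULL;  /* error found mid-parse */
                Node *right = new_node(*(*s)++ - '0', NULL, NULL);
                left = new_node('+', left, right);
            }
            return **s == '\0' ? left : NULL;
        }

        int main(void)
        {
            const char *a = "1+2+3", *b = "1++3";
            const char *pa = a, *pb = b;
            printf("%s -> %s\n", a, parse_expr(&pa) ? "tree built" : "syntax error");
            printf("%s -> %s\n", b, parse_expr(&pb) ? "tree built" : "syntax error");
        }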

    I have computations that are quite different from parsing but
    in some cases share the same characteristic: checking the correctness
    of the arguments requires complex computation similar to producing the
    actual result. More frequently, the called routine can check various
    invariants which with high probability detect errors. Doing
    the same check in the caller is impractical.

    I think you are misunderstanding me - maybe I have been unclear. I am
    saying that it is the /caller's/ responsibility to make sure that the parameters it passes are correct, not the /callee's/ responsibility.
    That does not mean that the caller has to add checks to get the
    parameters right - it means the caller has to use correct parameters.


    Think of this like walking near a cliff-edge. Checking parameters
    before the call is like having a barrier at the edge of the cliff. My recommendation is that you know where the cliff edge is, and don't walk
    there. Checking parameters in the called function is like having a
    crash mat at the bottom of the cliff for people who blindly walk off it.


    Most of my coding is in languages other than C. One of the languages
    that I use essentially forces the programmer to insert checks in
    some places. For example, unions are tagged and one can use a
    specific variant only after checking that it is the current
    variant. Similarly, fall-through control structures may lead
    to a type error at compile time. But signalling an error is considered
    type safe, so code which checks for an unhandled case and signals an
    error is accepted as type correct. Unhandled cases frequently
    lead to type errors. There is some overhead, but IMO it is acceptable.
    The language in question is garbage collected, so many memory-related
    problems go away.
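
    That discipline can be imitated in C, although nothing in the language
    enforces it - a sketch (invented names, assuming nothing beyond
    standard C): the tag is checked before a variant is used, and the
    "impossible" case signals an error instead of silently falling through.

        #include <stdio.h>
        #include <stdlib.h>

        typedef enum { SHAPE_CIRCLE, SHAPE_RECT } ShapeTag;

        typedef struct {
            ShapeTag tag;
            union {
                struct { double radius; } circle;
                struct { double w, h; } rect;
            } u;
        } Shape;

        static void fatal(const char *msg)
        {
            fprintf(stderr, "fatal: %s\n", msg);
            exit(EXIT_FAILURE);
        }

        double area(const Shape *s)
        {
            switch (s->tag) {            /* check the tag before touching a variant */
            case SHAPE_CIRCLE: {
                double r = s->u.circle.radius;
                return 3.14159265358979 * r * r;
            }
            case SHAPE_RECT:
                return s->u.rect.w * s->u.rect.h;
            }
            fatal("area: unhandled shape tag");  /* signalling keeps this "type correct" */
            return 0.0;                          /* not reached */
        }

        int main(void)
        {
            Shape c = { SHAPE_CIRCLE, { .circle = { 1.0 } } };
            printf("area = %f\n", area(&c));
        }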

    Frequently, checks come as a natural byproduct of the computation. When
    handling tree-like structures in C, IME the simplest code is usually
    recursive, with the base case being the null pointer. When the base
    case should not occur, we get a check instead of a computation.
    Skipping such checks also puts cognitive load on the reader: the
    normal pattern has a corresponding case, so the reader does not know
    whether the case was omitted by accident or cannot occur. A comment
    may clarify this, but an error check is equally clear.
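
    To make the tree example concrete (a sketch, not code quoted from
    anyone): in the first function the null pointer is an ordinary base
    case; in the second the caller promises a non-empty tree, so the slot
    the recursive pattern leaves for that case is filled by a check rather
    than a computation.

        #include <assert.h>
        #include <stddef.h>

        typedef struct Tree { int value; struct Tree *left, *right; } Tree;

        /* Null is a normal base case: an empty subtree contributes 0. */
        int sum(const Tree *t)
        {
            if (t == NULL) return 0;
            return t->value + sum(t->left) + sum(t->right);
        }

        /* Here the caller promises a non-empty tree, so the base case
           "cannot happen" - the natural recursive pattern leaves a slot
           that becomes a check instead of a computation. */
        int min_depth(const Tree *t)
        {
            assert(t != NULL && "min_depth: caller passed an empty tree");
            if (t->left == NULL && t->right == NULL) return 1;
            if (t->left == NULL)  return 1 + min_depth(t->right);
            if (t->right == NULL) return 1 + min_depth(t->left);
            int l = min_depth(t->left), r = min_depth(t->right);
            return 1 + (l < r ? l : r);
        }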



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sun Nov 17 03:38:37 2024
    On 16/11/2024 15:51, Bart wrote:
    On 16/11/2024 09:42, Stefan Ram wrote:
    Dan Purgert <dan@djph.net> wrote or quoted:
    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    FGS please turn the 'hip lingo' generator down a few notches!



    I wonder what happened to Stefan. He used to make perfectly good posts.
    Then he disappeared for a bit, and came back with this new "style".

    Given that this "new" Stefan can write posts with interesting C content,
    such as this one, and has retained his ugly coding layout and
    non-standard Usenet format, I have to assume it's still the same person
    behind the posts.

    Is he using some "translate to hip lingo" tool? Or has he had a stroke
    or brain tumour that has rendered him incapable of writing text like an
    adult while still being able to write C code?



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Mon Nov 18 00:51:26 2024
    Lew Pitcher <lew.pitcher@digitalfreehold.ca> writes:

    On Sat, 16 Nov 2024 09:42:49 +0000, Stefan Ram wrote:

    Dan Purgert <dan@djph.net> wrote or quoted:

    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }

    If I read your code correctly, you have actually included not one,
    but TWO curveballs. Well done!

    What's the second curveball?

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Tue Nov 19 12:53:05 2024
    Bart <bc@freeuk.com> wrote:
    On 10/11/2024 06:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    I would consider a much more elaborate one, putting the onus on external
    tools and still having an unpredictable result, to be the poorer of the two.
    You want to create a language that is easily compilable, no matter how
    complex the input.

    Normally time spent _using_ compiler should be bigger than time
    spending writing compiler. If compiler gets enough use, it
    justifies some complexity.

    That doesn't add up: the more the compiler gets used, the slower it
    should get?!

    More complicated does not mean slower. Binary search or hash tables
    are more complicated than linear search, but for larger data may
    be much faster. Similarly, a compiler may be simplified by using
    simpler but slower methods, and a more complicated compiler may use
    faster methods. This is particularly relevant here: a simple compiler
    may keep a list of cases or ranges and linearly scan those. A more
    advanced one may use, say, a tree structure.
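
    A sketch of that difference (invented data structures, not any
    particular compiler's internals): both routines map a selector value
    to a branch, but the second assumes the case ranges are kept sorted
    and disjoint so they can be binary-searched.

        #include <stddef.h>

        typedef struct { long lo, hi; int target; } CaseRange;  /* one 'case lo..hi' */

        /* Simple compiler: scan the list of ranges in order. O(n) per lookup. */
        int pick_branch_linear(const CaseRange *r, size_t n, long x)
        {
            for (size_t i = 0; i < n; i++)
                if (x >= r[i].lo && x <= r[i].hi) return r[i].target;
            return -1;                          /* no case matched: 'default' */
        }

        /* More elaborate compiler: keep the ranges sorted and disjoint,
           then binary-search.  O(log n) per lookup - more code, but
           faster on large switches. */
        int pick_branch_bsearch(const CaseRange *r, size_t n, long x)
        {
            size_t lo = 0, hi = n;
            while (lo < hi) {
                size_t mid = lo + (hi - lo) / 2;
                if (x < r[mid].lo)      hi = mid;
                else if (x > r[mid].hi) lo = mid + 1;
                else                    return r[mid].target;
            }
            return -1;
        }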

    More generally, I want to minimize the time spent by the programmer,
    that is, the _sum over all iterations leading to a correct program_ of
    compile time and "think time". A compiler that compiles more slowly,
    but allows fewer iterations due to better diagnostics, may win.
    Also, humans perceive a 0.1s delay almost like no delay at all.
    So it does not matter whether a single compilation step takes 0.1s or
    0.1ms. Modern computers can do a lot of work in 0.1s.

    The sort of analysis you're implying I don't think belongs in the kind
    of compiler I prefer. Even if it did, it would be later on in the
    process than the point where the above restriction is checked, so
    wouldn't exist in one of my compilers anyway.

    Sure, you design your compiler as you like.

    I don't like open-ended tasks like this where compilation time could end
    up being anything. If you need to keep recompiling the same module, then
    you don't want to repeat that work each time.

    Yes. This may lead to some complexity. A simple approach is to
    avoid obviously useless recompilation ('make' is doing this).
    A more complicated approach may keep some intermediate data and
    try to "validate" it first. If the previous analysis is valid,
    then it can be reused. If something significant changes, then
    it needs to be re-done. But many changes have only a very local
    effect, so at least theoretically re-using analyses could
    save substantial time.

    Concerning open-ended, my attitude is that the compiler should make
    an effort which is open-ended in the sense that when a new method is
    discovered the compiler may be extended to do more work.
    OTOH in a "single true compiler" world, the compiler may say "this is
    too difficult, giving up". Of course, when trying something
    very hard the compiler is likely to run out of memory or the user will
    stop it. But the compiler may give up earlier. Of course, this
    is unacceptable for a standardized language, when people move
    programs between different compilers. If a compiler can legally
    reject a program because of its limitations, and is doing this
    with significant probability, then portability between compilers
    is severely limited. But if there is a way to disable the extra
    checks, then this may work. This is one of the reasons why
    'gcc' has so many options: users that want it can get stronger
    checking, but if they want, 'gcc' will accept lousy code
    too.

    I am mainly concerned with clarity and correctness of source code.

    So am I. I try to keep my syntax clean and uncluttered.

    Dummy 'else' doing something may hide errors.

    So can 'unreachable'.

    A dummy 'else' signalling an
    error means that something which could be a compile-time error is
    only detected at runtime.

    A compiler that detects most errors of this sort is IMO better than
    a compiler which makes no effort to detect them. And clearly, once
    the problem is formulated in a sufficiently general way, it becomes
    unsolvable. So I do not expect a general solution, but I do expect a
    reasonable effort.

    So how would David Brown's example work:

    int F(int n) {
    if (n==1) return 10;
    if (n==2) return 20;
    }

    /You/ know that values -2**31 to 0 and 3 to 2**31-1 are impossible; the compiler doesn't. It's likely to tell you that you may run into the end
    of the function.

    So what do you want the compiler to here? If I try it:

    func F(int n)int =
    if n=1 then return 10 fi
    if n=2 then return 20 fi
    end

    It says 'else needed' (in that last statement). I can also shut it up
    like this:

    func F(int n)int = # int is i64 here
    if n=1 then return 10 fi
    if n=2 then return 20 fi
    0
    end

    Since now that last statement is the '0' value (any int value will do).
    What should my compiler report instead? What analysis should it be
    doing? What would that save me from typing?

    Currently, in the typed language that I use, a literal translation of
    the example hits a hole in the checks, that is, the code is accepted.

    Concerning the needed analyses: one thing needed is a representation of
    the type, either a Pascal range type or an enumeration type (the example
    is _very_ unnatural because in modern programming magic numbers
    are avoided and there would be some symbolic representation
    adding meaning to the numbers). Second, the compiler must recognize
    that this is a "multiway switch" and collect the conditions. Once
    you have such a representation (which may be desirable for other
    reasons) it is easy to determine the set of handled values. More
    precisely, in this example we just have a small number of discrete
    values. A more ambitious compiler may have a list of ranges.
    If the type also specifies a list of values or a list of ranges, then
    it is easy to check whether all values of the type are handled.
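
    C compilers already do a limited form of this when the selector has an
    enumeration type: with warnings enabled, a 'switch' without a 'default'
    that misses an enumerator is reported at compile time. A small sketch
    (the type and function names are invented):

        /* Compile with e.g.  gcc -Wall -Wswitch
           (or -Wswitch-enum for a stricter check). */
        typedef enum { RED, GREEN, BLUE } Colour;

        const char *name_of(Colour c)
        {
            switch (c) {                /* no 'default:', deliberately */
            case RED:   return "red";
            case GREEN: return "green";
            }   /* gcc/clang: "enumeration value 'BLUE' not handled in switch" */
            return "?";
        }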

    normally you do not need very complex analysis:

    I don't want to do any analysis at all! I just want a mechanical
    translation as effortlessly as possible.

    I don't like unbalanced code within a function because it's wrong and
    can cause problems.

    Well, I demand more from compiler than you do...

    Perhaps you're happy for it to be bigger and slower too. Most of my
    projects build more or less instantly. Here 'ms' is a version that runs programs directly from source (the first 'ms' is 'ms.exe' and subsequent ones are 'ms.m' the lead module):

    c:\bx>ms ms ms ms ms ms ms ms ms ms ms ms ms ms ms ms hello
    Hello World! 21:00:45

    This builds and runs 15 successive generations of itself in memory
    before building and running hello.m; it took 1 second in all. (Now try
    that with gcc!)

    Here:

    c:\cx>tm \bx\mm -runp cc sql
    Compiling cc.m to <pcl>
    Compiling sql.c to sql.exe

    This compiles my C compiler from source but then it /interprets/ the IR produced. This interpreted compiler took 6 seconds to build the 250Kloc
    test file, and it's a very slow interpreter (it's used for testing and debugging).

    (gcc -O0 took a bit longer to build sql.c! About 7 seconds but it is
    using a heftier windows.h.)

    If I run the C compiler from source as native code (\bx\ms cc sql) then building the compiler *and* sql.c takes 1/3 of a second.

    You can't do this stuff with the compilers David Brown uses; I'm
    guessing you can't do it with your prefered ones either.

    To recompile the typed system I use (about 0.4M lines) on a new fast
    machine I need about 53s. But that is kind of cheating:
    - this time is for a parallel build using 20 logical cores
    - the compiler is not in the language it compiles (but in an untyped
    version of it)
    - actual compilation of the compiler is a small part of the total
    compile time
    On a slow machine the compile time can be as large as 40 minutes.

    An untyped system that I use has about 0.5M lines and recompiles
    itself in 16s on the same machine. This one uses a single core.
    On a slow machine the compile time may be closer to 2 minutes.
    Again, compiler compile time is only a part of the build time.
    Actually, one time-intensive part is creating the index for the included
    documentation. Another is C compilation for a library file
    (the system has image-processing functions and the low-level part of
    image processing is done in C). Recompilation starts from a
    minimal version of the system; rebuilding this minimal
    version takes 3.3s.

    Note that in both cases the line counts are from 'wc'. Both systems
    contain a substantial amount of documentation; I tried to compensate
    for this, but size measured in terms of LOC (that is, excluding
    comments, empty lines and non-code files) would be significantly
    smaller.

    Anyway, I do not need the cascaded recompilation that you present.
    Both systems above have incremental compilation, the second one
    at statement/function level: it offers an interactive prompt
    which takes a statement from the user, compiles it and immediately
    executes it. Such a statement may define a function or perform compilation.
    Even on a _very_ slow machine there is no noticeable delay due to
    compilation, unless you feed the system some oversized statement
    or function (presumably from a file).

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Tue Nov 19 16:37:31 2024
    On 10.11.2024 16:13, David Brown wrote:
    [...]

    My preferences are very much weighted towards correctness, not
    efficiency. That includes /knowing/ that things are correct, not just passing some tests. [...]

    I agree with you. But given what you write I'm also sure you know
    what's achievable in theory, what's an avid wish, and what's really
    possible. Yet there's also projects that don't seem to care, where
    speedy delivery is the primary goal. Guaranteeing formal correctness
    had never been an issue in the industry contexts I worked in, and I
    was always glad when I had a good test environment, with a good test
    coverage, and continuous refinement of tests. Informal documentation,
    factual checks of the arguments, and actual tests was what kept the
    quality of our project deliveries at a high level.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Tue Nov 19 17:25:27 2024
    On 16.11.2024 17:38, David Brown wrote:

    I wonder what happened to Stefan. He used to make perfectly good posts.
    Then he disappeared for a bit, and came back with this new "style".

    Given that this "new" Stefan can write posts with interesting C content,
    such as this one, and has retained his ugly coding layout and
    non-standard Usenet format, I have to assume it's still the same person behind the posts.

    Sorry that I cannot resist asking what you consider "non-standard
    Usenet format", given that your posts don't consider line length.
    (Did the "standards" change during the past three decades maybe?
    Do we use only those parts of the "standards" that we like and
    ignore others? Or does it boil down to Netiquette is no standard?)

    Janis, just curious and no offense intended :-)


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 19 19:19:18 2024
    On 19/11/2024 06:37, Janis Papanagnou wrote:
    On 10.11.2024 16:13, David Brown wrote:
    [...]

    My preferences are very much weighted towards correctness, not
    efficiency. That includes /knowing/ that things are correct, not just
    passing some tests. [...]

    I agree with you. But given what you write I'm also sure you know
    what's achievable in theory, what's an avid wish, and what's really
    possible.

    Sure. I've done my fair share of "write-test-debug" cycling for writing
    code - that's almost inevitable when interacting with something else
    (hardware devices, other programs, users, etc.) that is poorly
    specified. At the other end of the scale, you have things such as race
    conditions, where there is no option but to make sure the code is written
    correctly.

    The original context of this discussion was about small self-contained functions, where correctness is very much achievable in practice - /if/
    you understand that it is something worth aiming at.

    Yet there's also projects that don't seem to care, where
    speedy delivery is the primary goal. Guaranteeing formal correctness
    had never been an issue in the industry contexts I worked in, and I
    was always glad when I had a good test environment, with a good test coverage, and continuous refinement of tests. Informal documentation,
    factual checks of the arguments, and actual tests was what kept the
    quality of our project deliveries at a high level.


    There are a great variety of projects, and the development style differs wildly. Ultimately, you want a cost-benefit balance that makes sense
    for what you are doing, and true formal proof methods are only
    cost-effective in very niche circumstances. In my work, I have rarely
    used any kind of formal methods - but I constantly have the principles
    in mind. When I call a function, I can see that the parameters I use
    are valid - and /could/ be proven valid. I know what the outputs of the function are, and how they fit in with the calling code - and I use that
    to know the validity of the next function called. If I can't see such
    things, it's time to re-factor the code to improve clarity.

    Of course testing is important, at many levels. But the time to test
    your code is when you are confident that it is correct - testing is not
    an alternative to writing code that is as clearly correct as you are
    able to make it.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 19 19:30:19 2024
    On 19/11/2024 07:25, Janis Papanagnou wrote:
    On 16.11.2024 17:38, David Brown wrote:

    I wonder what happened to Stefan. He used to make perfectly good posts.
    Then he disappeared for a bit, and came back with this new "style".

    Given that this "new" Stefan can write posts with interesting C content,
    such as this one, and has retained his ugly coding layout and
    non-standard Usenet format, I have to assume it's still the same person
    behind the posts.

    Sorry that I cannot resist asking what you consider "non-standard
    Usenet format", given that your posts don't consider line length.
    (Did the "standards" change during the past three decades maybe?
    Do we use only those parts of the "standards" that we like and
    ignore others? Or does it boil down to Netiquette is no standard?)

    Janis, just curious and no offense intended :-)


    I hadn't even considered taking offence! And if you are right that my
    line length is wrong, I am glad to be told.

    AFAIK, my posts /do/ follow line length standards. You are using
    Thunderbird like me, I believe - select one of my posts and use ctrl-U
    to see the source, and the lines are split appropriately. But depending
    on the details of posts and clients, and the way lines are split
    (manually or automatically), lines are not always displayed with a 72 character width.

    Stefan's posting format has extra indentation for his prose, but
    additional quoted material (such as code) is outdented. Perhaps that
    does not count as "non-standard Usenet format", but it is certainly a formatting style that is highly unusual and characteristic.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Tue Nov 19 22:21:51 2024
    On Tue, 19 Nov 2024 07:25:27 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    On 16.11.2024 17:38, David Brown wrote:

    I wonder what happened to Stefan. He used to make perfectly good
    posts. Then he disappeared for a bit, and came back with this new
    "style".

    Given that this "new" Stefan can write posts with interesting C
    content, such as this one, and has retained his ugly coding layout
    and non-standard Usenet format, I have to assume it's still the
    same person behind the posts.

    Sorry that I cannot resist asking what you consider "non-standard
    Usenet format", given that your posts don't consider line length.
    (Did the "standards" change during the past three decades maybe?
    Do we use only those parts of the "standards" that we like and
    ignore others? Or does it boil down to Netiquette is no standard?)


    It's not that the 'X-No-Archive: Yes' and 'Archive: no' headers used by
    Stefan Ram are not standard. They are just very unusual. He also has an
    'X-No-Archive-Readme' header that indicates that he expects Usenet
    servers to interpret his headers in a way that no real-world
    automatic server software would do. It looks like he expects individual
    treatment by a human being.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Wed Nov 20 00:29:06 2024
    On 19.11.2024 09:19, David Brown wrote:
    [...]

    There are a great variety of projects, [...]

    I don't want the theme to get out of hand, so just one amendment to...

    Of course testing is important, at many levels. But the time to test
    your code is when you are confident that it is correct - testing is not
    an alternative to writing code that is as clearly correct as you are
    able to make it.

    Sounds like early-days practice, where code is written, "defined" at
    some point as "correct", and then tests are written (sometimes
    by the same folks who implemented the code) to prove that the code
    is doing the expected - or the tests are spared because it was
    "clear" that the code is "correct" (sort of).

    Since the 1990's we've had other principles, yes, "on many levels"
    (as you started your paragraph). At all levels there's some sort of specification (or description) that defined the expected outcome
    and behavior; tests [of levels higher than unit-tests] are written
    if not in parallel then usually by separate groups. The decoupling
    is important, the "first implement, then test" serializing certainly
    not.

    Of course every responsible programmer tries to create correct code,
    supported by their own experience and by projects' regulatory means. But
    that doesn't guarantee correct code. Neither do tests guarantee that.
    But tests have been, IME, more effective in supporting correctness
    than being "confident that it is correct" (as you say).

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Wed Nov 20 00:41:51 2024
    On 16.11.2024 16:14, James Kuyper wrote:
    On 11/16/24 04:42, Stefan Ram wrote:
    ...
    [...]

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    That's indeed a nice example. Where you get fooled by the treacherous
    "trustiness" of formatting[*]. - In syntax we trust! [**]


    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }

    Nice. It did take a little while for me to figure out what was wrong,
    but since I knew that something was wrong, I did eventually find it -
    without first running the program.

    Same here. :-)

    Janis

    [*] Why do I have to think of Python now? - Never mind. Better
    let sleeping dogs lie.

    [**] As far as I am concerned.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 20 02:51:33 2024
    On 19/11/2024 01:53, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 10/11/2024 06:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    I'd would consider a much elaborate one putting the onus on external
    tools, and still having an unpredictable result to be the poor of the two.
    You want to create a language that is easily compilable, no matter how
    complex the input.

    Normally time spent _using_ compiler should be bigger than time
    spending writing compiler. If compiler gets enough use, it
    justifies some complexity.

    That doesn't add up: the more the compiler gets used, the slower it
    should get?!

    More complicated does not mean slower. Binary search or hash tables
    are more complicated than linear search, but for larger data may
    be much faster.

    That's not the complexity I had in mind. The 100-200MB sizes of
    LLVM-based compilers are not because they use hash-tables over linear
    search.

    More generaly, I want to minimize time spent by the programmer,
    that is _sum over all iterations leading to correct program_ of
    compile time and "think time". Compiler that compiles slower,
    but allows less iterations due to better diagnostics may win.
    Also, humans perceive 0.1s delay almost like no delay at all.
    So it does not matter if single compilation step is 0.1s or
    0.1ms. Modern computers can do a lot of work in 0.1s.

    What's the context of this 0.1 seconds? Do you consider it long or short?

    My tools can generally build my apps from scratch in 0.1 seconds; big compilers tend to take a lot longer. Only Tiny C is in that ballpark.

    So I'm failing to see your point here. Maybe you picked up that 0.1
    seconds from an earlier post of mine and are suggesting I ought to be
    able to do a lot more analysis within that time?

    Yes. This may lead to some complexity. Simple approach is to
    avoid obviously useless recompilation ('make' is doing this).
    More complicated approach may keep some intermediate data and
    try to "validate" them first. If previous analysis is valid,
    then it can be reused. If something significant changes, than
    it needs to be re-done. But many changes only have very local
    effect, so at least theoretically re-using analyses could
    save substantial time.

    I consider compilation - turning textual source code into a form that can
    be run, typically binary native code - to be a completely routine task
    that should be as simple and as quick as flicking a light switch.

    While anything else that might be a deep analysis of that program I
    consider to be a quite different task. I'm not saying there is no place
    for it, but I don't agree it should be integrated into every compiler
    and always invoked.

    Since now that last statement is the '0' value (any int value wil do).
    What should my compiler report instead? What analysis should it be
    doing? What would that save me from typing?

    Currently in typed language that I use literal translation of
    the example hits a hole in checks, that is the code is accepted.

    Concerning needed analyses: one thing needed is representation of
    type, either Pascal range type or enumeration type (the example
    is _very_ unatural because in modern programming magic numbers
    are avoided and there would be some symbolic representation
    adding meaning to the numbers). Second, compiler must recognize
    that this is a "multiway switch" and collect conditions.

    The example came from C. Even if written as a switch, C switches do not
    return values (and also are hard to even analyse as to which branch is
    which).

    In my languages, switches can return values, and a switch written as the
    last statement of a function is considered to do so, even if each branch
    uses an explicit 'return'. Then, it will consider a missing ELSE a 'hole'.

    It will not do any analysis of the range other than what is necessary to implement switch (duplicate values, span of values, range-checking when
    using jump tables).

    So the language may require you to supply a dummy 'else x' or 'return
    x'; so what?

    The alternative appears to be one of:

    * Instead of 'else' or 'return', to write 'unreachable' (sketched
    below), which puts some trust, not in the programmer, but in some
    person calling your function who does not have sight of the source
    code, to avoid calling it with invalid arguments

    * Or relying on the varying capabilities of a compiler 'A', which might
    sometimes be able to determine that some point is not reached, but
    sometimes can't. But when you use compiler 'B', it might have a
    different result.

    I'll stick with my scheme, thanks!
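
    For reference, the 'unreachable' option mentioned above looks like this
    in C - a sketch: C23 provides unreachable() in <stddef.h>, and GCC/Clang
    have long offered __builtin_unreachable(). Whether the trust it
    expresses is justified is exactly what is in dispute here.

        #include <stddef.h>    /* C23: unreachable() */

        int F(int n)
        {
            if (n == 1) return 10;
            if (n == 2) return 20;
            /* The author promises n is always 1 or 2.  If a caller breaks
               that promise, behaviour is undefined - no check, no error. */
            unreachable();     /* pre-C23: __builtin_unreachable(); */
        }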

    Once
    you have such representation (which may be desirable for other
    reasons) it is easy to determine set of handled values. More
    precisely, in this example we just have small number of discrete
    values. More ambitious compiler may have list of ranges.
    If type also specifies list of values or list of ranges, then
    it is easy to check if all values of the type are handled.

    The types are tyically plain integers, with ranges from 2**8 to 2**64.
    The ranges associated with application needs will be more arbitrary.

    If talking about a language with ranged integer types, then there might
    be more point to it, but that is itself a can of worms. (It's hard to do without getting halfway to implementing Ada.)


    You can't do this stuff with the compilers David Brown uses; I'm
    guessing you can't do it with your prefered ones either.

    To recompile the typed system I use (about 0.4M lines) on new fast
    machine I need about 53s. But that is kind of cheating:
    - this time is for parallel build using 20 logical cores
    - the compiler is not in the language it compiles (but in untyped
    vesion of it)
    - actuall compilation of the compiler is small part of total
    compile time
    On slow machine compile time can be as large as 40 minutes.

    40 minutes for 400K lines? That's 160 lines per second; how old is this machine? Is the compiler written in Python?


    An untyped system that I use has about 0.5M lines and recompiles
    itself in 16s on the same machine. This one uses single core.
    On slow machine compile time may be closer to 2 minutes.

    So 4K to 30Klps.

    Again, compiler compile time is only a part of build time.
    Actualy, one time-intensive part is creating index for included documentation.

    Which is not going to be part of a routine build.

    Another is C compilation for a library file
    (system has image-processing functions and low-level part of
    image processing is done in C). Recomplation starts from
    minimal version of the system, rebuilding this minimal
    version takes 3.3s.

    My language tools work on a whole program, where a 'program' is a single
    EXE or DLL file (or a single OBJ file in some cases).

    A 'build' then turns N source files into 1 binary file. This is the task
    I am talking about.

    A complete application may have several such binaries and a bunch of
    other stuff. Maybe some source code is generated by a script. This part
    is open-ended.

    However each of my current projects is a single, self-contained binary
    by design.

    Anyway, I do not need cascaded recompilation than you present.
    Both system above have incermental compilation, the second one
    at statement/function level: it offers interactive prompt
    which takes a statement from the user, compiles it and immediately
    executes. Such statement may define a function or perform compilation.
    Even on _very_ slow machine there is no noticable delay due to
    compilation, unless you feed the system with some oversized statement
    or function (presumably from a file).

    This sounds like a REPL system. There, each line is a new part of the
    program which is processed, executed and discarded. In that regard, it
    is not really what I am talking about, which is AOT compilation of a
    program represented by a bunch of source files.

    Or can a new line redefine something, perhaps a function definition, previously entered amongst the last 100,000 lines? Can a new line
    require compilation of something typed 50,000 lines ago?

    What happens if you change the type of a global; are you saying that
    none of the program code needs revising?

    What I do relies purely on raw compilation speed. No tricks are needed.
    No incremental compilation is needed (the 'granularity' is a
    'program': a single EXE/DLL file, as mentioned above).

    You can change any single part, either local or global, and the whole
    thing is recompiled in an instant.

    However, a 0.5M line project may take a second (unoptimised compiler),
    but it would also generate a 5MB executable, which is quite sizeable.

    Optimising my compiler and choosing to run the interpreter might reduce
    that to half a second (to get to where the app starts to execute). That
    could be done now. Other optimisations could be done to reduce it
    further, but ATM they are not needed.

    The only real example I have is an SQLite3 test, a 250Kloc C program
    (but which has lots of comments and conditional code; preprocessed
    it's 85Kloc).

    My C compiler can run that from source. It takes 0.22 seconds to compile 250Kloc/8MB of source to in-memory native code. Or I can run from source
    via an interpreter, then it takes 1/6th of a second to get from C source
    to IL code:

    c:\cx>cc -runp sql
    Compiling sql.c to 'pcl' # PCL is the name of my IL
    Compile to PCL takes: 157 ms
    SQLite version 3.25.3 2018-11-05 20:37:38
    Enter ".help" for usage hints.
    Connected to a transient in-memory database.
    Use ".open FILENAME" to reopen on a persistent database.
    sqlite> .quit

    Another example, building 40Kloc interpreter from source then running it
    in memory:

    c:\qx>tm \bx\mm -run qq hello
    Compiling qq.m to memory
    Hello, World! 19-Nov-2024 15:38:47
    TM: 0.11

    c:\qx>tm qq hello
    Hello, World! 19-Nov-2024 15:38:49
    TM: 0.05

    The second version runs a precompiled EXE. So building from source added
    only 90ms. Or I can use the interpreter (so interpreting an
    interpreter) to get a 0.08 second timing.

    No tricks are needed. The only thing that might be a cheat here is using
    OS file-caching. But nearly always, you will be building source files
    that have either just been edited, or will have been compiled a few
    seconds before.

    An untyped system

    What do you mean by an untyped system? To me it usually means
    dynamically typed.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Wed Nov 20 03:11:51 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 19/11/2024 01:53, Waldek Hebisch wrote:

    More complicated does not mean slower. Binary search or hash tables
    are more complicated than linear search, but for larger data may
    be much faster.

    That's not the complexity I had in mind. The 100-200MB sizes of
    LLVM-based compilers are not because they use hash-tables over linear >search.

    You still have this irrational obsession with the amount of disk
    space consumed by a compiler suite - one that is useful to a massive
    number of developers (esp. compared with the user-base of your
    compiler).

    The amount of disk space consumed by a compilation suite is
    a meaningless statistic. 10MByte disks are a relic of the
    distant past.


    My tools can generally build my apps from scratch in 0.1 seconds; big >compilers tend to take a lot longer. Only Tiny C is in that ballpark.

    And Tiny C is useless for the majority of real-world applications.

    How many people are using your compiler to build production applications?

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 20 03:43:00 2024
    On 19/11/2024 16:11, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 19/11/2024 01:53, Waldek Hebisch wrote:

    More complicated does not mean slower. Binary search or hash tables
    are more complicated than linear search, but for larger data may
    be much faster.

    That's not the complexity I had in mind. The 100-200MB sizes of
    LLVM-based compilers are not because they use hash-tables over linear
    search.

    You still have this irrational obsession with the amount of disk
    space consumed by a compiler suite - one that is useful to a massive
    number of developers (esp. compared with the user-base of your
    compiler).

    The amount of disk space consumed by a compilation suite is
    a meaningless statistic. 10MByte disks are a relic of the
    distant past.

    Yes, it is. But what is NOT meaningless is everything else that goes with
    it: vast complexity, and slow compile times, and that's just for the
    apps you build with the tool. Building LLVM itself can be challenging.


    My tools can generally build my apps from scratch in 0.1 seconds; big
    compilers tend to take a lot longer. Only Tiny C is in that ballpark.

    And Tiny C is useless for the majority of real-world applications.

    How many people are using your compiler to build production applications?

    It doesn't matter. It's enough to illustrate that routine compilation
    CAN be done at up to 100 times faster than those big tools and with a
    program that could fit on a floppy. Presumably at a significant power
    saving as well, as that seems to be a big thing these days.

    If a simple implementation has trouble with big applications, then that
    would need to be looked at.

    But I suspect the trouble doesn't lie within the small compiler.
    Probably those big compilers have had to be endlessly tweaked over
    decades to deal with myriad small problems, perhaps bugs and corner cases
    within the C language, or the need to compile legacy code that is too
    fragile to fix, all sorts of stuff.

    Or, where the compilers were not specially modded, then codebases would
    have headers with conditional blocks that special-case particular
    compilers with tweaks to get around the idiosyncrasies of each.

    Or, the apps depend on C extensions implemented only by a big compiler.

    The end result is that when some upstart comes along with a new,
    streamlined compiler, it will not be able build that codebase.

    But, try creating a NEW real-world application that is primarily
    developed and tested with Tiny C, then you will see two revelations:

    * It *will* build with Tiny C with no problems, unsurprisingly

    * It will also build with any of your big compilers because the code is necessarily conservative.

    Congratulations, you now have a much healthier codebase that works cross-compiler without all those #ifdef blocks.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Wed Nov 20 04:31:06 2024
    On 19/11/2024 14:29, Janis Papanagnou wrote:
    On 19.11.2024 09:19, David Brown wrote:
    [...]

    There are a great variety of projects, [...]

    I don't want the theme to get out of hand, so just one amendment to...

    Of course testing is important, at many levels. But the time to test
    your code is when you are confident that it is correct - testing is not
    an alternative to writing code that is as clearly correct as you are
    able to make it.

    Sound like early days practice, where code is written, "defined" at
    some point as "correct", and then tests written (sometimes written
    by the same folks who implemented the code) to prove that the code
    is doing the expected, or the tests have been spared because it was
    "clear" that the code is "correct" (sort of).

    Since the 1990's we've had other principles, yes, "on many levels"
    (as you started your paragraph). At all levels there's some sort of specification (or description) that defined the expected outcome
    and behavior; tests [of levels higher than unit-tests] are written
    if not in parallel then usually by separate groups. The decoupling
    is important, the "first implement, then test" serializing certainly
    not.

    Of course every responsible programmer tries to create correct code, supported by own experience and by projects' regulatory means. But
    that doesn't guarantee correct code. Neither do test guarantee that.
    But tests have been, IME, more effective in supporting correctness
    than being "confident that it is correct" (as you say).


    Both activities are about reducing the risk of incorrect code getting
    through. In some cases, one of them is more practical or more effective
    than the other, while in other situations you want to combine them.

    My argument has never been against testing, nor have I claimed that programmers can be trusted to write infallible code!

    All I have been arguing against is the idea of blindly putting in
    validity tests for parameters in functions, as though it were a habit
    that by itself leads to fewer bugs in code.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 20 06:11:03 2024
    On 19/11/2024 15:51, Bart wrote:
    On 19/11/2024 01:53, Waldek Hebisch wrote:

    Another example, building 40Kloc interpreter from source then running it
    in memory:

    c:\qx>tm \bx\mm -run qq hello
    Compiling qq.m to memory
    Hello, World! 19-Nov-2024 15:38:47
    TM: 0.11

    c:\qx>tm qq hello
    Hello, World! 19-Nov-2024 15:38:49
    TM: 0.05

    The second version runs a precompiled EXE. So building from source added only 90ms.

    Sorry, that should be 60ms. Running that interpreter from source only
    takes 1/16th of a second longer, not 1/11th of a second.

    BTW I didn't remark on the range of your (WH's) figures. They spanned
    from 40 minutes for a build down to instant, but it's not clear which
    languages they are for, which tools are used and which machines. Or how
    much work they have to do to get those faster times, or what work they
    don't do: I'm guessing it's not processing 0.5M lines for that fastest
    time.
    So it was hard to formulate a response.

    All my timings are either for C or my systems language, running on one
    core on the same PC.

    For something that you can compare on your own machines, this is a test
    using a one-file version of Lua adapted from https://github.com/edubart/minilua.

    Timings and EXE sizes are:

                    Seconds      KB

    gcc -O0 -s         3.4      372
    gcc -Os -s         8.5      241
    gcc -O2 -s        11.7      328
    gcc -O3 -s        14.4      378
    tcc 0.9.27         0.14     384
    cc                 0.16     315    (My new C compiler)
    cc                 0.09       -    (Compile to interpretable IL)
    cc                 0.11       -    (Compile to IL then runnable in-mem code)
    mcc                0.28     355    (My old C compiler uses intermediate ASM)

    Since this is one file (of some tens of 1000s of KB; -E output varies),
    any mod involves recompiling the whole thing.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Mark Bourne@3:633/280.2 to All on Wed Nov 20 07:51:47 2024
    Bart wrote:
    On 10/11/2024 06:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    I'd would consider a much elaborate one putting the onus on external
    tools, and still having an unpredictable result to be the poor of the
    two.

    You want to create a language that is easily compilable, no matter how
    complex the input.

    Normally time spent _using_ compiler should be bigger than time
    spending writing compiler.  If compiler gets enough use, it
    justifies some complexity.

    That doesn't add up: the more the compiler gets used, the slower it
    should get?!

    I may have misunderstood, but I don't think Waldek's comment was a claim
    about how long a single compilation should take / how slow the compiler
    should be made to be. I think it was a statement about the total amount
    of time all users of a compiler can be expected to spend using it in comparison to the time compiler developers spend writing it.

    If a compiler is used by a significant number of people, the total
    amount of time users spend using it is far larger than the total amount
    of time developers spend writing it, regardless of how long a single compilation takes. So overall it's worth the compiler developers
    putting in extra effort to make the compiler more useful, provide better diagnostics, etc. rather than just doing whatever's easiest for them.
    That may only save each user a relatively small amount of time, but
    aggregated over all users of the compiler it adds up to a lot of time saved.

    When a compiler is used by only a small number of people (or even just
    one), it's not worth the compiler developer putting a lot of effort into
    it, when it's only going to save a small number of people a small amount
    of time.

    --
    Mark.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Wed Nov 20 09:40:45 2024
    Bart <bc@freeuk.com> wrote:
    On 19/11/2024 01:53, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 10/11/2024 06:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    I'd would consider a much elaborate one putting the onus on external
    tools, and still having an unpredictable result to be the poor of the two.

    You want to create a language that is easily compilable, no matter how
    complex the input.

    Normally time spent _using_ compiler should be bigger than time
    spending writing compiler. If compiler gets enough use, it
    justifies some complexity.

    That doesn't add up: the more the compiler gets used, the slower it
    should get?!

    More complicated does not mean slower. Binary search or hash tables
    are more complicated than linear search, but for larger data may
    be much faster.

    That's not the complexity I had in mind. The 100-200MB sizes of
    LLVM-based compilers are not because they use hash-tables over linear search.

    It is related: both gcc and LLVM are doing analyses that in the
    past were deemed impractically expensive (both in time and in space).
    Those analyses work now thanks to smart algorithms that
    significantly reduced resource usage. I know that you consider
    this too expensive. But the point is that there are also things
    which are easy to program and are slow, but acceptable for some
    people. You can speed up such things by adding complexity to the
    compiler.

    More generaly, I want to minimize time spent by the programmer,
    that is _sum over all iterations leading to correct program_ of
    compile time and "think time". Compiler that compiles slower,
    but allows less iterations due to better diagnostics may win.
    Also, humans perceive 0.1s delay almost like no delay at all.
    So it does not matter if single compilation step is 0.1s or
    0.1ms. Modern computers can do a lot of work in 0.1s.

    What's the context of this 0.1 seconds? Do you consider it long or short?

    Context is interactive response. It means "pretty fast for interactive
    use".

    My tools can generally build my apps from scratch in 0.1 seconds; big compilers tend to take a lot longer. Only Tiny C is in that ballpark.

    So I'm failing to see your point here. Maybe you picked up that 0.1
    seconds from an earlier post of mine and are suggesting I ought to be
    able to do a lot more analysis within that time?

    This 0.1s is an old thing. My point is that if you are compiling a simple
    change, then you should be able to do more in this time. In normal
    development source files bigger than 10000 lines are relatively
    rare, so once you get into the range of 50000-100000 lines per second,
    making the compiler faster is of marginal utility.

    Yes. This may lead to some complexity. Simple approach is to
    avoid obviously useless recompilation ('make' is doing this).
    More complicated approach may keep some intermediate data and
    try to "validate" them first. If previous analysis is valid,
    then it can be reused. If something significant changes, than
    it needs to be re-done. But many changes only have very local
    effect, so at least theoretically re-using analyses could
    save substantial time.

    I consider compilation: turning textual source code into a form that can
    be run, typically binary native code, to be a completely routine task
    that should be as simple and as quick as flicking a light switch.

    While anything else that might be a deep analysis of that program I
    consider to be a quite different task. I'm not saying there is no place
    for it, but I don't agree it should be integrated into every compiler
    and always invoked.

    We clearly differ on the question of what is routine. Creating a usable
    executable is a rare task; once an executable is created it can be used
    for a long time. OTOH development is routine, and for this one wants
    to know if a change is correct. Extra analyses and diagnostics
    help here. And since normal development works in cycles there is
    a lot of opportunity to re-use results between cycles.

    Since now that last statement is the '0' value (any int value wil do).
    What should my compiler report instead? What analysis should it be
    doing? What would that save me from typing?

    Currently in typed language that I use literal translation of
    the example hits a hole in checks, that is the code is accepted.

    Concerning needed analyses: one thing needed is representation of
    type, either Pascal range type or enumeration type (the example
    is _very_ unatural because in modern programming magic numbers
    are avoided and there would be some symbolic representation
    adding meaning to the numbers). Second, compiler must recognize
    that this is a "multiway switch" and collect conditions.

    The example came from C. Even if written as a switch, C switches do not return values (and also are hard to even analyse as to which branch is which).

    In my languages, switches can return values, and a switch written as the last statement of a function is considered to do so, even if each branch uses an explicit 'return'. Then, it will consider a missing ELSE a 'hole'.

    It will not do any analysis of the range other than what is necessary to implement switch (duplicate values, span of values, range-checking when using jump tables).

    So the language may require you to supply a dummy 'else x' or 'return
    x'; so what?

    The alternative appears to be one of:

    * Instead of 'else' or 'return', to write 'unreachable', which puts some
    trust, not in the programmer, but some person calling your function
    who does not have sight of the source code, to avoid calling it with
    invalid arguments

    Already a simple thing would be an improvement: make the compiler aware
    of an error routine (if you do not have one, add one) so that when you
    signal an error the compiler will know that there is no need for a
    normal return value.
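
    In C, that "error routine the compiler knows about" can be spelled
    directly - a sketch: marking the routine _Noreturn (C11; [[noreturn]]
    in C23) tells the compiler that the error path never produces a value,
    so the "missing return" complaint disappears without a dummy result.

        #include <stdio.h>
        #include <stdlib.h>

        _Noreturn static void fatal(const char *msg)
        {
            fprintf(stderr, "fatal: %s\n", msg);
            exit(EXIT_FAILURE);
        }

        int F(int n)
        {
            if (n == 1) return 10;
            if (n == 2) return 20;
            fatal("F: argument out of range");  /* compiler knows control stops here */
        }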

    Once
    you have such a representation (which may be desirable for other
    reasons) it is easy to determine the set of handled values. More
    precisely, in this example we just have a small number of discrete
    values. A more ambitious compiler may have a list of ranges.
    If the type also specifies a list of values or a list of ranges, then
    it is easy to check whether all values of the type are handled.
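    As a concrete C illustration (the enum and function names here are
    invented): gcc and clang already do a crude form of this check for
    enumeration types via -Wswitch (part of -Wall), warning about
    enumerators that a switch does not handle, as long as no 'default:'
    label swallows them.

    enum colour { RED, GREEN, BLUE };

    const char *colour_name(enum colour c)
    {
        switch (c) {          /* -Wswitch warns: 'BLUE' not handled */
        case RED:   return "red";
        case GREEN: return "green";
        }
        return "?";           /* reached only for out-of-range values */
    }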

    The types are typically plain integers, with ranges from 2**8 to 2**64.
    The ranges associated with application needs will be more arbitrary.

    If talking about a language with ranged integer types, then there might
    be more point to it, but that is itself a can of worms. (It's hard to do without getting halfway to implementing Ada.)

    C has 'enum'. And a lot of languages treat such types much more
    seriously than C.

    You can't do this stuff with the compilers David Brown uses; I'm
    guessing you can't do it with your preferred ones either.

    To recompile the typed system I use (about 0.4M lines) on a new fast
    machine I need about 53s. But that is kind of cheating:
    - this time is for a parallel build using 20 logical cores
    - the compiler is not in the language it compiles (but in an untyped
    version of it)
    - actual compilation of the compiler is a small part of total
    compile time
    On a slow machine compile time can be as large as 40 minutes.

    40 minutes for 400K lines? That's 160 lines per second; how old is this machine? Is the compiler written in Python?

    This is a simple compiler doing rather complex analyses and the time used by
    them may grow exponentially. The compiler is written in an untyped version
    of the language it compiles and generates Lisp (so the actual machine code
    is generated by Lisp).

    Concerning slowness, Atoms that are a few years old are quite slow.

    An untyped system that I use has about 0.5M lines and recompiles
    itself in 16s on the same machine. This one uses a single core.
    On a slow machine compile time may be closer to 2 minutes.

    So 4K to 30Klps.

    Closer to 50Klps, as there are other things taking time.

    Again, compiler compile time is only a part of build time.
    Actually, one time-intensive part is creating the index for the included
    documentation.

    Which is not going to be part of a routine build.

    In a sense a build is not routine. A build is done for two purposes:
    - to install a working system from sources, which includes
    documentation
    - to check that the build works properly after changes; this also
    should check the documentation build.

    Normal development goes without rebuilding the system.

    Another is C compilation for a library file
    (the system has image-processing functions and the low-level part of
    image processing is done in C). Recompilation starts from a
    minimal version of the system; rebuilding this minimal
    version takes 3.3s.

    My language tools work on a whole program, where a 'program' is a single
    EXE or DLL file (or a single OBJ file in some cases).

    A 'build' then turns N source files into 1 binary file. This is the task
    I am talking about.

    I know. But this is not what I do. A build produces multiple
    artifacts, some of them executable, some loadable code (but _not_
    in a form recognized by the operating system), some essentially
    non-executable (like documentation).

    A complete application may have several such binaries and a bunch of
    other stuff. Maybe some source code is generated by a script. This part
    is open-ended.

    However each of my current projects is a single, self-contained binary
    by design.

    Anyway, I do not need the cascaded recompilation that you present.
    Both systems above have incremental compilation, the second one
    at statement/function level: it offers an interactive prompt
    which takes a statement from the user, compiles it and immediately
    executes it. Such a statement may define a function or perform compilation.
    Even on a _very_ slow machine there is no noticeable delay due to
    compilation, unless you feed the system some oversized statement
    or function (presumably from a file).

    This sounds like a REPL system. There, each line is a new part of the program which is processed, executed and discarded.

    First, I am writing about two different systems. Both have a REPL.
    Lines typed at the REPL are "discarded", but their effect may last
    a long time.

    In that regard, it
    is not really what I am talking about, which is AOT compilation of a
    program represented by a bunch of source files.

    The untyped system is intended for "image based development": you
    compile a bunch of routines to memory and dump the result to an
    "image" file. You can load the image file later and use the previously
    compiled routines. This system also has a second compiler which
    outputs an assembler file, and after using the assembler you get an object
    file. If you insist, compilation, assembly and linking can be
    done by a single invocation of the compiler (which calls the assembler
    and linker behind the scenes). But this is not normal use;
    it is mainly used during the system build to build the base executable
    which is later extended with extra functionality (like compilers
    for extra languages) in saved images.

    The typed system distinguishes "library compilation" and "user compilation".
    "Library compilation" is done with module granularity and produces a
    loadable module.

    Compilation is really AOT; you need to compile before use.
    Compiled functions may be replaced by new definitions, but in the
    absence of a new definition the compiled code is used without change.

    Or can a new line redefine something, perhaps a function definition, previously entered amongst the last 100,000 lines? Can a new line
    require compilation of something typed 50,000 lines ago?

    What happens if you change the type of a global; are you saying that
    none of the program codes needs revising?

    In the typed system there are no global "library" variables; all data
    is encapsulated in modules and normally accessed in an abstract way,
    by calling appropriate functions. So, in "clean" code you
    can recompile a single module and the whole system works.
    There is potential trouble with user variables: if data layout
    (representation) changes, old values will lead to trouble.
    There is potential trouble if you remove an exported function.
    All previously compiled modules will assume that such a function
    is present and you will get a runtime error when other modules
    attempt to call it. For efficiency, functions
    from "core" modules may be inlined; if you make a change to one
    of the core modules you may need to recompile the whole system.
    Similarly, some modules depend on the structure of data in other
    modules; if you change data layout you need to recompile
    everything which depends on it (which as I wrote is normally
    a single module, but may be more). In other words, if you
    change data layout or module interfaces, then you may
    need to recompile several modules. But during normal
    development this is much less frequent than changes which
    affect only a single module.

    As an example, I changed the representation of multidimensional arrays;
    that required rebuilding the whole system. OTOH most changes
    are either bug fixes, replacing an existing routine by a faster
    one, or adding new functionality. In those 3 cases there is
    no change in the interface seen by the non-changed part. There are
    also changes to module interfaces; those affect multiple
    modules, but are less frequent.

    The untyped (or if you prefer dynamically typed) system just acts
    on what is in variables; if you put nonsense there you will
    get an error or possibly a crash.

    An untyped system

    What do you mean by an untyped system? To me it usually means
    dynamically typed.

    Well, "untyped" is shorter and in a sense more relevant for
    compiler. '+' is treated as a function call to a function
    named '+' which performs actual work starting from dispatch
    on type tags. OTOH 'fi_+' assume that it is given (tagged)
    integers and is compiled to inline code which in case when
    one argument is a constant may reduce to one or zero instructions
    (zero instructions means that addition may be done as part
    of address mode of load or store). At even lower level
    there is '_add' which adds two things treating them as
    machine integers.
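    A very rough C sketch of this layering (all names are invented and the
    real representation differs; it is only meant to show the three levels):

    #include <stdint.h>
    #include <stdio.h>

    typedef intptr_t value;                 /* one tagged machine word */
    #define FIXNUM_TAG ((intptr_t)1)        /* low bit set = small integer */

    static inline value mkfix(intptr_t n)
    {   /* shift done on the unsigned type to keep it well defined */
        return (value)(((uintptr_t)n << 1) | 1u);
    }
    static inline int isfix(value v) { return (int)(v & FIXNUM_TAG); }
    static inline intptr_t fixval(value v)
    {   /* assumes arithmetic right shift, as on mainstream targets */
        return v >> 1;
    }

    /* '_add' level: raw machine addition, no tags, no checks */
    static inline intptr_t raw_add(intptr_t a, intptr_t b) { return a + b; }

    /* 'fi_+' level: assumes both operands are tagged integers; inlines
       to a couple of instructions (fewer if one side is a constant) */
    static inline value fix_add(value a, value b)
    {
        return mkfix(raw_add(fixval(a), fixval(b)));
    }

    /* '+' level: generic entry point that dispatches on the type tag;
       a real system would go on to floats, bignums, strings, ... */
    value generic_add(value a, value b)
    {
        if (isfix(a) && isfix(b))
            return fix_add(a, b);
        fprintf(stderr, "generic_add: unsupported operand types\n");
        return mkfix(0);
    }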

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Wed Nov 20 10:41:34 2024
    Bart <bc@freeuk.com> wrote:
    On 19/11/2024 15:51, Bart wrote:
    On 19/11/2024 01:53, Waldek Hebisch wrote:

    Another example, building 40Kloc interpreter from source then running it
    in memory:

    c:\qx>tm \bx\mm -run qq hello
    Compiling qq.m to memory
    Hello, World! 19-Nov-2024 15:38:47
    TM: 0.11

    c:\qx>tm qq hello
    Hello, World! 19-Nov-2024 15:38:49
    TM: 0.05

    The second version runs a precompiled EXE. So building from source added
    only 90ms.

    Sorry, that should be 60ms. Running that interpreter from source only
    takes 1/16th of a second longer not 1/11th of a second.

    BTW I didn't remark on the range of your (WH's) figures. They spanned 40 minutes for a build to instant, but it's not clear for which languages
    they are, which tools are used and which machines. Or how much work they
    have to do to get those faster times, or what work they don't do: I'm guessing it's not processing 0.5M lines for that fastest time.

    As I wrote, there are 2 different systems; if interested you can fetch
    them from github. Build time is just running make; for one (the typed
    system) it was

    time make -j 20 > mlogg 2>&1

    so the build used up to 20 jobs, and output went to a file (I am not sure
    if it was important in this case, but there is 15MB of messages
    and the terminal emulator could take some time to print them).
    Of course, this was after all dependencies were installed and after
    running 'configure'. Note that the parallel build saves substantial
    time; otherwise it probably would be somewhat more than 6 minutes.

    For the untyped system it was

    time make > mlogg 2>&1

    Shortest time was

    time make stamp_new_corepop > mlogg3 2>&1

    this rebuilds only one crucial binary (which involves about 100K wc
    lines). This is a mixed language project: there is runtime support in
    C (hard to say how much, as a single file contains functions for
    several OSes but conditionals choose only one OS), and assembler files
    which are macro-processed and passed to the assembler. There are
    header files which are included during multiple compilations.

    My point was that with the machines available to me and with my
    development process "full build" time is not a problem.
    With the typed system the normal thing is to rebuild a single module, and
    for some modules that takes several seconds (most are of the order of
    a second). It would be nice to have a faster compile time.
    OTOH my "think time" frequently is much longer than this,
    so a compiler doing less checking could lead to a longer time
    overall.

    So it was hard to formulate a response.

    All my timings are either for C or my systems language, running on one
    core on the same PC.

    I do not think I will use your system language. And for a C compiler,
    at least currently it does not make a big difference to me if your
    compiler can do 1Mloc or 5Mloc on my machine; both are "pretty fast".
    What matters more is support for debugging output, support for the
    targets that I need (like ARM or Risc-V), good diagnostics
    and optimization. I recently installed TinyC on a small Risc-V
    machine; I think that the available memory (64MB in all, about 20MB available
    to user programs) is too small to run gcc or clang.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 20 11:16:50 2024
    On 19/11/2024 22:40, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    It is related: both gcc and LLVM are doing analyses that in the
    past were deemed impractically expensive (both in time and in space).
    Those analyses work now thanks to smart algorithms that
    significantly reduced resource usage. I know that you consider
    this too expensive.

    How long would LLVM take to compile itself on one core? (Here I'm not
    even sure what LLVM is; if you download the binary, it's about 2.5GB,
    but a typical LLVM compiler might be 100+ MB. But I guess it will be a while
    in either case.)

    I have a product now that is like a mini-LLVM backend. It can build into a standalone library of under 0.2MB, which can directly produce EXEs, or it
    can interpret. Building that product from scratch takes 60ms.

    That is my kind of product

    What's the context of this 0.1 seconds? Do you consider it long or short?

    Context is interactive response. It means "pretty fast for interactive
    use".

    It's less than the time to press and release the Enter key.


    My tools can generally build my apps from scratch in 0.1 seconds; big
    compilers tend to take a lot longer. Only Tiny C is in that ballpark.

    So I'm failing to see your point here. Maybe you picked up that 0.1
    seconds from an earlier post of mine and are suggesting I ought to be
    able to do a lot more analysis within that time?

    This 0.1s is an old thing. My point is that if you are compiling a simple
    change, then you should be able to do more in this time. In normal
    development, source files bigger than 10000 lines are relatively
    rare, so once you get into the range of 50000-100000 lines per second
    making the compiler faster is of marginal utility.

    I *AM* doing more in that time! It just happens to be stuff you appear
    to have no interest in:

    * I write whole-program compilers: you always process all source files
    of an application. The faster the compiler, the bigger the scale of app
    it becomes practical on.

    * That means no headaches with dependencies (it goes in hand with a
    decent module scheme)

    * I can change one tiny corner of the program, say add an /optional/ argument to a function, which requires compiling all call-sites across
    the program, and the next compilation will take care of everything

    * If I were to do more with optimisation (there is lots that can be done without getting into the heavy stuff), it automatically applies to the
    whole program

    * I can choose to run applications from source code, without generating discrete binary files, just like a script language

    * I can choose (with my new backend) to interpret programs in this
    static language. (Interpretation gives better debugging opportunities)

    * I don't need to faff around with object files or linkers

    Module-based independent compilation and having to link 'object files'
    is stone-age stuff.


    We clearly differ in the question of what is routine. Creating a usable
    executable is a rare task; once an executable is created it can be used
    for a long time. OTOH development is routine and for this one wants
    to know if a change is correct.

    I take it then that you have some other way of doing test runs of a
    program without creating an executable?

    It's difficult to tell from your comments.

    Already a simple thing would be an improvement: make the compiler aware of
    an error routine (if you do not have one, add one) so that when you
    signal an error the compiler will know that there is no need for a normal
    return value.

    OK, but what does that buy me? Saving a few bytes for a return
    instruction in a function? My largest program, which is 0.4MB, already
    only occupies 0.005% of the machine's 8GB.

    Which is not going to be part of a routine build.

    In a sense a build is not routine. A build is done for two purposes:
    - to install a working system from sources, which includes
    documentation
    - to check that the build works properly after changes; this also
    should check the documentation build.

    Normal development goes without rebuilding the system.

    We must be talking at cross-purposes then.

    Either you're developing using interpreted code, or you must have some
    means of converting source code to native code, but for some reason you
    don't use 'compile' or 'build' to describe that process.

    Or maybe your REPL/incremental process can run for days doing
    incremental changes without doing a full compile. It seems quite mysterious.

    I might run my compiler hundreds of times a day (at 0.1 seconds a time,
    600 builds would occupy one whole minute in the day!). I often do it for frivolous purposes, such as trying to get some output lined up just
    right. Or just to make sure something has been recompiled since it's so
    quick it's hard to tell.


    I know. But this is not what I do. A build produces multiple
    artifacts, some of them executable, some loadable code (but _not_
    in a form recognized by the operating system), some essentially
    non-executable (like documentation).

    So, 'build' means something different to you. I use 'build' just as a
    change from writing 'compile'.

    This sounds like a REPL system. There, each line is a new part of the
    program which is processed, executed and discarded.

    First, I am writing about two different systems. Both have REPL.
    Lines typed at REPL are "discarded", but their effect may last
    long time.

    My last big app used a compiled core but most user-facing functionality
    was done using an add-on script language. This meant I could develop
    such modules from within a working application, which provided a rich, persistent environment.

    Changes to the core program required a rebuild and a restart.

    However the whole thing was an application, not a language.


    What happens if you change the type of a global; are you saying that
    none of the program codes needs revising?

    In the typed system there are no global "library" variables; all data
    is encapsulated in modules and normally accessed in an abstract way,
    by calling appropriate functions. So, in "clean" code you
    can recompile a single module and the whole system works.

    I used module-at-a-time compilation until 10-12 years ago. The module
    scheme had to be upgraded at the same time, but it took several goes to
    get it right.

    Now I wouldn't go back. Who cares about compiling a single module that
    may or may not affect a bunch of others? Just compile the lot!

    If a project's scale becomes too big, then it should be split into
    independent program units, for example a core EXE file and a bunch of
    DLLs; that's the new granularity. Or a lot of functionality can be
    off-loaded to scripts, as I used to do.

    (My scripting language code still needs bytecode compilation, and I also
    use whole-program units there, but the bytecode compiler goes up to 2Mlps.)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 20 12:33:09 2024
    On 19/11/2024 23:41, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    BTW I didn't remark on the range of your (WH's) figures. They spanned 40
    minutes for a build to instant, but it's not clear for which languages
    they are, which tools are used and which machines. Or how much work they
    have to do to get those faster times, or what work they don't do: I'm
    guessing it's not processing 0.5M lines for that fastest time.

    As I wrote, there are 2 different systems; if interested you can fetch
    them from github.

    Do you have a link? Probably I won't attempt to build but I can see what
    it looks like.

    I do not think I will use your system language. And for C compiler
    at least currently it does not make big difference to me if your
    compiler can do 1Mloc or 5Mloc on my machine, both are "pretty fast".
    What matters more is support of debugging output, supporting
    targets that I need (like ARM or Risc-V), good diagnostics
    and optimization.

    It's funny how nobody seems to care about the speed of compilers (which
    can vary by 100:1), but for the generated programs, the 2:1 speedup you
    might get by optimising it is vital!

    Here I might borrow one of your arguments and suggest such a speed-up is
    only necessary on a rare production build.

    I recently installed TinyC on small Risc-V
    machine, I think that available memory (64MB all, about 20MB available
    to user programs) is too small to run gcc or clang.


    Only 20,000KB? My first compilers worked on 64KB systems, not all of
    which was available either.

    None of my recent products will do so now, but they will still fit on a
    floppy disk.

    BTW why don't you use a cross-compiler? That's what David Brown would say.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Dan Purgert@3:633/280.2 to All on Wed Nov 20 23:31:35 2024
    On 2024-11-16, Stefan Ram wrote:
    Dan Purgert <dan@djph.net> wrote or quoted:
    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    I honestly lost the plot ages ago; not sure if it was either!


    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Segfaults? :D


    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
      if( n == 0 )result = "zero";
      if( n == 1 )result = "one";
      if( n == 2 )result = "two";
      if( n == 3 )result = "three";
      else result = "four";
      return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
      print_english( 1 );
      print_english( 2 );
      print_english( 3 );
      print_english( 4 ); }

    oooh, that's way better at making a point of the hazard than mine was.

    .... almost needed to engage my rubber duckie, before I realized I was
    mentally auto-correcting the 'english()' function while reading it.
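    For the record, the version my brain kept substituting (a proper else
    ladder) would be something like:

    const char * english( int const n )
    { const char * result;
      if( n == 0 )result = "zero";
      else if( n == 1 )result = "one";
      else if( n == 2 )result = "two";
      else if( n == 3 )result = "three";
      else result = "four";
      return result; }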


    --
    |_|O|_|
    |_|_|O| Github: https://github.com/dpurgert
    |O|O|O| PGP: DDAB 23FB 19FA 7D85 1CC1 E067 6D65 70E5 4CE7 2860

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Thu Nov 21 00:42:14 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 19/11/2024 23:41, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:



    It's funny how nobody seems to care about the speed of compilers (which
    can vary by 100:1), but for the generated programs, the 2:1 speedup you
    might get by optimising it is vital!

    I don't consider it funny at all, rather it is simply the way things
    should be. One compiles once. One's customer runs the resulting
    executable perhaps millions of times.


    Here I might borrow one of your arguments and suggest such a speed-up is
    only necessary on a rare production build.

    And again, you've clearly never worked with any significantly
    large project. Like for instance an operating system.


    I recently installed TinyC on small Risc-V
    machine, I think that available memory (64MB all, about 20MB available
    to user programs) is too small to run gcc or clang.


    Only 20,000KB? My first compilers worked on 64KB systems, not all of
    which was available either.

    My first compilers worked on 4KW PDP-8. Not that I have any
    interest in _ever_ working in such a constrained environment
    ever again.


    None of my recent products will do so now, but they will still fit on a
    floppy disk.

    And, nobody cares.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Thu Nov 21 00:44:08 2024
    Bart <bc@freeuk.com> wrote:
    On 19/11/2024 23:41, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    BTW I didn't remark on the range of your (WH's) figures. They spanned 40
    minutes for a build to instant, but it's not clear for which languages
    they are, which tools are used and which machines. Or how much work they
    have to do to get those faster times, or what work they don't do: I'm
    guessing it's not processing 0.5M lines for that fastest time.

    As I wrote, there are 2 different systems; if interested you can fetch
    them from github.

    Do you have a link? Probably I won't attempt to build but I can see what
    it looks like.

    I do not think I will use your system language. And for C compiler
    at least currently it does not make big difference to me if your
    compiler can do 1Mloc or 5Mloc on my machine, both are "pretty fast".
    What matters more is support of debugging output, supporting
    targets that I need (like ARM or Risc-V), good diagnostics
    and optimization.

    It's funny how nobody seems to care about the speed of compilers (which
    can vary by 100:1), but for the generated programs, the 2:1 speedup you might get by optimising it is vital!

    Here I might borrow one of your arguments and suggest such a speed-up is only necessary on a rare production build.

    Well, there are some good arguments for using optimizing compilation
    during development:
    - test what will be delivered
    - in gcc important diagnostics like info about uninitialized variables
    are available only when you turn on optimization (see the small sketch
    after this list)
    - with separate compilation compile time usually is acceptable
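    A tiny example of the uninitialized-variable point (hypothetical file
    name; the exact behaviour depends on the gcc version):

    /* uninit.c */
    int f(int n)
    {
        int x;
        if (n > 0)
            x = 2 * n;
        return x;        /* x is uninitialized when n <= 0 */
    }

    With 'gcc -Wall -O2 -c uninit.c' gcc typically reports that 'x' may be
    used uninitialized; with 'gcc -Wall -O0 -c uninit.c' the warning is
    usually absent, because the analysis behind -Wmaybe-uninitialized runs
    as part of the optimization passes.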

    I have some extra factors:
    - the C files on which I am doing development are frequently quite
    small and compile time is reasonable
    - the C code is usually in a slowly changing base part and is recompiled
    only rarely

    I recently installed TinyC on small Risc-V
    machine, I think that available memory (64MB all, about 20MB available
    to user programs) is too small to run gcc or clang.


    Only 20,000KB? My first compilers worked on 64KB systems, not all of
    which was available either.

    I used compilers on a ZX Spectrum, so I know that a compiler is possible
    on such a machine. More to the point, gcc-1.42 worked quite well
    in a 4MB machine; at that time 20MB would be quite big and could support
    several users doing compilation. But porting gcc-1.42 to Risc-V
    is more work than I am willing to do (at least now; I could do this
    if I get an infinite amount of free time).

    None of my recent products will do so now, but they will still fit on a floppy disk.

    BTW why don't you use a cross-compiler? That's what David Brown would say.

    I did use a cross-compiler to compile TinyC. Sometimes a native compiler
    is more convenient: I have non-C code which is hard to cross-build
    and I need to link this code with C code. In cases like this doing
    everything natively is the simplest thing to do (some folks use emulators,
    but when it works a native build is simpler). Second, one reason
    to build natively is to test that the native build works. In the early
    days of Linux I tried a few times to recompile the C library, and my
    attempts failed. Later I learned that at that time the Linux C library
    for i386 was cross-compiled on a Sparc machine. Apparently the native
    build was not tested and tended to fail. The third reason
    to have a native compiler is that machines of this class used to
    come with a C compiler; it was a shame not to have any C compiler
    there, so I got one... To be clear: that was long ago; AFAIK the C
    library is now built natively and IIRC I recompiled it a few times
    (I rarely have reason to do this).

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Thu Nov 21 01:21:35 2024
    On 20/11/2024 13:42, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 19/11/2024 23:41, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:



    It's funny how nobody seems to care about the speed of compilers (which
    can vary by 100:1), but for the generated programs, the 2:1 speedup you
    might get by optimising it is vital!

    I don't consider it funny at all, rather it is simply the way things
    should be. One compiles once.

    Hmm, someone else who develops software, either without needing to
    compile code in order to test it, or they write a 1M-line app and it
    compiles and runs perfectly first time!

    Sounds like some more gaslighting going on: people develop huge
    applications, using slow, cumbersome compilers where max optimisations
    are permanently enabled, and yet they have instant edit-compile-run
    cycles or they apparently don't need to bother with a compiler at all!

    One's customer runs the resulting
    executable perhaps millions of times.

    Sure. That's when you run a production build. I can even do that myself
    on some programs (the ones where my C transpiler still works) and pass
    it through gcc -O3. Then it might run 30% faster.

    However, each of the 1000s of compilations before that point are pretty
    much instant.


    Here I might borrow one of your arguments and suggest such a speed-up is
    only necessary on a rare production build.

    And again, you've clearly never worked with any significantly
    large project. Like for instance an operating system.

    No. And? That's like telling somebody who likes to devise their own
    bicycles that they've never worked on a really large conveyance, like a
    jumbo jet. Unfortunately a bike as big, heavy, expensive and cumbersome
    as an airliner is not really practical.

    Besides, in the 1980s the tools and apps I did write were probably
    larger than the OS. All I can remember is that the OS provided a file
    system and a text display to allow you to launch the application you
    really wanted.

    The funny thing is that it is with large projects that edit-compile-run
    turnaround times become more significant. I've heard horror stories of
    such builds taking minutes or even hours. But everybody here seems to
    have found some magic workaround where compilation times even on -O3
    don't matter at all.


    machine, I think that available memory (64MB all, about 20MB available
    to user programs) is too small to run gcc or clang.


    Only 20,000KB? My first compilers worked on 64KB systems, not all of
    which was available either.

    My first compilers worked on 4KW PDP-8. Not that I have any
    interest in _ever_ working in such a constrained environment
    ever again.

    There could be some lessons to be learned, however, since the amount of
    bloat around now is becoming ridiculous.


    None of my recent products will do so now, but they will still fit on a
    floppy disk.

    And, nobody cares.

    You obviously don't.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Thu Nov 21 01:38:57 2024
    Bart <bc@freeuk.com> wrote:
    On 19/11/2024 22:40, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    It is related: both gcc and LLVM are doing analyses that in the
    past were deemed impractically expensive (both in time and in space).
    Those analyses work now thanks to smart algorithms that
    significantly reduced resource usage. I know that you consider
    this too expensive.

    How long would LLVM take to compile itself on one core? (Here I'm not
    even sure what LLVM is; if you download the binary, it's about 2.5GB,
    but a typical LLVM compiler might be 100+ MB. But I guess it will be a while
    in either case.)

    I do not know, but I would expect some hours. I did compile a not
    so recent gcc version; it was 6.5 min clock time, about 70 min
    CPU time. Recent gcc is bigger and LLVM is of comparable size.

    I have a product now that is like a mini-LLVM backend. It can build into a standalone library of under 0.2MB, which can directly produce EXEs, or it
    can interpret. Building that product from scratch takes 60ms.

    That is my kind of product

    What's the context of this 0.1 seconds? Do you consider it long or short?
    Context is interactive response. It means "pretty fast for interactive
    use".

    It's less than the time to press and release the Enter key.


    My tools can generally build my apps from scratch in 0.1 seconds; big
    compilers tend to take a lot longer. Only Tiny C is in that ballpark.

    So I'm failing to see your point here. Maybe you picked up that 0.1
    seconds from an earlier post of mine and are suggesting I ought to be
    able to do a lot more analysis within that time?

    This 0.1s is an old thing. My point is that if you are compiling a simple
    change, then you should be able to do more in this time. In normal
    development, source files bigger than 10000 lines are relatively
    rare, so once you get into the range of 50000-100000 lines per second
    making the compiler faster is of marginal utility.

    I *AM* doing more in that time! It just happens to be stuff you appear
    to have no interest in:

    * I write whole-program compilers: you always process all source files
    of an application. The faster the compiler, the bigger the scale of app
    it becomes practical on.

    * That means no headaches with dependencies (it goes in hand with a
    decent module scheme)

    * I can change one tiny corner of the program, say add an /optional/ argument to a function, which requires compiling all call-sites across
    the program, and the next compilation will take care of everything

    * If I were to do more with optimisation (there is lots that can be done without getting into the heavy stuff), it automatically applies to the
    whole program

    * I can choose to run applications from source code, without generating discrete binary files, just like a script language

    * I can choose (with my new backend) to interpret programs in this
    static language. (Interpretation gives better debugging opportunities)

    * I don't need to faff around with object files or linkers

    Module-based independent compilation and having to link 'object files'
    is stone-age stuff.

    I am not aware of a computer made from stone (silicon is a product of
    quite advanced metallurgy). And while you have an aversion to object
    files, you wrote that you do independent compilation. Only you
    insist that the result of independent compilation must be a DLL.
    How is this different from folks who compile each module to
    a separate DLL?

    We clearly differ in the question of what is routine. Creating a usable
    executable is a rare task; once an executable is created it can be used
    for a long time. OTOH development is routine and for this one wants
    to know if a change is correct.

    I take it then that you have some other way of doing test runs of a
    program without creating an executable?

    It's difficult to tell from your comments.

    Already a simple thing would be an improvement: make the compiler aware of
    an error routine (if you do not have one, add one) so that when you
    signal an error the compiler will know that there is no need for a normal
    return value.

    OK, but what does that buy me? Saving a few bytes for a return
    instruction in a function? My largest program, which is 0.4MB, already
    only occupies 0.005% of the machine's 8GB.

    What it buys is clear expression of intent, easily checkable by the
    compiler/runtime. That is, when you do not signal an error the compiler
    will complain. And if you hit such a case at runtime due to a bug
    you will have clear info.

    Which is not going to be part of a routine build.

    In a sense a build is not routine. A build is done for two purposes:
    - to install a working system from sources, which includes
    documentation
    - to check that the build works properly after changes; this also
    should check the documentation build.

    Normal development goes without rebuilding the system.

    We must be talking at cross-purposes then.

    Either you're developing using interpreted code, or you must have some
    means of converting source code to native code, but for some reason you don't use 'compile' or 'build' to describe that process.

    Or maybe your REPL/incremental process can run for days doing
    incremental changes without doing a full compile.

    Yes.

    It seems quite mysterious.

    There is nothing mysterious here. In the typed system each module has
    a vector (one dimensional array) called the domain vector, containing among
    other things references to called functions. All inter-module calls are
    indirect ones; they take the thing to call from the domain vector. When
    a module starts execution the references point to a runtime routine doing
    work similar to a dynamic linker. The first call goes to the runtime
    support routine, which finds the needed code and replaces the reference in
    the domain vector.

    When a module is recompiled the references in domain vectors are
    reinitialized to point to the runtime. So the searches are run again
    and, if needed, pick up the new routine.

    Note that there is a global table keeping info (including types)
    about all exported routines from all modules. This table is used
    when compiling a module and also by the search process at runtime.

    The effect is that after recompilation of a single module I have a
    runnable executable in memory including the code of the new module.
    If you wonder about compiling the same module many times: the system
    has a garbage collector and unused code is garbage collected.
    So, when the old version is replaced by the new one the old becomes
    garbage and will be collected in due time.

    The other system is similar in principle, but there is no need
    for runtime search and domain vectors.
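    A C sketch of the mechanism (heavily simplified, all names invented):

    /* each module has a "domain vector" of pointers used for its
       inter-module calls; entries start out pointing at a resolver */
    typedef long (*binfn)(long, long);

    static long resolve_add(long a, long b);          /* forward declaration */

    static binfn domain_vec[] = { resolve_add };

    /* the real routine, found by the runtime search on first use */
    static long real_add(long a, long b) { return a + b; }

    /* stand-in for the search over the global table of exported
       routines: it "finds" real_add and patches the domain vector
       so that later calls go there directly */
    static long resolve_add(long a, long b)
    {
        domain_vec[0] = real_add;
        return real_add(a, b);
    }

    /* all inter-module calls are indirect through the vector;
       recompiling a module just resets its entries to the resolver */
    long caller(long a, long b) { return domain_vec[0](a, b); }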

    I might run my compiler hundreds of times a day (at 0.1 seconds a time,
    600 builds would occupy one whole minute in the day!). I often do it for frivolous purposes, such as trying to get some output lined up just
    right. Or just to make sure something has been recompiled since it's so quick it's hard to tell.


    I know. But this is not what I do. A build produces multiple
    artifacts, some of them executable, some loadable code (but _not_
    in a form recognized by the operating system), some essentially
    non-executable (like documentation).

    So, 'build' means something different to you. I use 'build' just as a
    change from writing 'compile'.

    Build means creating a new fully-functional system. That involves
    possibly multiple compilations and whatever else is needed.

    This sounds like a REPL system. There, each line is a new part of the
    program which is processed, executed and discarded.

    First, I am writing about two different systems. Both have REPL.
    Lines typed at REPL are "discarded", but their effect may last
    long time.

    My last big app used a compiled core but most user-facing functionality
    was done using an add-on script language. This meant I could develop
    such modules from within a working application, which provided a rich, persistent environment.

    Changes to the core program required a rebuild and a restart.

    However the whole thing was an application, not a language.

    Well, the typed system is an application, which however offers an
    extension language, and the majority of the application code is written
    in this language. And this language is compiled, first to Lisp
    and then from Lisp to machine code (some Lisp compilers compile to
    bytecode, some compile via C, but it is best to use a Lisp compiler
    compiling Lisp directly to machine code).

    The second system is four languages + a collection of "standard"
    routines. There is significantly more than just a compiler
    (for example a text editor with the capability to send e-mail),
    but the languages are at the center.

    What happens if you change the type of a global; are you saying that
    none of the program codes needs revising?

    In the typed system there are no global "library" variables; all data
    is encapsulated in modules and normally accessed in an abstract way,
    by calling appropriate functions. So, in "clean" code you
    can recompile a single module and the whole system works.

    I used module-at-a-time compilation until 10-12 years ago. The module
    scheme had to be upgraded at the same time, but it took several goes to
    get it right.

    Now I wouldn't go back. Who cares about compiling a single module that
    may or may not affect a bunch of others? Just compile the lot!

    If a project's scale becomes too big, then it should be split into independent program units, for example a core EXE file and a bunch of
    DLLs; that's the new granularity. Or a lot of functionality can be off-loaded to scripts, as I used to do.

    (My scripting language code still needs bytecode compilation, and I also
    use whole-program units there, but the bytecode compiler goes up to 2Mlps.)

    In both cases the spirit is similar to scripting languages. It is just
    that the languages are compiled to machine code and have features
    supporting large-scale programming.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Thu Nov 21 01:49:08 2024
    Bart <bc@freeuk.com> wrote:
    On 19/11/2024 23:41, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    BTW I didn't remark on the range of your (WH's) figures. They spanned 40
    minutes for a build to instant, but it's not clear for which languages
    they are, which tools are used and which machines. Or how much work they
    have to do to get those faster times, or what work they don't do: I'm
    guessing it's not processing 0.5M lines for that fastest time.

    As I wrote, there are 2 different systems; if interested you can fetch
    them from github.

    Do you have a link? Probably I won't attempt to build but I can see what
    it looks like.

    Forgot to put links in another message:

    https://github.com/fricas/fricas

    and

    https://github.com/hebisch/poplog


    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Thu Nov 21 03:00:52 2024
    On 19.11.2024 18:31, David Brown wrote:
    [...]

    All I have been arguing against is the idea of blindly putting in
    validity tests for parameters in functions, as though it were a habit
    that by itself leads to fewer bugs in code.

    Fair enough.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Thu Nov 21 03:15:20 2024
    On 20/11/2024 02:33, Bart wrote:
    On 19/11/2024 23:41, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:


    I do not think I will use your system language. And for C compiler
    at least currently it does not make big difference to me if your
    compiler can do 1Mloc or 5Mloc on my machine, both are "pretty fast".
    What matters more is support of debugging output, supporting
    targets that I need (like ARM or Risc-V), good diagnostics
    and optimization.

    It's funny how nobody seems to care about the speed of compilers (which
    can vary by 100:1), but for the generated programs, the 2:1 speedup you might get by optimising it is vital!

    To understand this, you need to understand the benefits of a program
    running quickly. Let's look at the main ones:

    1. If it is a run-to-finish program, it will finish faster, and you have
    less time waiting for it. A compiler will fall into this category.

    2. If it is a run-continuously (or run often) program, it will use a
    smaller proportion of the computer's resources, less electricity, less
    heat generated, less fan noise, etc. That covers things like your email client, or your OS - things running all the time.

    3. If it is a dedicated embedded system, faster programs can mean
    smaller, cheaper, and lower power processors or microcontrollers for the
    given task. That applies to the countless embedded systems that
    surround us (far outweighing the number of "normal" computers), and the devices I make.

    4. For some programs, running faster means you can have higher quality
    in a similar time-frame. That applies to things like simulators, static analysers, automatic test coverage setups, and of course games.

    5. For interactive programs, running faster makes them nicer to use.

    There is usually a point where a program is "fast enough" - going faster
    makes no difference. No one is ever going to care if a compilation
    takes 1 second or 0.1 seconds, for example.


    It doesn't take much thought to realise that for most developers, the
    speed of their compiler is not actually a major concern in comparison to
    the speed of other programs. And for everyone other than developers, it
    is of no concern at all.

    While writing code, and testing and debugging it, a given build might
    only be run a few times, and compile speed is a bit more relevant.
    Generally, however, most programs are run far more often, and for far
    longer, than their compilation time. (If not, then you should most
    likely have used a higher level language instead of a compiled low-level language.) So compile time is relatively speaking of much lower
    priority than the speed of the result.

    I think it's clear that everyone prefers faster rather than slower. But generally, people want /better/ rather than just faster. One of the
    factors of "better" for compilers is that the resulting executable runs faster, and that is certainly worth a very significant cost in compile time.


    And as usual, you miss out the fact that toy compilers - like yours, or
    TinyC - miss all the other features developers want from their tools. I
    want debugging information, static error checking, good diagnostics,
    support for modern language versions (that's primarily C++ rather than
    C), useful extensions, compact code, correct code generation, and most importantly of all, support for the target devices I want. I wouldn't
    care if your compiler can run at a billion lines per second and gcc took
    an hour to compile - I still wouldn't be interested in your compiler
    because it does not generate code for the devices I use. Even if it
    did, it would be useless to me, because I can trust the code gcc
    generates and I cannot trust the code your tool generates. And even if
    your tool did everything else I need, and you could convince me that it
    is something a professional could rely on, I'd still use gcc for the
    better quality generated code, because that translates to money saved
    for my customers.


    BTW why don't you use a cross-compiler? That's what David Brown would say.


    That is almost certainly what he normally does. It can still be fun to
    play around with things like TinyC, even if it is of no practical use
    for the real development.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Thu Nov 21 05:31:53 2024
    David Brown <david.brown@hesbynett.no> wrote:
    On 15/11/2024 19:50, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 11/11/2024 20:09, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:

    Concerning the correct place for checks: one could argue that the check
    should be close to the place where the result of the check matters, which
    frequently is in the called function.

    No, there I disagree. The correct place for the checks should be close
    to where the error is, and that is in the /calling/ code. If the called
    function is correctly written, reviewed, tested, documented and
    considered "finished", why would it be appropriate to add extra code to
    that in order to test and debug some completely different part of the code?

    The place where the result of the check /really/ matters, is the calling
    code. And that is also the place where you can most easily find the
    error, since the error is in the calling code, not the called function.
    And it is most likely to be the code that you are working on at the time
    - the called function is already written and tested.

    And frequently the check requires
    computation that is done by the called function as part of normal
    processing, but would be extra code in the caller.


    It is more likely to be the opposite in practice.

    And for much of the time, the called function has no real practical way
    to check the parameters anyway. A function that takes a pointer
    parameter - not an uncommon situation - generally has no way to check
    the validity of the pointer. It can't check that the pointer actually
    points to useful source data or an appropriate place to store data.

    All it can do is check for a null pointer, which is usually a fairly
    useless thing to do (unless the specifications for the function make the
    pointer optional). After all, on most (but not all) systems you already
    have a "free" null pointer check - if the caller code has screwed up and
    passed a null pointer when it should not have done, the program will
    quickly crash when the pointer is used for access. Many compilers
    provide a way to annotate function declarations to say that a pointer
    must not be null, and can then spot at least some such errors at compile
    time. And of course the calling code will very often be passing the
    address of an object in the call - since that can't be null, a check in
    the function is pointless.

    Well, in a sense pointers are easy: if you do not play nasty tricks
    with casts then type checks do a significant part of the checking. Of
    course, a pointer may be uninitialized (but compiler warnings help a lot
    here), memory may be overwritten, etc. But overwritten memory is
    rather special: if you checked that the content of memory is correct,
    but it is overwritten after the check, then the earlier check does not
    help. Anyway, the main point is ensuring that the pointed-to data
    satisfies the expected conditions.


    That does not match reality. Pointers are far and away the biggest
    source of errors in C code. Use after free, buffer overflows, mixups of
    who "owns" the pointer - the scope for errors is boundless. You are
    correct that type systems can catch many potential types of errors - unfortunately, people /do/ play nasty tricks with type checks.
    Conversions of pointer types are found all over the place in C
    programming, especially conversions back and forth with void* pointers.

    Well, I worked with gcc code. gcc has its own garbage collector,
    so there were no ownership troubles or use after free. There was
    some possibility of buffer overflows, but since most data structures
    that I was using were trees or lists it was limited. gcc did use
    casts, but those were mainly between pointers to a union and
    pointers to its variants. Unions had a tag (at the same place in all
    variants), and there were accessor macros which checked that the tag
    corresponds to the expected variant. It certainly took some effort to
    develop the gcc infrastructure; I just benefited from it. Earlier
    versions of gcc did not have the garbage collector (and probably also
    did not have the checking macros).
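    A minimal sketch of that kind of checked access in C (the names are
    made up; the real gcc tree-checking macros are more elaborate):

    #include <assert.h>

    enum node_kind { NK_INT, NK_PAIR };

    struct node {
        enum node_kind kind;
        union {
            long ival;                              /* NK_INT  */
            struct { struct node *car, *cdr; } p;   /* NK_PAIR */
        } u;
    };

    /* accessor macros that check the tag before handing out the variant */
    #define NODE_INT(n)  (assert((n)->kind == NK_INT),  (n)->u.ival)
    #define NODE_PAIR(n) (assert((n)->kind == NK_PAIR), &(n)->u.p)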

    Also, you say that pointers are a source of errors. In the gcc source
    the usual problem was bad semantics, that is, some function did something
    other than it should. This could manifest as a failed tag check
    (IME the most frequent case), a segfault or wrong generated code.
    And the troublesome cases were the wrong code cases.

    My personal codes were much smaller. In one production case
    all allocated memory was put in a linked list and freed in
    bulk at the end of processing. In my embedded code I do not
    use dynamic allocation. In another case C routines are called
    from a garbage-collected language, so most or all pointers are
    "owned" by the garbage-collected language and the C routines should
    not and can not free them. In still other cases pointer
    usage follows a relatively simple design pattern and is not
    a problem.

    You may have more tricky cases than the ones I handle using
    manual memory management and can not (or do not want to) use a garbage
    collector. I do not know how much checking infrastructure
    you have. I simply reported my experience and how I interpret
    it: I may get a segfault, but a segfault itself is a minor
    trouble. In particular many segfaults can be corrected almost
    immediately. The bigger trouble is when the actual problem is a logic
    error. In non-C coding in a garbage-collected language the "pointer
    errors" that you mention go away, but logic errors are still
    there.

    All this means that invalid pointer parameters are very much a real
    issue - but are typically impossible to check in the called function.

    In gcc you could get a pointer to the wrong variant of a union, but the
    called function could detect it by looking at the tag. One could cast
    a pointer to a completely different type, but this would be a gross
    error, which was rare.

    The way you avoid getting errors in your pointers is being careful about having the right data in the first place, so you only call functions
    with valid parameters. You do this by having careful control about the ownership and lifetime of pointers, and what they point to, keeping conventions in the names of your pointers and functions to indicate who
    owns what, and so on. And you use sanitizers and similar tools during testing and debugging to distinguish between tests that worked by luck,
    and ones that worked reliably. (And of course you may consider other languages than C that help you express your requirements in a clearer
    manner or with better automatic checking.)

    Yes, of course.

    Put the same effort and due diligence into the rest of your code, and suddenly you find your checks for other kinds of parameters in functions
    are irrelevant as you are now making sure you call functions with appropriate valid inputs.

    It depends on the domain (also see below).

    Once you get to more complex data structures, the possibility for the
    caller to check the parameters gets steadily less realistic.

    So now your practice of having functions "always" check their parameters
    leaves the people writing calling code with a false sense of security -
    usually you /don't/ check the parameters, you only ever do simple checks
    that the caller could (and should!) do if they were realistic. You've
    got the maintenance and cognitive overload of extra source code for your
    various "asserts" and other checks, regardless of any run-time costs
    (which are often irrelevant, but occasionally very important).


    You will note that much of this - for both sides of the argument - uses
    words like "often", "generally" or "frequently". It is important to
    appreciate that programming spans a very wide range of situations, and I
    don't want to be too categorical about things. I have already said
    there are situations when parameter checking in called functions can
    make sense. I've no doubt that for some people and some types of
    coding, such cases are a lot more common than what I see in my coding.

    Note also that when you can use tools to automate checks, such as
    "sanitize" options in compilers or different languages that have more
    in-built checks, the balance differs. You will generally pay a run-time
    cost for those checks, but you don't have the same kind of source-level
    costs - your code is still clean, clear, and amenable to correctness
    checking, without hiding the functionality of the code in a mass of
    unnecessary explicit checks. This is particularly good for debugging,
    and the run-time costs might not be important. (But if run-time costs
    are not important, there's a good chance that C is not the best language
    to be using in the first place.)

    Our experience differs. As a silly example, consider a parser
    which produces a parse tree. The caller is supposed to pass a
    syntactically correct string as an argument. However, checking
    syntactic correctness requires almost the same effort as producing
    the parse tree, so it is usual that the parser both checks
    correctness and produces the result.
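
    A tiny C sketch of that point, with a made-up grammar (digits
    separated by '+'): the validity check and the evaluation are the same
    traversal, so the parser reports correctness as a by-product of doing
    its real job.

        #include <ctype.h>

        typedef struct { int ok; long value; } parse_result;

        /* expr := number ('+' number)* ; evaluates instead of building a
           tree, purely to keep the sketch short. */
        parse_result parse_expr(const char *s) {
            parse_result r = { 0, 0 };
            if (!isdigit((unsigned char)*s)) return r;       /* syntax error */
            while (isdigit((unsigned char)*s))
                r.value = r.value * 10 + (*s++ - '0');
            while (*s == '+') {
                s++;
                if (!isdigit((unsigned char)*s)) return r;   /* syntax error */
                long term = 0;
                while (isdigit((unsigned char)*s))
                    term = term * 10 + (*s++ - '0');
                r.value += term;
            }
            r.ok = (*s == '\0');                 /* whole input consumed */
            return r;
        }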

    The trick here is to avoid producing a syntactically invalid string in
    the first place. Solve the issue at the point where there is a mistake
    in the code!

    (If you are talking about a string that comes from outside the code in
    some way, then of course you need to check it - and if that is most conveniently done during the rest of parsing, then that is fair enough.)

    Imagine about 1000 modules containing about 15000 functions. The
    modules form a library and any exported function (about 7000) is
    potentially user-accessible. Functions transform data and do not
    know where their arguments came from: the user or another library
    function. Processing in principle is quite well defined, so
    one could formulate validity conditions for inputs and outputs.
    But the conditions do not compose in a simple way. More precisely,
    in many cases when a given function received correct data and is
    doing the right thing, then all functions it calls will receive
    correct arguments. But the trouble is, what if the function is
    wrong? The natural answer, "write correct code", solves nothing.
    Of course, one makes an effort to write correct code, but bugs
    still appear. So, there are internal checks. And the failing
    check frequently is in the called function, because it can
    detect the error. Of course, if detecting the error in the caller
    were easy, the caller would do the check. But frequently
    it is not easy. Look at a partially made-up example.
    We have a mathematical problem that can be transformed into
    solving linear equations. In general, a system of linear
    equations may have no solution. But one may be able to
    prove that the equations coming from a specific problem
    are always solvable. So we write a routine that transforms
    the input into a system of linear equations. The equation solver
    returns information on whether the system is solvable and, in
    the case of a solvable system, also a description of the solutions.
    Taking your advice literally, we would just access the solutions
    (we proved that the system is solvable, so the solutions must be
    there!). But in the system I use and develop, as written,
    one cannot "just access solutions" without first checking
    (explicitly or implicitly) the return value for the possibility
    of no solution. And what happens when there is no solution?
    An implicit check will signal an error, and if the check is explicit
    the only sensible thing to do is also to signal an error.
    My point here is that there is a natural place to put an extra
    check. If the check fails you know that there is a bug
    (or possibly the input data was wrong). And if there is
    a bug, that is the earliest practical place to discover it.

    BTW: While I did not give a complete example, this is a frequent
    approach to solving math problems.
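
    A compressed C sketch of the shape being described (the API is
    hypothetical, only to show where the check lands): the solver reports
    solvability, and the caller's "cannot happen" branch is the natural,
    earliest place to catch a bug.

        #include <assert.h>

        typedef struct {
            int solvable;                  /* 1 if a unique solution exists */
            double x[2];                   /* the solution, when solvable */
        } solve_result;

        /* Solve the 2x2 system a*x = b by Cramer's rule. */
        static solve_result solve2(const double a[2][2], const double b[2]) {
            solve_result r = { 0, { 0.0, 0.0 } };
            double det = a[0][0]*a[1][1] - a[0][1]*a[1][0];
            if (det == 0.0)
                return r;                  /* no unique solution */
            r.solvable = 1;
            r.x[0] = (b[0]*a[1][1] - b[1]*a[0][1]) / det;
            r.x[1] = (a[0][0]*b[1] - a[1][0]*b[0]) / det;
            return r;
        }

        double first_component(const double a[2][2], const double b[2]) {
            solve_result r = solve2(a, b);
            /* We "proved" our systems are always solvable, but the check
               stays: if it ever fires, a bug (or bad input) is detected
               at the earliest practical place. */
            assert(r.solvable);
            return r.x[0];
        }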

    I have computations that are quite different from parsing but
    in some cases share the same characteristic: checking correctness of
    the arguments requires complex computation similar to producing the
    actual result. More frequently, the called routine can check various
    invariants which with high probability can detect errors. Doing
    the same check in the caller is impractical.

    I think you are misunderstanding me - maybe I have been unclear. I am saying that it is the /caller's/ responsibility to make sure that the parameters it passes are correct, not the /callee's/ responsibility.
    That does not mean that the caller has to add checks to get the
    parameters right - it means the caller has to use correct parameters.

    In this sense I agree. It is simply that life shows checks are needed
    and there are frequently natural places to put checks. And frequently
    those natural places are far from the origin of the data.

    Think of this like walking near a cliff-edge. Checking parameters
    before the call is like having a barrier at the edge of the cliff. My recommendation is that you know where the cliff edge is, and don't walk there.

    That is the easy case. The worst problems are the ones where you do not
    know that there is a cliff edge. With a real cliff edge, once you fall
    the trouble will be obvious (either to you or to the people who find you).
    In programming you may be getting wrong results and not know it,
    possibly making the problem worse. I simply advocate early
    detection of troubles.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Thu Nov 21 07:17:39 2024
    On 20/11/2024 16:15, David Brown wrote:
    On 20/11/2024 02:33, Bart wrote:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    To understand this, you need to understand the benefits of a program
    running quickly.

    As I said, people are preoccupied with that for programs in general. But
    when it comes to compilers, it doesn't apply! Clearly, you are implying
    that those benefits don't matter when the program is a compiler.

    Let's look at the main ones:

    <snip>

    OK. I guess you missed the bits here and in another post, where I
    suggested that enabling optimisation is fine for production builds.

    For the routine ones that I do 100s of times a day, where test runs are generally very short, I don't want to hang about waiting for a
    compiler that is taking 30 times longer than necessary for no good reason.


    There is usually a point where a program is "fast enough" - going faster makes no difference. No one is ever going to care if a compilation
    takes 1 second or 0.1 seconds, for example.

    If you look at all the interactions people have with technology, with
    GUI apps, even with mechanical things, a 1 second latency is generally disastrous.

    A one-second delay between pressing a key and seeing a character appear
    on a display, or any other feedback, would drive most people up the wall.
    But 0.1 seconds is perfectly fine.


    It doesn't take much thought to realise that for most developers, the
    speed of their compiler is not actually a major concern in comparison to
    the speed of other programs.

    Most developers are stuck with what there is. Naturally they will make
    the best of it. Usually by finding 100 ways or 100 reasons to avoid
    running the compiler.

    While writing code, and testing and debugging it, a given build might
    only be run a few times, and compile speed is a bit more relevant. Generally, however, most programs are run far more often, and for far longer, than their compilation time.

    Developing code is the critical bit.

    Even when a test run takes a bit longer as you need to set things up,
    when you do need to change something and run it again, you don't want
    any pointless delay.

    Neither do you want to waste /your/ time pandering to a compiler's
    slowness by writing makefiles and defining dependencies. Or even
    splitting things up into tiny modules. I don't want to care about that
    at all. Here's my bunch of source files, just build the damn thing, and
    do it now!

    And as usual, you miss out the fact that toy compilers - like yours, or TinyC - miss all the other features developers want from their tools. I want debugging information, static error checking, good diagnostics,
    support for modern language versions (that's primarily C++ rather than
    C), useful extensions, compact code, correct code generation, and most importantly of all, support for the target devices I want.

    Sure. But then I'm sure you're aware that most scripting languages
    include a compilation stage where source code might be translated to
    bytecode.

    I guess you're OK with that being as fast as possible so that there is
    no noticeable delay. But I also guess that all those features go out the window, yet people don't seem to care in that case.

    My whole-program compilers (even my C one now) can run programs from
    source code just a like a scripting language.

    So a fast, mechanical compiler that does little checking is good in one
    case, but not in another (specifically, anything created by Bart).



    I wouldn't
    care if your compiler can run at a billion lines per second and gcc took
    an hour to compile - I still wouldn't be interested in your compiler
    because it does not generate code for the devices I use. Even if it
    did, it would be useless to me, because I can trust the code gcc
    generates and I cannot trust the code your tool generates.

    Suppose I had a large C source file, mechanically generated via a
    compiler from another language so that it was fully verified.

    It took a fraction of a second to generate it; all that's needed is a mechanical translation to native code. In that case you can keep your
    compiler that takes one hour to do analyses I don't need; I'll take the million line per second one. (A billion lines is not viable, one million
    is.)


    And even if
    your tool did everything else I need, and you could convince me that it
    is something a professional could rely on, I'd still use gcc for the
    better quality generated code, because that translates to money saved
    for my customers.

    Where have I said you should use my compiler? I'm simply making a case
    for the existence of very fast, baseline tools that do the minimum
    necessary with as little effort or footprint as necessary.

    Here's an interesting test: I took sql.c (a 250Kloc sqlite3 test
    program), and compiled it first to NASM-compatible assembly, and then to
    my own assembly code.

    I compiled the latter with my assembler and it took 1/6th of a second
    (for some 0.3M lines).

    How long do you think NASM took? It was nearly 8 minutes. Or a blazing
    5 minutes if you used -O0 (do only one pass).

    No doubt you will argue that NASM is superior to my product, although
    I'm not sure how much deep analysis you can do of assembly code. And you
    will castigate me for giving it over-large inputs. However that is the
    task that needs to be done here.

    It clearly has a bug, but if I hadn't mentioned it, I'd like to have
    known how sycophantic you would have been towards that product just to
    be able to belittle mine.

    The NASM bug only starts to become obvious above 20Kloc or so. I wonder
    how many more subtle bugs exist in big products that result in
    significantly slower performance, but are not picked up because people
    like you /don't care/. You will just buy a faster machine or chop your application up into even smaller bits.



    BTW why don't you use a cross-compiler? That's what David Brown would
    say.


    That is almost certainly what he normally does. It can still be fun to
    play around with things like TinyC, even if it is of no practical use
    for the real development.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Thu Nov 21 10:29:44 2024
    On 20/11/2024 14:38, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Either you're developing using interpreted code, or you must have some
    means of converting source code to native code, but for some reason you
    don't use 'compile' or 'build' to describe that process.

    Or maybe your REPL/incremental process can run for days doing
    incremental changes without doing a full compile.

    Yes.

    It seems quite mysterious.

    There is nothing mysterious here. In the typed system each module has
    a vector (one-dimensional array) called the domain vector, containing
    among other things references to called functions. All inter-module
    calls are indirect ones; they take the thing to call from the domain
    vector. When a module starts execution the references point to a
    runtime routine doing work similar to a dynamic linker. The first
    call goes to the runtime support routine, which finds the needed code
    and replaces the reference in the domain vector.

    When a module is recompiled, references in domain vectors are
    reinitialized to point to the runtime. So the searches are run again
    and, if needed, pick up the new routine.

    Note that there is a global table keeping info (including types)
    about all exported routines from all modules. This table is used
    when compiling a module and also by the search process at runtime.

    The effect is that after recompilation of a single module I have a
    runnable executable in memory including the code of the new module.
    If you wonder about compiling the same module many times: the system
    has a garbage collector and unused code is garbage collected.
    So, when the old version is replaced by a new one, the old becomes
    garbage and will be collected in due time.
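
    A minimal C sketch of that mechanism (the names are invented; the
    real system is of course far more elaborate): calls go through a
    table of pointers that initially point at a resolver, which patches
    in the current version of the routine on first use.

        #include <stdio.h>

        typedef int (*binop)(int, int);

        static int resolve_add(int a, int b);      /* forward declaration */

        /* Slot 0 of this module's "domain vector"; starts at the resolver. */
        static binop domain[1] = { resolve_add };

        static int real_add(int a, int b) { return a + b; }

        /* The first call lands here: look up the current routine
           (hard-wired in this sketch), patch the vector, then complete
           the original call. */
        static int resolve_add(int a, int b) {
            domain[0] = real_add;
            return domain[0](a, b);
        }

        int main(void) {
            printf("%d\n", domain[0](1, 2));  /* resolves, then calls real_add */
            printf("%d\n", domain[0](3, 4));  /* plain indirect call from now on */
            /* Recompiling a module would reset domain[0] to resolve_add,
               so the next call picks up the new code. */
            return 0;
        }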

    This sounds an intriguing kind of system to implement.

    That is, where program source, code and data structures are kept
    resident, individual functions and variables can be changed, and any
    other functions that might be affected are recompiled, but no others.

    This has some similarities to what I was doing in the 1990s with
    hot-loadable and -modifiable scripts. So a lot more dynamic than the
    stuff I do now.

    The problem is that my current applications are simply too small for it
    to be worth the complexity. Most of them build 100% from scratch in
    under 0.1 seconds, especially if working within a resident application
    (my timings include Windows process start/end overheads.)

    If I was routinely working with programs that were 10 times the scale
    (so needing to wait 0.5 to 1 seconds), then it might be something I'd consider. Or I might just buy a faster machine; my current PC was pretty
    much the cheapest in the shop in 2021.

    The other system is similar in principle, but there is no need
    for runtime search and domain vectors.

    I might run my compiler hundreds of times a day (at 0.1 seconds a time,
    600 builds would occupy one whole minute in the day!). I often do it for
    frivolous purposes, such as trying to get some output lined up just
    right. Or just to make sure something has been recompiled since it's so
    quick it's hard to tell.


    I know. But this is not what I do. A build produces multiple
    artifacts, some of them executable, some loadable code (but _not_
    in a form recognized by the operating system), some essentially
    non-executable (like documentation).

    So, 'build' means something different to you. I use 'build' just as a
    change from writing 'compile'.

    Build means creating a new fully-functional system. That involves
    possibly multiple compilations and whatever else is needed.

    I would call that something else, perhaps based around 'Make' (nothing
    to do with Linux 'make' tools).

    Here is the result of such a process for one of my 1999 apps:

    G:\m7>dir
    10/03/1999 00:57 45,056 M7.DAT
    17/10/2002 19:22 370,288 M7.EXE
    11/10/2021 21:05 7,432 M7.INI
    17/10/2002 19:27 705,376 M7.PCA
    10/03/1999 00:59 8,541 M7CMD.INI

    The PCA file contains a few dozen scripts (at that time, they were
    compiled to bytecode). This was a distribution layout, created by a batch
    file, and ending up on a floppy, or later FTP-ed to a web-site.

    This is not the routine building of either the M7.EXE program unit, or
    those scripts, which are compiled independently from inside M7.EXE.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Fri Nov 22 00:00:04 2024
    On 20/11/2024 21:17, Bart wrote:
    On 20/11/2024 16:15, David Brown wrote:
    On 20/11/2024 02:33, Bart wrote:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    To understand this, you need to understand the benefits of a program
    running quickly.

    As I said, people are preoccupied with that for programs in general. But when it comes to compilers, it doesn't apply! Clearly, you are implying
    that those benefits don't matter when the program is a compiler.

    No - you are still stuck with your preconceived ideas, rather than ever bothering to read and think.

    As I have said many times, people will always be happier if their
    compiler runs faster - as long as that does not happen at the cost of
    the functionality and features.

    Thus I expect that whoever compiles the gcc binaries that I use
    (occasionally that is myself, but like every other programmer I usually
    use pre-built compilers), uses a good compiler with solid optimisation
    enabled when building the compiler. And I expect that the gcc (and clang/llvm) developers put effort into making their tools fast - but
    that they prioritise correctness first, then features, and only then
    look at the speed of the tools and their memory usage. (And I don't
    expect disk space to be of the remotest concern to them.)



    Let's look at the main ones:

    <snip>

    OK. I guess you missed the bits here and in another post, where I
    suggested that enabling optimisation is fine for production builds.


    I saw it. But maybe you missed the bit when the discussion was about
    serious software developers. Waldek explained, and I've covered it
    countless times in the past, but since you didn't pay attention then,
    there is little point in repeating it now.

    For the routine ones that I do 100s of times a day, where test runs are generally very short, then I don't want to hang about waiting for a
    compiler that is taking 30 times longer than necessary for no good reason.


    Your development process sounds bad in so many ways it is hard to know
    where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's, convinced yourself at the
    time that you were better at software development than anyone else, and
    have been stuck in that mode and the same methodology for the last 50
    years without ever considering that you could learn something new from
    other people.


    There is usually a point where a program is "fast enough" - going
    faster makes no difference. No one is ever going to care if a
    compilation takes 1 second or 0.1 seconds, for example.

    If you look at all the interactions people have with technology, with
    GUI apps, even with mechanical things, a 1 second latency is generally disastrous.

    A one-second delay between pressing a key and seeing a character appear
    on a display, or any other feedback, would drive most people up the wall.
    But 0.1 seconds is perfectly fine.


    As I said, no one is ever going to care if a compilation takes 1 second
    or 0.1 seconds.


    It doesn't take much thought to realise that for most developers, the
    speed of their compiler is not actually a major concern in comparison
    to the speed of other programs.

    Most developers are stuck with what there is. Naturally they will make
    the best of it. Usually by finding 100 ways or 100 reasons to avoid
    running the compiler.


    So your advice is that developers should be stuck with what they have -
    the imaginary compilers from your nightmares that take hours to run -
    and that they should make a point of always running them as often as
    possible? And presumably you also advise doing so on a bargain basement single-core computer from at least 15 years ago?

    People who do software development seriously are like anyone else who
    does something seriously - they want the best tools for the job, within budget. And if they are being paid for the task, their employer will
    expect efficiency in return for the budget.

    Which do you think an employer (or amateur programmer) would prefer?

    a) A compiler that runs in 0.1 seconds with little static checking
    b) A compiler that runs in 10 seconds but spots errors saving 6 hours debugging time


    Developers don't want to waste time unnecessarily. Good build tools
    mean you get all the benefits of good compilers, without wasting time re-doing the same compilations when nothing has changed.

    I can't understand why you think that's a bad thing - what is the point
    of re-doing a build step when nothing has changed? And a build tool
    file is also the place to hold the details of how to do the build -
    compiler versions, flags, list of sources, varieties of output files, additional pre- or post-processing actions, and so on. I couldn't
    imagine working with anything beyond a "hello, world" without a build tool.

    While writing code, and testing and debugging it, a given build might
    only be run a few times, and compile speed is a bit more relevant.
    Generally, however, most programs are run far more often, and for far
    longer, than their compilation time.

    Developing code is the critical bit.


    Yes.

    I might spend an hour or two writing code (including planning,
    organising, reading references, etc.) and then 5 seconds building it.
    Then there might be anything from a few minutes to a few hours testing
    or debugging. How could that process be improved by a faster compile?
    Even for the most intense code-compile-debug cycles, building rarely
    takes longer than stretching my fingers or taking a mouthful of coffee.

    But using a good compiler saves a substantial amount of developer time
    because I can write better code with a better structure, I can rely on
    the optimisation it does (instead of "hand-optimising" code to get the efficiency I need), and good static checking and good diagnostic
    messages help me fix mistakes before test and debug cycles.

    Even when a test run takes a bit longer as you need to set things up,
    when you do need to change something and run it again, you don't want
    any pointless delay.

    Neither do you want to waste /your/ time pandering to a compiler's
    slowness by writing makefiles and defining dependencies.

    That is not what "make" is for. Speed is a convenient by-product of
    good project management and build tools.

    Or even
    splitting things up into tiny modules.

    Speed is not the reason people write modular, structured code.

    I don't want to care about that
    at all. Here's my bunch of source files, just build the damn thing, and
    do it now!


    You apparently don't want to care about anything much.


    <snip the rest to save time>


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 22 02:20:22 2024
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    For the routine ones that I do 100s of times a day, where test runs
    are generally very short, then I don't want to hang about waiting for
    a compiler that is taking 30 times longer than necessary for no good
    reason.


    Your development process sounds bad in so many ways it is hard to know
    where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's,

    1970s builds, especially on mainframes, were dominated by link times.
    You also had to keep an eye on resources (eg. allocated CPU time), as
    they were limited on time-shared systems.

    Above all, you could only do active work from a terminal that you first
    had to book, for one-hour slots.

    I'm surprised you think that my tools and working practices have any connection with the above.

    I've also eliminated linkers; you apparently still use them.

    As I said, no one is ever going to care if a compilation takes 1 second
    or 0.1 seconds.

    And yet, considerable effort IS placed on getting development tools to
    run fast:

    * Presumably, optimisation is applied to a compiler to get it faster
    than otherwise. But why bother if the difference is only a second or so?

    * Tools can now do builds in parallel across multiple cores. Again, why?
    So that 1 second becomes 20 lots of 50ms? Or would that 1 second really
    have been 20 seconds without that feature?

    * People are developing new kinds of linkers (I think there was 'gold',
    and now something else) which are touted as being several times faster
    than traditional ones.

    * All sorts of make and other files are used to define dependency graphs between program modules. Why? Presumably to minimise time spent recompiling.

    * There are various JIT compilation schemes where a rough version of an application can get up and running quickly, with 'hot' functions
    compiled and optimised on demand. Again, why?

    If people really don't care about compilation speed, why this vast effort?

    Getting development tools faster is an active field, and everyone
    benefits including you, but when I do it, it's a pointless waste of time?

    As I said, no one is ever going to care if a compilation takes 1 second
    or 0.1 seconds.

    Have you asked? You must use interactive tools like shells; I guess you wouldn't want a pointless one second delay after each command, which you
    KNOW doesn't warrant such a delay.

    That would surely slow you down if you're used to fluently firing off a rapid sequence of commands.

    The problem is that you don't view use of a compiler as just another interactive command.

    As I said, no one is ever going to care if a compilation takes 1 second
    or 0.1 seconds.


    Here's an actual use-case: I have a transpiler that produces a
    single-file C output of 40K lines. Tiny C can build it in 0.2 seconds.
    gcc -O0 takes 2.2 seconds. However there's no point in using gcc, as the generated code is as poor as Tiny C, so I might as well use that.

    But if I want faster code, gcc -O2 takes 11 seconds.

    For lots of routine builds used for testing, passing the intermediate C through gcc -O2 makes no sense at all. It is just a waste of time,
    destroys my train of thought, and is very frustrating.

    However, if you ran the world, then tools like gcc and its ilk would be
    the only choice!

    So your advice is that developers should be stuck

    I'm saying that most developers don't write their own tools. They will
    use off-the-shelf language implementations. If those happen to be slow,
    then there's little they can do except work within those limitations.

    Or just twiddle their thumbs.




    Which do you think an employer (or amateur programmer) would prefer?

    a) A compiler that runs in 0.1 seconds with little static checking
    b) A compiler that runs in 10 seconds but spots errors saving 6 hours debugging time

    You can have both. You can run a slow compiler that might pick up those errors.

    But sometimes you make a trivial mod (eg. change a prompt); do you
    REALLY need that deep analysis all over again? Do you still need it fully optimised?

    If your answer is YES to both then there's little point in further
    discussion.



    I might spend an hour or two writing code (including planning,
    organising, reading references, etc.) and then 5 seconds building it.
    Then there might be anything from a few minutes to a few hours testing
    or debugging.

    Up to a few hours testing and debugging without needing to rebuild? The
    last time I had to do that, it was a program written on punched cards
    that was submitted as an overnight job. You could compile it only once a
    day.

    And you're accusing ME of being stuck in the 70s!

    But using a good compiler saves a substantial amount of developer time

    A better language too.



    <snip the rest to save time>

    So you snipped my comments about fast bytecode compilers which do zero analysis being perfectly acceptable for scripting languages.

    And my remark about my language edging towards behaving as a scripting language.

    I can see why you wouldn't want to respond to that.

    BTW I'm doing the same with C; given this program:

    int main(void) {
        int a;
        int* p = 0;
        a = *p;
    }

    Here's what happens with my C compiler when told to interpret it:

    c:\cx>cc -i c
    Compiling c.c to c.(int)
    Error: Null ptr access

    Here's what happens with gcc:

    c:\cx>gcc c.c
    c:\cx>a
    <crashes>

    Is there some option to insert such a check with gcc? I've no idea; most people don't.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Fri Nov 22 02:50:54 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    For the routine ones that I do 100s of times a day, where test runs
    are generally very short, then I don't want to hang about waiting for
    a compiler that is taking 30 times longer than necessary for no good
    reason.


    Your development process sounds bad in so many ways it is hard to know
    where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's,

    1970s builds, especially on mainframes, were dominated by link times.

    Which mainframe do you have experience on?

    I spent a decade writing a mainframe operating system (the largest
    application we had to compile regularly) and the link time was a
    minor fraction of the overall build time.

    It was so minor that our build system stored the object files
    so that the OS engineers only needed to recompile the object
    associated with the source file being modified rather than
    the entire OS, they'd share the rest of the object files
    with the entire OS team.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 22 03:05:58 2024
    On 21/11/2024 15:50, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    For the routine ones that I do 100s of times a day, where test runs
    are generally very short, then I don't want to hang about waiting for
    a compiler that is taking 30 times longer than necessary for no good
    reason.


    Your development process sounds bad in so many ways it is hard to know
    where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's,

    1970s builds, especially on mainframes, were dominated by link times.

    Which mainframe do you have experience on?

    I spent a decade writing a mainframe operating system (the largest application we had to compile regularly) and the link time was a
    minor fraction of the overall build time.

    It was so minor that our build system stored the object files
    so that the OS engineers only needed to recompile the object
    associated with the source file being modified rather than
    the entire OS, they'd share the rest of the object files
    with the entire OS team.


    The one I remember most was 'TKB' I think it was, running on ICL 4/72
    (360 clone). It took up most of the memory. It was used to link my small Fortran programs.

    But linking always seems to have been a big deal in that era, until I had
    to write one for microcomputers, then it was a simple case of loading N
    object files and combining them into one COM file. It was as fast as
    they could be loaded off a floppy.

    (Given that the largest COM might have been a few 10s of KB, and floppy transfer time was some 20KB/s once a sector was located, it wouldn't
    have been long.)

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Fri Nov 22 03:10:38 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 21/11/2024 15:50, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    For the routine ones that I do 100s of times a day, where test runs are generally very short, then I don't want to hang about waiting for a compiler that is taking 30 times longer than necessary for no good reason.


    Your development process sounds bad in so many ways it is hard to know where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's,

    1970s builds, especially on mainframes, were dominated by link times.

    Which mainframe do you have experience on?

    I spent a decade writing a mainframe operating system (the largest
    application we had to compile regularly) and the link time was a
    minor fraction of the overall build time.

    It was so minor that our build system stored the object files
    so that the OS engineers only needed to recompile the object
    associated with the source file being modified rather than
    the entire OS, they'd share the rest of the object files
    with the entire OS team.


    The one I remember most was 'TKB' I think it was, running on ICL 4/72
    (360 clone). It took up most of the memory. It was used to link my small Fortran programs.

    So you generalize from your one non-standard experience to the entire ecosystem.

    Typical Bart.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 22 04:22:31 2024
    On 21/11/2024 16:10, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 21/11/2024 15:50, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    For the routine ones that I do 100s of times a day, where test runs are generally very short, then I don't want to hang about waiting for a compiler that is taking 30 times longer than necessary for no good reason.


    Your development process sounds bad in so many ways it is hard to know where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's,

    1970s builds, especially on mainframes, were dominated by link times.

    Which mainframe do you have experience on?

    I spent a decade writing a mainframe operating system (the largest
    application we had to compile regularly) and the link time was a
    minor fraction of the overall build time.

    It was so minor that our build system stored the object files
    so that the OS engineers only needed to recompile the object
    associated with the source file being modified rather than
    the entire OS, they'd share the rest of the object files
    with the entire OS team.


    The one I remember most was 'TKB' I think it was, running on ICL 4/72
    (360 clone). It took up most of the memory. It was used to link my small
    Fortran programs.

    So you generalize from your one non-standard experience to the entire ecosystem.

    Typical Bart.


    Typical Scott. Did you post just to do a bit of bart-bashing?

    Have you also considered that your experience of building operating
    systems might itself be non-standard?

    People quite likely used those machines to develop other applications
    than OSes. Then the dynamics could have been different.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Fri Nov 22 04:55:01 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 21/11/2024 16:10, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 21/11/2024 15:50, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    For the routine ones that I do 100s of times a day, where test runs are generally very short, then I don't want to hang about waiting for a compiler that is taking 30 times longer than necessary for no good reason.


    Your development process sounds bad in so many ways it is hard to know where to start. I think perhaps the foundation is that you taught yourself a bit of programming in the 1970's,

    1970s builds, especially on mainframes, were dominated by link times.
    Which mainframe do you have experience on?

    I spent a decade writing a mainframe operating system (the largest
    application we had to compile regularly) and the link time was a
    minor fraction of the overall build time.

    It was so minor that our build system stored the object files
    so that the OS engineers only needed to recompile the object
    associated with the source file being modified rather than
    the entire OS, they'd share the rest of the object files
    with the entire OS team.


    The one I remember most was 'TKB' I think it was, running on ICL 4/72
    (360 clone). It took up most of the memory. It was used to link my small Fortran programs.

    So you generalize from your one non-standard experience to the entire ecosystem.

    Typical Bart.


    Typical Scott. Did you post just to do a bit of bart-bashing?

    Have you also considered that your experience of building operating
    systems might itself be non-standard?

    We had a few thousand customers building code using the same
    compilers and, when needed, linkers.

    The vast majority used COBOL, which seldom required an
    explicit link step.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 22 12:09:03 2024
    On 10/11/2024 06:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    ...or to just always require 'else', with a dummy value if necessary?

    Well, frequently it is easier to do bad job, than a good one.

    I assume that you consider the simple solution the 'bad' one?

    You wrote about _always_ requiring 'else' regardless if it is
    needed or not. Yes, I consider this bad.


    I tried the earlier C example in Rust:

    fn fred(n:i32)->i32 {
        if n==1 {return 10;}
        if n==2 {return 20;}
    }

    I get this error:

    Error(s):
    error[E0317]: if may be missing an else clause
    --> 1022687238/source.rs:5:5
    |
    3 | fn fred(n:i32)->i32 {
    | --- expected `i32` because of this return type
    4 | if n==1 {return 10;}
    5 | if n==2 {return 20;}
    | ^^^^^^^^^^^^^^^^^^^^ expected i32, found ()
    |
    = note: expected type `i32`
    found type `()`
    = note: `if` expressions without `else` evaluate to `()`
    = help: consider adding an `else` block that evaluates to the
    expected type

    error: aborting due to previous error

    So Rust here is behaving exactly the same as my language (mine just says
    'else needed').

    Rust is generally a well-regarded and well-designed language. It also
    has clear and helpful error messages.

    Presumably you would regard this as 'bad' too.

    In this case the behaviour is not the easy solution, as Rust compilers
    are even slower and more complex than big C compilers. It is just a
    language choice.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 22 22:05:02 2024
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    Your development process sounds bad in so many ways it is hard to know
    where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's,

    I did a CS degree actually. I also spent a year programming, working for
    the ARC and SRC (UK research councils).

    But since you are being so condescending, I think /your/ problem is in
    having to use C. I briefly mentioned that a 'better language' can help.

    While I don't claim that my language is particularly safe, mine is
    somewhat safer than C in its type system, and far less error prone in
    its syntax and its overall design (for example, a function's details are always defined in exactly one place, so less maintenance and fewer
    things to get wrong).

    So, half the options in your C compilers are to help get around those shortcomings.

    You also seem proud that in this example:

    int F(int n) {
        if (n==1) return 10;
        if (n==2) return 20;
    }

    You can use 'unreachable()', a new C feature, to silence compiler
    messages about running into the end of the function, something I
    considered a complete hack.
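
    A hedged sketch of how that looks (assuming a C23 toolchain, where
    unreachable() comes from <stddef.h>):

        #include <stddef.h>

        int F(int n) {
            if (n == 1) return 10;
            if (n == 2) return 20;
            unreachable();   /* promise to the compiler: control never gets
                                here, and undefined behaviour if it ever does */
        }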

    My language requires a valid return value from the last statement. In
    that it's similar to the Rust example I posted 9 hours ago.

    Yet the gaslighting here suggested what I chose to do was completely wrong.

    And presumably you also advise doing so on a bargain basement
    single-core computer from at least 15 years ago?

    Another example of you acknowledging that compilation speed can be a
    problem. So a brute force approach to speed is what counts for you.

    If you found that it took several hours to drive 20 miles from A to B,
    your answer would be to buy a car that goes at 300mph, rather than cutting out the endless detours along the way.

    Or another option is to think about each journey extremely carefully,
    and then only do the trip once a week!



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Fri Nov 22 23:33:29 2024
    Bart <bc@freeuk.com> wrote:

    Sure. That's when you run a production build. I can even do that myself
    on some programs (the ones where my C transpiler still works) and pass
    it through gcc-O3. Then it might run 30% faster.

    On a fast machine running Dhrystone 2.2a I get:

    tcc-0.9.28rc 20000000
    gcc-12.2 -O 64184852
    gcc-12.2 -O2 83194672
    clang-14 -O 83194672
    clang-14 -O2 85763288

    so with -O2 this is more than 4 times faster. Dhrystone correlates
    reasonably with the runtime of tight compute-intensive programs.
    Compilers started to cheat on the original Dhrystone, so there are
    bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
    to make cheating harder, so I think it is still a reasonable
    benchmark. Actually, the difference may be much bigger; for example
    in image processing both clang and gcc can use vector instructions,
    which may give an additional speedup of order 8-16.

    30% above means that you are much better than tcc or your program
    is behaving badly (I have programs that make intensive use of
    memory; there the effect of optimization would be smaller, but still
    of order 2).

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Fri Nov 22 23:51:27 2024
    Bart <bc@freeuk.com> wrote:

    int main(void) {
    int a;
    int* p = 0;
    a = *p;
    }

    Here's what happens with my C compiler when told to interpret it:

    c:\cx>cc -i c
    Compiling c.c to c.(int)
    Error: Null ptr access

    Here's what happens with gcc:

    c:\cx>gcc c.c
    c:\cx>a
    <crashes>

    Is there some option to insert such a check with gcc? I've no idea; most people don't.

    I would do

    gcc -g c.c
    gdb a.out
    run

    and gdb would show me the place with the bad access. Things like
    bounds-checking array accesses or overflow checking make a big
    difference. Null pointer access is reliably detected by hardware, so
    it is no big deal. Say what your 'cc' will do with the following function:

    int
    foo(int n) {
        int a[10];
        int i;
        int res = 0;
        for(i = 0; i <= 10; i++) {
            a[i] = n + i;
        }
        for(i = 0; i <= 10; i++) {
            res += a[i];
        }
        return res;
    }

    Here gcc at compile time says:

    foo.c: In function ‘foo’:
    foo.c:15:17: warning: iteration 10 invokes undefined behavior [-Waggressive-loop-optimizations]
    15 | res += a[i];
    | ~^~~
    foo.c:14:18: note: within this loop
    14 | for(i = 0; i <= 10; i++) {
    | ~~^~~~~

    Of course, there are also cases like

    void
    bar(int n, int a[n]) {
        int i;
        for(i = 0; i <= n; i++) {
            a[i] = i;
        }
    }

    which are really wrong, but IIUC the C standard considers them OK.
    Still, a good compiler should have an option to flag them either
    at compile time or at runtime.
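
    As a hedged aside (the exact flag set is an assumption about a
    reasonably recent gcc or clang, and is not something either enables by
    default), both compilers can insert this kind of runtime check via the
    sanitizers, for a program containing such functions plus a main:

        gcc -g -fsanitize=address,undefined prog.c
        ./a.out

    With that instrumentation, out-of-bounds accesses like the ones in
    foo()/bar() and null-pointer dereferences can be reported at run time
    with a message pointing at the offending source line, at the cost of a
    slower and larger executable.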

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 23 01:11:51 2024
    On 22/11/2024 12:51, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    int main(void) {
    int a;
    int* p = 0;
    a = *p;
    }

    Here's what happens with my C compiler when told to interpret it:

    c:\cx>cc -i c
    Compiling c.c to c.(int)
    Error: Null ptr access

    Here's what happens with gcc:

    c:\cx>gcc c.c
    c:\cx>a
    <crashes>

    Is there some option to insert such a check with gcc? I've no idea; most
    people don't.

    I would do

    gcc -g c.c
    gdb a.out
    run

    and gdb would show me place with bad access. Things like bound
    checking array access or overflow checking makes a big difference.
    Null pointer access is reliably detected by hardware so no big
    deal. Say what you 'cc' will do with the following function:

    int
    foo(int n) {
    int a[10];
    int i;
    int res = 0;
    for(i = 0; i <= 10; i++) {
    a[i] = n + i;
    }
    for(i = 0; i <= 10; i++) {
    res += a[i];
    }
    res;
    }

    Here gcc at compile time says:

    foo.c: In function ‘foo’:
    foo.c:15:17: warning: iteration 10 invokes undefined behavior [-Waggressive-loop-optimizations]
    15 | res += a[i];
    | ~^~~
    foo.c:14:18: note: within this loop
    14 | for(i = 0; i <= 10; i++) {
    | ~~^~~~~

    My 'cc -i' wouldn't detect it. The -i tells it to run an interpreter on
    the intermediate code. Within the interpreter, some things are easily
    checked, but bounds info on arrays doesn't exist. (The IL supports only pointer operations, not high level array ops.)

    That would need intervention at an earlier stage, but even then, the
    design of C makes that difficult. First, because array types like
    int[10] decay to simple pointers, and ones represented by types like
    int* don't have bounds info at all. (I don't support int[n] params and
    few people use them anyway.)
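
    A small C illustration of that decay (standard behaviour, nothing
    implementation-specific): the bound is visible at the call site but
    gone inside the callee.

        #include <stdio.h>

        void f(int *p) {                     /* sees only int*, no length */
            printf("%zu\n", sizeof *p);      /* size of one int: the bound is gone */
        }

        int main(void) {
            int a[10];
            printf("%zu\n", sizeof a);       /* 10 * sizeof(int): bound known here */
            f(a);                            /* decays to &a[0]; callee cannot recover 10 */
            return 0;
        }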

    In my static language, it would be a little easier because an int[10]
    type doesn't decay; the info persists. C's int* would be ref[]int, still unbounded so has the same problem.

    However the language also allows slices, array pointers that include a
    length, so those can be used for bounds checking. But then, it's not
    really needed in that case, since you tend to write loops like this:

    func foo(slice[]int a)int =
    for x in a do # iterate over values
    ....
    for i in a.bounds do # iterate over bounds
    ....

    Apart from that, I have a higher level, interpreted language that does
    full bounds checking, so algorithms can be tested with that and then ported
    to the static language, a task made simpler by them using the same
    syntax. I just need to add type annotations.

    Getting back to 'cc -i', if I apply it to the program here, it gives an
    error:

    c:\cx>type c.c
    #include <stdio.h>

    int fred() {}

    int main(void) {
        printf("%d\n", fred());
    }

    c:\cx>cc -i c
    Compiling c.c to c.(int)
    PCL Exec error: RETF/SP mismatch: old=2 curr=1 seq: 7

    If I try it with gcc, then nothing much happens:

    c:\cx>gcc c.c
    c:\cx>a
    1

    If optimised, it shows 0 instead of 1, both meaningless values. It's a
    good thing the function wasn't called 'launchmissile()'.

    Trying it with my language:

    c:\mx>type t.m
    func fred:int =
    end

    proc main =
    println fred()
    end

    c:\mx>mm -i t
    Compiling t.m to t.(int)
    TX Type Error:
    ....
    Void expression/return value missing

    It won't compile it, and without needing to figure out which obscure set
    of options is needed to give a hard error.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Sat Nov 23 01:19:05 2024
    On Fri, 22 Nov 2024 12:33:29 -0000 (UTC)
    antispam@fricas.org (Waldek Hebisch) wrote:

    Bart <bc@freeuk.com> wrote:

    Sure. That's when you run a production build. I can even do that
    myself on some programs (the ones where my C transpiler still
    works) and pass it through gcc-O3. Then it might run 30% faster.

    On fast machine running Dhrystone 2.2a I get:

    tcc-0.9.28rc 20000000
    gcc-12.2 -O 64184852
    gcc-12.2 -O2 83194672
    clang-14 -O 83194672
    clang-14 -O2 85763288

    so with 02 this is more than 4 times faster. Dhrystone correlated
    resonably with runtime of tight compute-intensive programs.
    Compiler started to cheat on original Dhrystone, so there are
    bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
    to make cheating harder, so I think it is still reasonable
    benchmark. Actually, difference may be much bigger, for example
    in image processing both clang and gcc can use vector intructions,
    with may give additional speedup of order 8-16.

    30% above means that you are much better than tcc or your program
    is badly behaving (I have programs that make intensive use of
    memory, here effect of optimization would be smaller, but still
    of order 2).


    gcc -O is not what Bart was talking about. It is quite similar to -O1.
    Try gcc -O0.
    With regard to speedup, I had run only one or two benchmarks with tcc
    and my results were close to those of Bart. gcc -O0 is very similar to
    tcc in the speed of the exe, but compiles several times slower. The gcc
    -O2 exe is about 2.5 times faster.
    I'd guess I could construct a case where gcc successfully vectorizes
    some floating-point loop calculation and shows a 10x speed-up vs tcc on
    modern Zen5 hardware. But that would not be typical.







    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 23 02:00:51 2024
    On 22/11/2024 12:33, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Sure. That's when you run a production build. I can even do that myself
    on some programs (the ones where my C transpiler still works) and pass
    it through gcc-O3. Then it might run 30% faster.

    On fast machine running Dhrystone 2.2a I get:

    tcc-0.9.28rc 20000000
    gcc-12.2 -O 64184852
    gcc-12.2 -O2 83194672
    clang-14 -O 83194672
    clang-14 -O2 85763288

    so with 02 this is more than 4 times faster. Dhrystone correlated
    resonably with runtime of tight compute-intensive programs.
    Compiler started to cheat on original Dhrystone, so there are
    bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
    to make cheating harder, so I think it is still reasonable
    benchmark. Actually, difference may be much bigger, for example
    in image processing both clang and gcc can use vector intructions,
    with may give additional speedup of order 8-16.

    30% above means that you are much better than tcc or your program
    is badly behaving (I have programs that make intensive use of
    memory, here effect of optimization would be smaller, but still
    of order 2).

    The 30% applies to my typical programs, not benchmarks. Sure, gcc -O3
    can do a lot of aggressive optimisations when everything is contained
    within one short module and most runtime is spent in clear bottlenecks.

    Real apps, like say my compilers, are different. They tend to use
    globals more, program flow is more disseminated. The bottlenecks are
    harder to pin down.

    But, OK, here's the first sizeable benchmark that I thought of (I can't
    find a reliable Dhrystone one; perhaps you can post a link).

    It's called Deltablue.c, copied to db.c below for convenience. I've no
    idea what it does, but the last figure shown is the runtime, so smaller
    is better:

    c:\cx>cc -r db
    Compiling db.c to db.(run)
    DeltaBlue C <S:> 1000x 0.517ms

    c:\cx>tcc -run db.c
    DeltaBlue C <S:> 1000x 0.546ms

    c:\cx>gcc db.c && a
    DeltaBlue C <S:> 1000x 0.502ms

    c:\cx>gcc -O3 db.c && a
    DeltaBlue C <S:> 1000x 0.314ms

    So here gcc is 64% faster than my product. However my 'cc' doesn't yet
    have the register allocator of the older 'mcc' compiler (which simply
    keeps some locals in registers). That gives this result:

    c:\cx>mcc -o3 db && db
    Compiling db.c to db.exe
    DeltaBlue C <S:> 1000x 0.439ms

    So, 40% faster, for a benchmark.

    Now, for a more practical test. First I will create an optimised version
    of my compiler via transpiling to C:

    c:\mx6>mc -opt mm -out:mmgcc
    M6 Compiling mm.m---------- to mmgcc.exe
    W:Invoking C compiler: gcc -m64 -O3 -ommgcc.exe mmgcc.c -s

    Now I run my normal compiler, self-hosted, on a test program 'fann4.m':

    c:\mx6>tm mm \mx\big\fann4 -ext
    Compiling \mx\big\fann4.m to \mx\big\fann4.exe
    TM: 0.99

    Now the gcc-optimised version:

    c:\mx6>tm mmgcc \mx\big\fann4 -ext
    Compiling \mx\big\fann4.m to \mx\big\fann4.exe
    TM: 0.78

    So it's 27% faster. Note that fann4.m is 740Kloc, so this represents compilation speed of just under a million lines per second.

    Some other stats:

    c:\mx6>dir mm.exe mmgcc.exe
    22/11/2024 14:43 393,216 mm.exe
    22/11/2024 14:37 651,776 mmgcc.exe

    So my product has a smaller EXE too. For more typical inputs, the
    differences are narrower:

    c:\mx6>copy mm.m bb.m

    c:\mx6>tm mm bb
    Compiling bb.m to bb.exe
    TM: 0.09

    c:\mx6>tm mmgcc bb -ext
    Compiling bb.m to bb.exe
    TM: 0.08

    gcc-O3 is 12% faster, saving 10ms in compile-time. Curious about how tcc
    would fare? Let's try it:

    c:\mx6>mc -tcc mm -out:mmtcc
    M6 Compiling mm.m---------- to mmtcc.exe
    W:Invoking C compiler: tcc -ommtcc.exe mmtcc.c c:\windows\system32\user32.dll -luser32 c:\windows\system32\kernel32.dll -fdollars-in-identifiers

    c:\mx6>tm mmtcc bb
    Compiling bb.m to bb.exe
    TM: 0.11

    Yeah, a tcc-compiled M compiler would take 0.03 seconds longer to build
    my 35Kloc compiler than a gcc-O3-compiled one; about 37% slower.

    One more point: when gcc builds my compiler, it can use whole-program optimisation because the input is one source file. So that gives it an
    extra edge compared with compiling individual modules.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sat Nov 23 02:21:55 2024
    On 22/11/2024 15:19, Michael S wrote:
    On Fri, 22 Nov 2024 12:33:29 -0000 (UTC)
    antispam@fricas.org (Waldek Hebisch) wrote:

    Bart <bc@freeuk.com> wrote:

    Sure. That's when you run a production build. I can even do that
    myself on some programs (the ones where my C transpiler still
    works) and pass it through gcc-O3. Then it might run 30% faster.

    On fast machine running Dhrystone 2.2a I get:

    tcc-0.9.28rc 20000000
    gcc-12.2 -O 64184852
    gcc-12.2 -O2 83194672
    clang-14 -O 83194672
    clang-14 -O2 85763288

    so with -O2 this is more than 4 times faster. Dhrystone correlates
    reasonably with the runtime of tight compute-intensive programs.
    Compilers started to cheat on the original Dhrystone, so there are
    bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
    to make cheating harder, so I think it is still a reasonable
    benchmark. Actually, the difference may be much bigger; for example,
    in image processing both clang and gcc can use vector instructions,
    which may give an additional speedup of order 8-16.

    30% above means that you are much better than tcc or your program
    is behaving badly (I have programs that make intensive use of
    memory; there the effect of optimization would be smaller, but still
    of order 2).


    gcc -O is not what Bart was talking about. It is quite similar to -O1.

    "Similar" in this particular case being a synonym for "identical" :-)

    Try gcc -O0.
    With regard to speedup, I had run only one or two benchmarks with tcc,
    and my results were close to those of Bart. gcc -O0 is very similar to
    tcc in the speed of the exe, but compiles several times slower. The
    gcc -O2 exe is about 2.5 times faster.

    (Note that "gcc -O0" is still a vastly more powerful compiler than tcc
    in many ways.)

    I'd guess I could construct a case where gcc successfully vectorizes
    some floating-point loop calculation and shows a 10x speed-up vs tcc on
    modern Zen5 hardware. But that would not be typical.


    The effect you get from optimisation depends very much on the code in question, the exact compiler flags, and also on the processor you are using.

    Fairly obviously, if your code spends a lot of time in system calls,
    waiting for external events (files, networks, etc.), or calling code in
    other separately compiled libraries, then optimisation of your code will
    make almost no difference. Something that does a lot of calculations
    and data manipulation, on the other hand, can be much faster. Even
    then, however, it depends on what you are doing.

    Beyond simple "-O3" flags, things like "-march=native" and "-ffast-math"
    (if you have floating point calculations, and you are sure this does not affect the correctness of the code!) can make a huge difference by
    allowing more re-arrangements, vector/SIMD processing, using more
    instructions on newer processors, and having a more accurate model of scheduling.

    And the type of processor is also very important. x86 processors are
    tuned to running crappy code, since a lot of the time they are used to
    run old binaries made by old tools, or binaries made by people who don't
    know how to use their tools well. So they have features like extremely
    local data caches to hide the cost of using the stack for local
    variables instead of registers. And often it doesn't matter if you do
    one instruction or a dozen instructions, because you are waiting for
    memory anyway. If you are looking at microcontrollers, on the other
    hand, optimisation can make a huge difference for a lot of real-world code.

    There is also another substantial difference in code efficiency that is
    missed out in these sorts of pretend benchmarks. When efficiency really matters, top-shelf compilers give you features and extensions to help.
    You can use intrinsics, or vector extensions, or pragmas, or attributes,
    or "builtins", to give the compiler more information and work with it to
    give more opportunities for optimisation. Many of these are not
    portable (or of limited portability), and getting top speed from your
    code is not an easy job, but you certainly have possibilities with a
    tool like gcc or clang that you can never have with tcc or other tiny compilers.
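
    As a flavour of what those extensions look like, here is a minimal
    sketch using gcc/clang's vector_size attribute and __builtin_expect;
    both are compiler extensions (exactly the kind of thing tcc does not
    provide), and the code is only an illustration, not taken from any
    project discussed here:

    #include <stdio.h>

    /* gcc/clang extension: a 16-byte vector holding four floats. */
    typedef float v4f __attribute__((vector_size(16)));

    static v4f scale(v4f x, float k) {
        v4f kk = {k, k, k, k};
        return x * kk;            /* element-wise multiply, emitted as SIMD */
    }

    int main(void) {
        v4f a = {1.0f, 2.0f, 3.0f, 4.0f};
        v4f b = scale(a, 2.0f);
        if (__builtin_expect(b[3] != 8.0f, 0))   /* branch hinted as unlikely */
            puts("unexpected result");
        printf("%g %g %g %g\n", b[0], b[1], b[2], b[3]);
        return 0;
    }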


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sat Nov 23 03:06:19 2024
    On 22/11/2024 12:05, Bart wrote:
    On 21/11/2024 13:00, David Brown wrote:
    On 20/11/2024 21:17, Bart wrote:

    Your development process sounds bad in so many ways it is hard to know
    where to start. I think perhaps the foundation is that you taught
    yourself a bit of programming in the 1970's,

    I did a CS degree actually. I also spent a year programming, working for
    the ARC and SRC (UK research councils).

    But since you are being so condescending, I think /your/ problem is in having to use C. I briefly mentioned that a 'better language' can help.


    I use better languages than C, when there are better languages than C
    for the task. And as you regularly point out, I don't program in
    "normal" C, but in a subset of C limited by (amongst many other things)
    a choice of gcc warnings, combined with compiler extensions.

    My programming and thinking is not limited to C. But I believe I have a better general understanding of that language than you do (though there
    are some aspects you no doubt know better than me). I can say that
    because I have read the standards, and make a point of keeping up with
    them. I think about the features of C - I don't simply reject half of
    them because of some weird prejudice (and then complain that the
    language doesn't have the features you want!). I learn what the
    language actually says and how it is defined - I don't alternate between pretending it is all terrible, and pretending it works the way I'd like
    it to work.

    While I don't claim that my language is particularly safe, mine is
    somewhat safer than C in its type system, and far less error prone in
    its syntax and its overall design (for example, a function's details are always defined in exactly one place, so less maintenance and fewer
    things to get wrong).

    So, half the options in your C compilers are to help get around those shortcomings.

    What is your point? Are you trying to say that your language is better
    than C because your language doesn't let you make certain mistakes that
    a few people sometimes make in C? So what? Your language doesn't let
    people make mistakes because no one else uses it. If they did, I am
    confident that it would provide plenty of scope for getting things wrong.

    People can write good quality C with few mistakes. They have the tools available to help them. If they don't make use of the tools, it's their
    fault - not the fault of the language. If they write bad code - as bad programmers do in any language, with any tools - it's their fault.



    You also seem proud that in this example:

    int F(int n) {
        if (n==1) return 10;
        if (n==2) return 20;
    }

    You can use 'unreachable()', a new C feature, to silence compiler
    messages about running into the end of the function, something I
    considered a complete hack.

    I don't care what you consider a hack. I appreciate being able to write
    code that is safe, correct, clear, maintainable and efficient. I don't
    really understand why that bothers you. Do you find it difficult to
    write such code in C?


    My language requires a valid return value from the last statement. In
    that it's similar to the Rust example I posted 9 hours ago.

    If you are not able to use a feature such as "unreachable()" safely and correctly, then I suppose it makes sense not to have such a feature in
    your language.

    Personally, I have use for powerful tools. And I like that those
    powerful tools also come with checks and safety features.

    Of course there is a place for different balances between power and
    safety here - there is a reason there are many programming languages,
    and why many programmers use different languages for different tasks. I
    would not expect many C programmers to have much use for "unreachable()".


    Yet the gaslighting here suggested what I chose to do was completely wrong.

    And presumably you also advise doing so on a bargain basement
    single-core computer from at least 15 years ago?

    Another example of you acknowledging that compilation speed can be a problem. So a brute force approach to speed is what counts for you.


    No, trying to use a long-outdated and underpowered computer and then complaining about the speed is a problem.

    But if I felt that compiler speed was a serious hindrance to my work, and alternatives did not do as good a job, I'd get a faster computer (within reason). That's the way things work for professionals. (If I felt that expensive commercial compilers did a better job than gcc for my work,
    then I'd buy them - I've tested them and concluded that gcc is the best
    tool for my needs, regardless of price.)

    If you found that it took several hours to drive 20 miles from A to B,
    your answer would be to buy a car that goes at 300mph, rather than doing endless detours along the way.

    Presumably, in your analogy, the detours are useful.


    Or another option is to think about each journey extremely carefully,
    and then only do the trip once a week!


    That sounds a vastly better option, yes.

    Certainly it is better than swapping out the car with an electric
    scooter that can't do these important "detours".


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sat Nov 23 05:10:50 2024
    On 2024-11-22, Bart <bc@freeuk.com> wrote:
    You also seem proud that in this example:

    int F(int n) {
        if (n==1) return 10;
        if (n==2) return 20;
    }

    You can use 'unreachable()', a new C feature, to silence compiler
    messages about running into the end of the function, something I
    considered a complete hack.

    Unreachable assertions are actually a bad trade if all you are looking
    for is to suppress a diagnostic. Because the behavior is undefined
    if the unreachable is actually reached.

    That's literally the semantic definition! "unreachable()" means,
    roughly, "remove all definition of behavior from this spot in the
    program".

    Whereas falling off the end of an int-returning function only
    becomes undefined if the caller obtains the return value,
    and of course in the case of a void function, it's well-defined.

    You are better off with:

    assert(0 && "should not be reached");
    return 0;

    if asserts are turned off with NDEBUG, the function does something that
    is locally safe, and offers the possibility of avoiding a disaster.

    The only valid reason for using unreachable is optimization: you're
    introducing something unsafe in order to get better machine code. When
    the compiler is informed that the behavior is always undefined when some
    code is reached, it can just delete that code and everything dominated
    by it (reachable only through it).

    The above function does not need a function return sequence to be
    emitted for the fall-through case that is not expected to occur,
    if the situation truly does not occur. Then if it does occur, hell
    will break loose since control will fall through to whatever bytes
    follow the abrupt end of the function.
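
    To make the trade concrete, here is a minimal sketch of the two
    patterns side by side (assuming a C23 compiler; older gcc and clang
    spell the first one __builtin_unreachable()):

    #include <assert.h>
    #include <stddef.h>   /* C23: the unreachable() macro */

    int F_unreachable(int n) {
        if (n == 1) return 10;
        if (n == 2) return 20;
        unreachable();    /* promise: n is always 1 or 2; otherwise behaviour is undefined */
    }

    int F_assert(int n) {
        if (n == 1) return 10;
        if (n == 2) return 20;
        assert(0 && "should not be reached");
        return 0;         /* locally safe fallback when NDEBUG removes the assert */
    }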

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sat Nov 23 06:29:59 2024
    Bart <bc@freeuk.com> wrote:
    On 22/11/2024 12:33, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    Sure. That's when you run a production build. I can even do that myself
    on some programs (the ones where my C transpiler still works) and pass
    it through gcc-O3. Then it might run 30% faster.

    On fast machine running Dhrystone 2.2a I get:

    tcc-0.9.28rc 20000000
    gcc-12.2 -O 64184852
    gcc-12.2 -O2 83194672
    clang-14 -O 83194672
    clang-14 -O2 85763288

    so with 02 this is more than 4 times faster. Dhrystone correlated
    resonably with runtime of tight compute-intensive programs.
    Compiler started to cheat on original Dhrystone, so there are
    bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
    to make cheating harder, so I think it is still reasonable
    benchmark. Actually, difference may be much bigger, for example
    in image processing both clang and gcc can use vector intructions,
    with may give additional speedup of order 8-16.

    30% above means that you are much better than tcc or your program
    is badly behaving (I have programs that make intensive use of
    memory, here effect of optimization would be smaller, but still
    of order 2).

    The 30% applies to my typical programs, not benchmarks. Sure, gcc -O3
    can do a lot of aggressive optimisations when everything is contained
    within one short module and most runtime is spent in clear bottlenecks.

    Real apps, like say my compilers, are different. They tend to use
    globals more, program flow is more disseminated. The bottlenecks are
    harder to pin down.

    But, OK, here's the first sizeable benchmark that I thought of (I can't
    find a reliable Dhrystone one; perhaps you can post a link).

    First Google hit for Dhrystone 2.2a

    https://homepages.cwi.nl/~steven/dry.c

    (I used this one).

    Compiled in two steps like:

    gcc -c -O -o dry.o dry.c
    gcc -o dry2 -DPASS2 -O dry.c dry.o

    If you want something practical, I have the following C function:

    #include <stdint.h>
    void inner_mul(uint32_t * x, uint32_t * y, uint32_t * z,
                   uint32_t xdeg, uint32_t ydeg, uint32_t zdeg, uint32_t p) {
        if (ydeg < xdeg) {
            uint32_t * tmpp = x;
            uint32_t tmp = xdeg;
            x = y;
            xdeg = ydeg;
            y = tmpp;
            ydeg = tmp;
        }
        if (zdeg < xdeg) {
            xdeg = zdeg;
        }
        if (zdeg < ydeg) {
            ydeg = zdeg;
        }
        uint64_t ss;
        long i;
        long j;
        for(i=0; i<=xdeg; i++) {
            ss = z[i];
            for(j=0; j<=i; j++) {
                ss += ((uint64_t)(x[i-j]))*((uint64_t)(y[j]));
            }
            z[i] = ss%p;
        }
        for(i=xdeg+1; i<=ydeg; i++) {
            ss = z[i];
            for(j=0; j<=xdeg; j++) {
                ss += ((uint64_t)(x[j]))*((uint64_t)(y[i-j]));
            }
            z[i] = ss%p;
        }
        for(i=ydeg+1; i<=zdeg; i++) {
            ss = z[i];
            for(j=i-xdeg; j<=ydeg; j++) {
                ss += ((uint64_t)(x[i-j]))*((uint64_t)(y[j]));
            }
            z[i] = ss%p;
        }
    }

    and the following test driver:

    #include <stdio.h>
    #include <stdint.h>
    #include <sys/time.h>

    extern void inner_mul(uint32_t * x, uint32_t * y, uint32_t * z,
                          uint32_t xdeg, uint32_t ydeg, uint32_t zdeg, uint32_t p);

    int main(void) {
        uint32_t x[85], y[85], z[169];
        int i;
        for(i=0;i<85;i++) {
            x[i] = 1;
            y[i] = 1;
        }

        struct timeval tv1, tv2;
        gettimeofday(&tv1, 0);
        int j;
        for(j=0; j < 100000; j++) {
            for(i=0;i<169; i++) {
                z[i] = 1;
            }
            inner_mul(x, y, z, 84, 84, 168, 1000003);
        }
        gettimeofday(&tv2, 0);
        for(i=0;i<12; i++) {
            printf(" %u,", z[i]);
        }
        putchar('\n');
        long tt = tv2.tv_sec - tv1.tv_sec;
        tt *= (1000*1000);
        tt += (tv2.tv_usec - tv1.tv_usec);
        printf("Time: %ld us\n", tt);
        return 0;
    }

    At least for gcc and clang, put them in separate files to avoid
    simplifying the task too much ('inner_mul' is supposed to work
    with variable data; here we feed it the same thing several times).
    Of course, the test driver is silly, but 'inner_mul' is doing
    important computation and, as long as 'inner_mul' is compiled
    without knowledge of the actual parameters, the test should be fair.
    My results are:

    clang -O3 -march=native 126112us
    clang -O3 222136us
    clang -O 225855us
    gcc -O3 -march=native 82809us
    gcc -O3 114365us
    gcc -O 287786us
    tcc 757347us

    There is some irregularity in timings, but this shows that
    factor of order 9 is possible.

    Notes:
    - this code is somewhat hard to vectorize, but clang
    and gcc manage to do this,
    - vectorized code is sensitive to alignment of the data, some
    variation may be due to this,
    - modern processors dynamically change clock frequency; the
    times seem to be high enough to trigger a switch to maximal
    frequency (initially I used a smaller number of iterations
    but timings were less regular),
    - most of code is portable, but for timing we need timer with
    sufficient resolution, so I use Unix 'gettimeofday'.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 23 10:30:31 2024
    On 22/11/2024 19:29, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 22/11/2024 12:33, Waldek Hebisch wrote:

    But, OK, here's the first sizeable benchmark that I thought of (I can't
    find a reliable Dhrystone one; perhaps you can post a link).

    First Google hit for Dhrystone 2.2a

    https://homepages.cwi.nl/~steven/dry.c

    (I used this one).

    There was no shortage of them, there were just too many. All seemed to
    need some Linux script to compile them, and all needed Linux anyway
    because only that has sys/times.h.

    I eventually find one for Windows, and that goes to the other extreme
    and needs CL (MSVC) with these options:

    cl /O2 /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /MD /W4 /Wp64 /Zi
    /TP /EHsc /Fa /c dhry264.c dhry_264.c

    Plus it uses various ASM routines written in MASM syntax. I was partway
    through getting it to work with my compiler, when I saw your post.

    Your version is much simpler to get going, but still not straightforward because of 'gettimeofday', which is available via gcc, but is not
    exported by msvcrt, which is what tcc and my product use.

    I changed it to use clock().
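
    A minimal sketch of that change; the run_benchmark() here is a
    hypothetical stand-in for the driver's timing loop, and clock() is
    standard C, so it works with gcc, tcc and msvcrt alike:

    #include <stdio.h>
    #include <time.h>

    /* Stand-in for the driver's inner loop; not the real benchmark. */
    static void run_benchmark(void) {
        volatile unsigned long s = 0;
        for (unsigned long i = 0; i < 100000000UL; i++) s += i;
    }

    int main(void) {
        clock_t t1 = clock();
        run_benchmark();
        clock_t t2 = clock();
        /* clock() ticks CLOCKS_PER_SEC times per second; millisecond
           resolution is plenty once the loop runs long enough. */
        printf("Time: %ld ms\n", (long)((t2 - t1) * 1000 / CLOCKS_PER_SEC));
        return 0;
    }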

    The results then are like this (I tried two sizes of matrix element):

                  uint32_t   uint64_t

    gcc -O0           2165       2180   msec
    gcc -O3            282        470

    tcc               2572       2509

    cc                2165       2243
    mcc -opt           720        720

    The mcc product keeps some local variables in registers, a minor
    optimisation I will apply to cc in due course. It's not a priority,
    since usually it makes little difference on real applications. Only on benchmarks like this.

    gcc -O3 seems to enable some SIMD instructions, but only for u32. With
    u64 elements, then gcc -O3 is only about 50% faster than my compiler.

    If I try -march=native, then the 282 sometimes gets down to 235, and the
    470 to 420.

    (When functions like this were needed in my programs during 80s and 90s,
    I used inline assembly. Most code wasn't that critical.)



    - most of code is portable, but for timing we need timer with
    sufficient resolution, so I use Unix 'gettimeofday'.

    Why? Just make the task take long enough.

    BTW I also ported your program to my 'M' language. The timing however
    was about the same as mcc-opt.

    The source is below if interested.

    -------------------------------

    type T=u32

    proc inner_mul(ref[0:]T x, y, z, int xdeg, ydeg, zdeg, p) =
        u64 ss

        if ydeg<xdeg then
            swap(x, y)
            swap(xdeg, ydeg)
        fi

        xdeg min:=zdeg
        ydeg min:=zdeg

        for i in 0..xdeg do
            ss:=z[i]
            for j in 0..i do
                ss +:=u64(x[i-j]) * u64(y[j])
            od
            z[i]:=ss rem p
        od

        for i in xdeg+1..ydeg do
            ss:=z[i]
            for j in 0..xdeg do
                ss +:=u64(x[j]) * u64(y[i-j])
            od
            z[i]:=ss rem p
        od

        for i in ydeg+1..zdeg do
            ss:=z[i]
            for j in i-xdeg .. ydeg do
                ss +:=u64(x[i-j]) * u64(y[j])
            od
            z[i]:=ss rem p
        od

    end

    proc main=
        [0:85]T x, y, z
        int tv1, tv2

        for i in x.bounds do
            x[i]:=y[i]:=1
        od

        tv1:=clock()

        to 100'000 do
            for i in 0..168 do
                z[i]:=1
            od
            inner_mul(&x,&y,&z, 84, 84, 168, 1'000'003)
        od

        tv2:=clock()
        for i in 0..11 do
            print z[i], $
        od
        println

        println "Time:",tv2-tv1,"ms"
    end






    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sun Nov 24 00:28:14 2024
    On 22/11/2024 19:10, Kaz Kylheku wrote:
    On 2024-11-22, Bart <bc@freeuk.com> wrote:
    You also seem proud that in this example:

    int F(int n) {
        if (n==1) return 10;
        if (n==2) return 20;
    }

    You can use 'unreachable()', a new C feature, to silence compiler
    messages about running into the end of the function, something I
    considered a complete hack.

    Unreachable assertions are actually a bad trade if all you are looking
    for is to suppress a diagnostic. Because the behavior is undefined
    if the unreachable is actually reached.


    You should only use "unreachable()" in places where it is /never/
    actually reached - thus it is perfectly safe if you use it correctly.
    (I'm not aware of any features of any language that are safe to use /incorrectly/.)

    That's literally the semantic definition! "unreachable()" means,
    roughly, "remove all definition of behavior from this spot in the
    program".

    Yes. So that's fine, as long as execution never reaches it. That's the
    whole point - you are telling the compiler that this thing cannot
    happen. Compilers optimise all the time on the basis of what they know
    can and cannot happen - this just lets the programmer specify it.


    Whereas falling off the end of an int-returning function only
    becomes undefined if the caller obtains the return value,
    and of course in the case of a void function, it's well-defined.

    All true - but so what?


    You are better off with:

    assert(0 && "should not be reached");
    return 0;

    if asserts are turned off with NDEBUG, the function does something that
    is locally safe, and offers the possibility of avoiding a disaster.

    Asserts - or other temporary checks resulting in stopping the program
    with a useful message - can be very helpful in debugging. If you are
    not entirely sure that code execution can never reach a particular
    point, then either don't use "unreachable()" there, or if you prefer,
    put a conditional check there. "assert" is not magic - you can do the
    same thing yourself:

    #include <stdio.h>    /* printf */
    #include <stdlib.h>   /* exit */
    #include <stddef.h>   /* unreachable(), C23 */

    #ifdef CHECK_UNREACHABLES
    #define Unreachable() \
        do { \
            printf("Unreachable hit on line %i of file %s\r\n", \
                   __LINE__, __FILE__); \
            exit(1); \
        } while (0)
    #else
    #define Unreachable() unreachable()
    #endif


    Adjust it to suit your taste.


    In released code, hitting a false assertion is a bug in your code that
    should never happen. Hitting an "unreachable()" is a bug in your code
    that should never happen. Either way, you've screwed up. And unless
    you have good reason to believe that the user will actually give you all
    the critical information you need to duplicate the situation and find
    the bug, the assert is no better than the unreachable(). It is,
    however, less efficient and it means you are adding extra code ("return
    0;") that is of no use, and is in no way testable.

    So to me, unreachable() is better than an assert that is never
    triggered. And an assert that /could/ be triggered is not something I
    would ever want in released code of the kind of program I write
    (embedded systems).


    The only valid reason for using unreachable is optimization: you're introducing something unsafe in order to get better machine code. When
    the compiler is informed that the behavior is always undefined when some
    code is reached, it can just delete that code and everything dominated
    by it (reachable only through it).


    "unreachable()" is not unsafe unless you are using it incorrectly. /Everything/ is unsafe if you are using it incorrectly.

    The above function does not need a function return sequence to be
    emitted for the fall-through case that is not expected to occur,
    if the situation truly does not occur. Then if it does occur, hell
    will break loose since control will fall through to whatever bytes
    follow the abrupt end of the function.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sun Nov 24 01:17:36 2024
    On 22/11/2024 19:29, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    clang -O3 -march=native 126112us
    clang -O3 222136us
    clang -O 225855us
    gcc -O3 -march=native 82809us
    gcc -O3 114365us
    gcc -O 287786us
    tcc 757347us

    You've omitted -O0 for gcc and clang. That timing probably won't be too
    far from tcc, but compilation time for larger programs will be
    significantly longer (eg. 10 times or more).

    The trade-off then is not worth it unless you are running gcc for other reasons (eg. for deeper analysis, or to compile less portable code that
    has only been tested on or written for gcc/clang; or just an irrational
    hatred of simple tools).


    There is some irregularity in timings, but this shows that
    factor of order 9 is possible.

    That's an extreme case, for one small program with one obvious
    bottleneck where it spends 99% of its time, and with little use of
    memory either.

    For simply written programs, the difference is more like 2:1. For more complicated C code that makes much use of macros that can expand to lots
    of nested function calls, it might be 4:1, since it might rely on
    optimisation to inline some of those calls.

    Again, that would be code written to take advantage of specific compilers.

    But that is still computationally intensive code working on small
    amounts of memory.

    I have a text editor written in my scripting language. I can translate
    its interpreter to C and compile with both gcc-O3 and tcc.

    Then, yes, you will notice twice as much latency with the tcc
    interpreter compared with gcc-O3, when doing things like
    deleting/inserting lines at the beginning of a 1000000-line text file.

    But typically, the text files will be 1000 times smaller; you will
    notice no difference at all.

    I'm not saying no optimisation is needed, ever, I'm saying that the NEED
    for optimisation is far smaller than most people seem to think.

    Here are some timings for that interpreter, when used to run a script to compute fib(38) the long way:

    Interp   Built with   Timing

    qc       tcc          9.0 secs   (qc is C transpiled version)
    qq       mm           5.0        (-fn; qq is original M version)

    qc       gcc-O3       4.0
    qq       mm           1.2        (-asm)

    (My interpreter doesn't bother with faster switch-based or computed-goto
    based dispatchers. The choice is between a slower function-table-based
    one, and an accelerated threaded-code version using inline ASM.

    These are selected with -fn/-asm options. The -asm version is not JIT;
    it is still interpreting one bytecode at a time).

    So the fastest version here doesn't use compiler optimisation, and it's
    3 times the speed of gcc-O3. My unoptimised HLL code is also only 25%
    slower than gcc-O3.

    That is for this test, but that's also one that is popular for language benchmarks.
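
    For anyone not familiar with the dispatcher styles mentioned above,
    here is a minimal sketch in C; the three-opcode bytecode is made up
    purely for illustration, and the second form relies on the gcc/clang
    labels-as-values extension:

    #include <stdio.h>

    enum { OP_INC, OP_DEC, OP_HALT };

    static long acc;
    static void op_inc(void) { acc++; }
    static void op_dec(void) { acc--; }

    /* Function-table dispatch: one indirect call per bytecode. */
    static void run_table(const unsigned char *code) {
        static void (*ops[])(void) = { op_inc, op_dec };
        for (; *code != OP_HALT; code++)
            ops[*code]();
    }

    /* Computed-goto (threaded) dispatch: one indirect jump per bytecode. */
    static void run_goto(const unsigned char *code) {
        static void *labels[] = { &&l_inc, &&l_dec, &&l_halt };
        goto *labels[*code];
    l_inc:  acc++; goto *labels[*++code];
    l_dec:  acc--; goto *labels[*++code];
    l_halt: return;
    }

    int main(void) {
        unsigned char prog[] = { OP_INC, OP_INC, OP_DEC, OP_HALT };
        acc = 0; run_table(prog); printf("%ld\n", acc);
        acc = 0; run_goto(prog);  printf("%ld\n", acc);
        return 0;
    }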



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 24 03:45:47 2024
    Bart <bc@freeuk.com> wrote:
    On 22/11/2024 19:29, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 22/11/2024 12:33, Waldek Hebisch wrote:

    But, OK, here's the first sizeable benchmark that I thought of (I can't
    find a reliable Dhrystone one; perhaps you can post a link).

    First Google hit for Dhrystone 2.2a

    https://homepages.cwi.nl/~steven/dry.c
    (I used this one).

    There was no shortage of them, there were just too many. All seemed to
    need some Linux script to compile them, and all needed Linux anyway
    because only that has sys/times.h.

    I eventually find one for Windows, and that goes to the other extreme
    and needs CL (MSVC) with these options:

    cl /O2 /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /MD /W4 /Wp64 /Zi
    /TP /EHsc /Fa /c dhry264.c dhry_264.c

    Plus it uses various ASM routines written MASM syntax. I was partway
    through getting it to work with my compiler, when I saw your post.

    Your version is much simpler to get going, but still not straightforward because of 'gettimeofday', which is available via gcc, but is not
    exported by msvcrt, which is what tcc and my product use.

    I changed it to use clock().

    The results then are like this (I tried two sizes of matrix element):

    uint32_t uint64_t

    gcc -O0 2165 2180 msec
    gcc -O3 282 470

    tcc 2572 2509

    cc 2165 2243
    mcc -opt 720 720

    The mcc product keeps some local variables in registers, a minor optimisation I will apply to cc in due course. It's not a priority,
    since usually it makes little difference on real applications. Only on benchmarks like this.

    gcc -O3 seems to enable some SIMD instructions, but only for u32. With
    u64 elements, then gcc -O3 is only about 50% faster than my compiler.

    If I try -march=native, then the 282 sometimes gets down to 235, and the
    470 to 420.

    (When functions like this were needed in my programs during 80s and 90s,
    I used inline assembly. Most code wasn't that critical.)

    FYI, ATM I have a version compiling via Lisp; with bounds checking
    on it takes 0.58s, with bounds checking off it takes 0.43s
    on my machine. The reason to look at a C version is to do better.
    Taken together, your and my timings indicate that your 'cc' will
    give me less speed than going via Lisp. 'mcc -opt' probably would
    give an improvement, but not compared to 'gcc'. BTW, below are times
    on a slower machine (a 5-year-old cheap laptop):

    gcc -O3 -march=native 1722910us
    gcc -O3 1720884us
    gcc -O 1642328us
    tcc 7661992us

    via Lisp, checking 5.29s
    via Lisp, no checking 4.27s

    With -O3 gcc vectorizes inner loops, but apparently on this machine
    it backfires and execution time is longer than without vectorization.

    In both cases 'tcc' gives slower code than going via Lisp with
    array bounds checking on, so ATM using 'tcc' for this application
    is rather unattractive.

    I may end up using inline assembly, but this is a mess: code for
    a fast machine will not run on older ones, and on some machines
    non-vectorized code is faster. So I would need multiple versions
    of assembler just to cover x86_64. And I have other targets.
    And this is just one of the critical routines. I have probably about
    10 such critical routines now and it may grow to about 50.
    To get good speed I am experimenting with various variants.
    So going the assembler way I could be forced to write several
    thousands of lines of optimized assembler (most of that to
    throw out, but before writing them I would not know which
    ones are the best). That would be much more work than just
    passing various options to 'gcc' and 'clang' and measuring
    execution time.

    - most of code is portable, but for timing we need timer with
    sufficient resolution, so I use Unix 'gettimeofday'.

    Why? Just make the task take long enough.

    Well, Windows 'clock' looks OK, but some old style timing routines
    have really low resolution and would lead to excessive run
    time (I need to run rather large number of tests).

    BTW I also ported your program to my 'M' language. The timing however
    was about the same as mcc-opt.

    The source is below if interested.

    AFAICS you have assign-op combinations like 'min:='. ATM I am
    undecided about similar operations. I mean, in a language which,
    like C, applies operators only to base types, they give some gain.
    But I want operators working on a large variety of types, and then
    it is not clear how to define them.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sun Nov 24 09:36:14 2024
    On 23/11/2024 16:45, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    FYI, ATM is have a version compiling via Lisp, with bounds checking
    on it takes 0.58s, with bounds checking off it takes 0.43s
    on my machine. The reason to look at C version is to do better.
    Taken together, your and my timing indicate that your 'cc' will
    give me less speed than going via Lisp. 'mcc -opt' pobably would
    give an impovement, but not compared to 'gcc'. BTW, below times
    on a slower machine (5 years old cheap laptop):

    gcc -O3 -march=native 1722910us
    gcc -O3 1720884us
    gcc -O 1642328us
    tcc 7661992us

    via Lisp, checking 5.29s
    via Lisp, no checking 4.27s

    With -O3 gcc vectorizes inner loops, but apparently on this machine
    it backfires and execution time is longer than without vectorization.

    In both cases 'tcc' gives slower code than going via Lisp with
    array bounds checking on, so ATM using 'tcc' for this application
    is rather unattractive.

    Lisp is a rather mysterious language which can apparently be and do
    anything: it can be interpreted or compiled. Statically typed or
    dynamic. Imperative or functional.

    It can also apparently be implemented in a few dozen lines of code.

    Forth has similar claims.

    So Lisp being as fast or faster than C is not surprising!



    I may end up using inline assembly, but this is a mess: code for
    fast machine will not run on older ones, on some machines
    non-vectorized code is faster. So I would need mutiple versions
    of assembler just to cover x86_64. And I have other targets.
    And this is just one of critical routines. I have probably about
    10 such critical routines now and it may grow to about 50.
    To get good speed I am experimeting with various variants.
    So going assembler way I could be forced to write several
    thousends of lines of optimized assembler (most of that to
    throw out, but before writing them I would not know which
    ones are the best). That would be much more work than just
    passing various options to 'gcc' and 'clang' and measuring
    execution time.

    Using assembly to get speed is not as easy as it used to be. Most such attempts seem to generate slower code. It only pays off for certain apps such as interpreters, but there you are dealing with a bigger picture than one particular bottleneck.


    - most of code is portable, but for timing we need timer with
    sufficient resolution, so I use Unix 'gettimeofday'.

    Why? Just make the task take long enough.

    Well, Windows 'clock' looks OK, but some old style timing routines
    have really low resolution and would lead to excessive run
    time (I need to run rather large number of tests).

    I've tried all sorts, from Windows' high performance routines, down to
    x64's RDTSC instruction. They all gave unreliable, variable results. Now
    I just use 'clock', but might turn off all other apps for extra consistency.


    BTW I also ported your program to my 'M' language. The timing however
    was about the same as mcc-opt.

    The source is below if interested.

    AFAICS you have assign-op combinations like 'min:='. ATM I am
    undecided about similar operations. I mean, in a language which
    like C applies operator only to base types they give some gain.
    But I want operators working on large variety of types, and then
    it is not clear how to define them.


    An assignment that in C syntax might be written as:

    x op= y;

    would be the equivalent of this, when x has type T:

    T* p;
    p = &x;
    *p = op(*p, (T)y);

    If 'op' is not defined for operands of T, then it just won't work.
    (Arithmetic ops won't work on most usertypes, but the language still
    allows x += y.)

    However the IL I use directly supports min/max including augmented
    assignment (in-place) versions.
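
    For readers following along in C: a rough analogue of that in-place
    'min:=' can be sketched as a macro (plain C has no user-defined
    augmented operators, and the right-hand side gets evaluated twice
    here, so this is illustration only):

    /* MIN_ASSIGN(xdeg, zdeg); behaves like: if (zdeg < xdeg) xdeg = zdeg; */
    #define MIN_ASSIGN(x, y) do { if ((y) < (x)) (x) = (y); } while (0)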


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 24 11:24:30 2024
    Bart <bc@freeuk.com> wrote:
    On 22/11/2024 19:29, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    clang -O3 -march=native 126112us
    clang -O3 222136us
    clang -O 225855us
    gcc -O3 -march=native 82809us
    gcc -O3 114365us
    gcc -O 287786us
    tcc 757347us

    You've omitted -O0 for gcc and clang. That timing probably won't be too
    far from tcc, but compilation time for larger programs will be
    significantly longer (eg. 10 times or more).

    The trade-off then is not worth it unless you are running gcc for other reasons (eg. for deeper analysis, or to compile less portable code that
    has only been tested on or written for gcc/clang; or just an irrational hatred of simple tools).

    I have tried to use 'tcc' for one of the projects that I mentioned
    before. It appears to work; the real time for a build is essentially
    the same (actually some fraction of a second longer, but that is
    within measurement noise), and CPU time _may_ be shorter by 1.6%.
    This confirms my earlier estimates that for that project C
    compile time has a very small impact on overall compile time
    (most compilations are not C compilations). In this project
    I use '-O', which is likely to give better runtime speed
    (I do not bother with '-O2' or '-O3'). Also, I use '-O' for
    better diagnostics.

    In a second project, '-O2' is used for an image processing library;
    this takes significant time to compile, but this library is
    performance-critical code.

    There is some irregularity in timings, but this shows that
    factor of order 9 is possible.

    That's an extreme case, for one small program with one obvious
    bottleneck where it spends 99% of its time, and with little use of
    memory either.

    For simply written programs, the difference is more like 2:1. For more complicated C code that makes much use of macros that can expand to lots
    of nested function calls, it might be 4:1, since it might rely on optimisation to inline some of those calls.

    Again, that would be code written to take advantage of specific compilers.

    But that is still computationally intensive code working on small
    amounts of memory.

    I have a text editor written in my scripting language. I can translate
    its interpreter to C and compile with both gcc-O3 and tcc.

    Then, yes, you will notice twice as much latency with the tcc
    interpreter compared with gcc-O3, when doing things like
    deleting/inserting lines at the beginning of a 1000000-line text file.

    But typically, the text files will be 1000 times smaller; you will
    notice no difference at all.

    I'm not saying no optimisation is needed, ever, I'm saying that the NEED
    for optimisation is far smaller than most people seem to think.

    There is also question of disc space. 'tcc' compiled by itself is
    404733 bytes (code + data) (0.024s compile time), by gcc (default) is
    340950 (0.601s compile time), by gcc -O is 271229 (1.662s compile
    time), by gcc -Os is 228855 (2.470s compile time), by gcc -O2
    is 323392 (3.364s compile time), gcc -O3 is 407952 (4.627s compile
    time). As you can see gcc -Os can save quite a bit of disc space
    for still moderate compile time.

    And of course, there is the question of why a program whose runtime
    does not matter is written in a low-level language. Experience
    shows that using a higher-level language is easier, and a higher-level
    language compiled to bytecode can give significantly smaller
    code than gcc -Os from low-level code. Several programs for
    early micros used bytecode because this was the only way to
    fit the program into available memory.

    Here are some timings for that interpreter, when used to run a script to compute fib(38) the long way:

    Interp Built with Timing

    qc tcc 9.0 secs (qc is C transpiled version)
    qq mm 5.0 (-fn; qq is original M version)

    qc gcc-O3 4.0
    qq mm 1.2 (-asm)

    (My interpreter doesn't bother with faster switch-based or computed-goto based dispatchers. The choice is between a slower function-table-based
    one, and an accelerated threaded-code version using inline ASM.

    These are selected with -fn/-asm options. The -asm version is not JIT;
    it is still interpreting a bytecode at a time).

    So the fastest version here doesn't use compiler optimisation, and it's
    3 times the speed of gcc-O3. My unoptimised HLL code is also only 25%
    slower than gcc-O3.

    Well, most folks would "not bother" with inline ASM and instead use
    the fastest version that C can give, which would likely involve
    gcc -O2 or gcc -O3.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sun Nov 24 12:36:44 2024
    On 24/11/2024 00:24, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    I'm not saying no optimisation is needed, ever, I'm saying that the NEED
    for optimisation is far smaller than most people seem to think.

    There is also question of disc space. 'tcc' compiled by itself is
    404733 bytes (code + data) (0.024s compile time), by gcc (default) is
    340950 (0.601s compile time), by gcc -O is 271229 (1.662s compile
    time), by gcc -Os is 228855 (2.470s compile time), by gcc -O2
    is 323392 (3.364s compile time), gcc -O3 is 407952 (4.627s compile
    time). As you can see gcc -Os can save quite a bit of disc space
    for still moderate compile time.


    I thought David Brown said that disk space is irrelevant? Anyway this is
    the exact copy of what I tried just now, compiling a 5-line hello.c
    program. I hadn't used these compilers since earlier today:

    c:\c>tm gcc hello.c
    TM: 5.80

    c:\c>tm tcc hello.c
    TM: 0.19

    c:\c>tm gcc hello.c
    TM: 0.19

    c:\c>tm tcc hello.c
    TM: 0.03

    From cold, gcc took nearly 6 seconds (if you've been used to instant
    feedback all day, it can feel like an age). tcc took 0.2 seconds.

    Doing it a second time, now gcc takes 0.2 seconds, and tcc takes 0.03
    seconds! (It can't get much faster on Windows.)

    gcc is just a lumbering giant, a 870MB installation, while tcc is 2.5MB.
    As for sizes:

    c:\c>dir hello.exe
    24/11/2024 00:44 2,048 hello.exe

    c:\c>dir a.exe
    24/11/2024 00:44 91,635 a.exe (48K with -s)

    (At least that's one good thing of gcc writing out that weird a.exe each
    time; I can compare both exes!)

    As for mine (however it's possible I used it more recently):

    c:\c>tm cc hello
    Compiling hello.c to hello.exe
    TM: 0.04

    c:\c>dir hello.exe
    24/11/2024 00:52 2,560 hello.exe

    My installation is 0.3MB (excluding windows.h which is 0.6MB). Being self-contained, I can trivially apply UPX compression to get a 0.1MB
    compiler, which can be easily copied to a memory stick or bundled in one
    of my apps. However compiling hello.c now takes 0.05 seconds.

    (I don't use UPX because my apps are already tiny; it's just to marvel
    at how much redundancy they still contain, and how much tinier they
    could be.)

    I know none of this will cut any ice; for various reasons you don't want
    to use tcc.

    One of them being that your build process involves N slow stages so
    speeding up just one makes little difference.

    This however is very similar to my argument about optimisation; a
    running app consists of lots of parts which take up execution time, not
    all of which can be speeded up by a factor of 9. The net benefit will be
    a lot less, just like your reduced build time.

    And of course, there is a question why program with runtime that
    does not matter is written in a low level language?

    I mean it doesn't matter if it's half the speed. It might matter if it
    was 40 times slower.

    There's quite a gulf between even unoptimised native code and even a
    fast dynamic language interpreter.

    People seem to think that the only choices are the fastest possible C
    code at one end, and slow CPython at the other:

    gcc/O3-tcc-----------------------------------------------------CPython

    On this scale, gcc/O3 code and tcc code are practically the same!

    So the fastest version here doesn't use compiler optimisation, and it's
    3 times the speed of gcc-O3. My unoptimised HLL code is also only 25%
    slower than gcc-O3.

    Well, most folks would "not bother" with inline ASM and instead use
    fastest wersion that C can give. Which likely would involve
    gcc -O2 or gcc -O3.

    But in this case, it works by giving me a product that, even using a non-optimising compiler, makes an application faster than using gcc-O3.





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sun Nov 24 12:45:34 2024
    On 24/11/2024 01:36, Bart wrote:
    On 24/11/2024 00:24, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    And of course, there is a question why program with runtime that
    does not matter is written in a low level language?

    I mean it doesn't matter if it's half the speed. It might matter if it
    was 40 times slower.

    There's quite a gulf between even unoptimised native code and even a
    fast dynamic language interpreter.

    People seem to think that the only choices are the fastest possible C
    code at one end, and slow CPython at the other:

    gcc/O3-tcc-----------------------------------------------------CPython

    On this scale, gcc/O3 code and tcc code are practically the same!

    (I wasn't able to post results earlier because CPython hadn't finished.
    But for a JPEG decoder test on an 85Mpixel image, all using the same algorithm:

    gcc-O3      2.2 seconds
    mm6-opt     3.3 seconds   (My older compiler with the register optim.)
    mm7         5.7 seconds   (My unoptimising new one)
    cc          6.0 seconds   (Unoptimising)
    tcc         8.1 seconds
    PyPy         43 seconds   (Uses JIT to optimise hot loops to native code)
    CPython     386 seconds)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 24 16:03:17 2024
    Bart <bc@freeuk.com> wrote:
    On 24/11/2024 00:24, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    I'm not saying no optimisation is needed, ever, I'm saying that the NEED >>> for optimisation is far smaller than most people seem to think.

    There is also question of disc space. 'tcc' compiled by itself is
    404733 bytes (code + data) (0.024s compile time), by gcc (default) is
    340950 (0.601s compile time), by gcc -O is 271229 (1.662s compile
    time), by gcc -Os is 228855 (2.470s compile time), by gcc -O2
    is 323392 (3.364s compile time), gcc -O3 is 407952 (4.627s compile
    time). As you can see gcc -Os can save quite a bit of disc space
    for still moderate compile time.


    I thought David Brown said that disk space is irrelevant?

    I am not David Brown.

    Anyway this is
    the exact copy of what I tried just now, compiling a 5-line hello.c
    program. I hadn't used these compilers since earlier today:

    c:\c>tm gcc hello.c
    TM: 5.80

    c:\c>tm tcc hello.c
    TM: 0.19

    c:\c>tm gcc hello.c
    TM: 0.19

    c:\c>tm tcc hello.c
    TM: 0.03

    From cold, gcc took nearly 6 seconds (if you've been used to instant feedback all day, it can feel like an age). tcc took 0.2 seconds.

    Doing it a second time, now gcc takes 0.2 seconds, and tcc takes 0.03 seconds! (It can't get much faster on Windows.)

    gcc is just a lumbering giant, a 870MB installation, while tcc is 2.5MB.

    Yes, but the exact size depends on which version you install and how you
    install it. I installed version 6.5 and removed debugging info from
    executables. The result is 177MB, large but significantly smaller
    than what you have. The Debian package for gcc-12.2 is something like
    144MB (+ about 8MB of libraries which are usable for other purposes but
    mainly for gcc), but it only gives the C compiler. To that one should
    add 'libc6-dev' (about 12MB), which is needed to create useful
    programs. C++ adds 36MB, Fortran 35MB, Ada 94MB, so my installation
    is something like 330MB. Note: my 177MB reuses probably about 50MB
    from the system installation and includes C and C++. Also, in both cases
    I do not count libc, which is about 13MB (but needed by almost
    anything in the system), the shell, kernel, etc.

    On Windows some space-saving tricks do not work, and traditionally
    programs ship their own libraries, so the size may be bigger.

    For me it is problematic that each gcc language and each extra
    target adds a lot of space. I have extra targets (not counted in
    the size above) and together this is closer to 1G. In this aspect
    LLVM is somewhat better: it gives me more targets than I have
    installed for gcc, for a total "cost" of something like 210MB (plus
    about 50MB shared with gcc).

    As for sizes:

    c:\c>dir hello.exe
    24/11/2024 00:44 2,048 hello.exe

    c:\c>dir a.exe
    24/11/2024 00:44 91,635 a.exe (48K with -s)

    (At least that's one good thing of gcc writing out that weird a.exe each time; I can compare both exes!)

    AFAICS this is one-time Windows overhead + default layout rules for
    the linker. On Linux I get 15952 bytes by default, 14472 after
    stripping. However, the actual code + data size is 1904, and even
    most of this is crap needed to support extra features of the C library.

    In other words, this is mostly irrelevant, as people who want to
    get the size down can link with different options to get a smaller
    executable. The actual hello world code size is 99 bytes when compiled
    by gcc (default options) and 64 bytes by tcc. Again, gcc adds things
    like exception handling which increase the size for tiny files, but
    do not add much in a bigger file.

    I did
    hebisch@komp:~/kompi$ gcc -c hell2.c
    hebisch@komp:~/kompi$ tcc -o hell2.gcc hell2.o
    hebisch@komp:~/kompi$ tcc -c hell2.c
    hebisch@komp:~/kompi$ tcc -o hell2.tcc hell2.o
    hebisch@komp:~/kompi$ ls -l hell2.gcc hell2.tcc
    -rwxr-xr-x 1 hebisch hebisch 3680 Nov 24 04:21 hell2.gcc
    -rwxr-xr-x 1 hebisch hebisch 3560 Nov 24 04:21 hell2.tcc

    As you can see, when using tcc as a linker there is a small size
    difference due to extra exception handling code put there by gcc.
    This size difference will vanish in the noise when there is
    bigger real code. And when you are really determined, linker
    tricks can completely remove the exception handling code (AFAICS
    it is not needed for simple programs).

    As for mine (however it's possible I used it more recently):

    c:\c>tm cc hello
    Compiling hello.c to hello.exe
    TM: 0.04

    c:\c>dir hello.exe
    24/11/2024 00:52 2,560 hello.exe

    My installation is 0.3MB (excluding windows.h which is 0.6MB). Being self-contained, I can trivally apply UPX compression to get a 0.1MB compiler, which can be easily copied to a memory stick or bundled in one
    of my apps. However compiling hello.c now takes 0.05 seconds.

    (I don't use UPX because my apps are already tiny; it's just to marvel
    at how much redundancy they still contain, and how much tinier they
    could be.)

    I know none of this will cut any ice; for various reasons you don't want
    to use tcc.

    Well, I tried to use tcc when it first appeared. Unfortunately it
    could not compile some valid C code that I passed to it. I filed
    a bug report, but it was not fixed for several years. Shortly after
    that I got an AMD-64 machine and configured it as 64-bit only (one
    reason to do this was to avoid bloat due to having both 64-bit
    and 32-bit libraries). At that time and in the following several
    years tcc did not support 64-bit code, so it was not usable for me.
    Later IIRC it got 64-bit support, but I also needed ARM (and
    on ARM a faster compiler would make more difference).

    There is a question of trust: when what I reported remained unfixed,
    I lost faith in the quality of tcc. I still need to check if it is
    fixed now, but at least now tcc seems to have some development.

    One of them being that your build process involves N slow stages so
    speeding up just one makes little difference.

    Yes.

    This however is very similar to my argument about optimisation; a
    running app consists of lots of parts which take up execution time, not
    all of which can be speeded up by a factor of 9. The net benefit will be
    a lot less, just like your reduced build time.

    If I do not have good reasons to write a program in C, then likely I
    will write it in some higher-level language. One good reason
    to use C is to code performance-critical routines.

    And of course, there is a question why program with runtime that
    does not matter is written in a low level language?

    I mean it doesn't matter if it's half the speed. It might matter if it
    was 40 times slower.

    If you code bottlenecks in C, then 40 times slower may be OK for the
    rest. And there are compiled higher-level languages: you pay for
    higher-level features, but the overhead is much lower, closer to your
    half speed (and that is mostly due to a simpler code generator).

    There's quite a gulf between even unoptimised native code and even a
    fast dynamic language interpreter.

    People seem to think that the only choices are the fastest possible C
    code at one end, and slow CPython at the other:

    gcc/O3-tcc-----------------------------------------------------CPython

    On this scale, gcc/O3 code and tcc code are practically the same!

    There is OCaml, which offers an interpreter (faster than Python) and a
    compiler (which probably gives faster code than your 'mcc -opt').
    There are Lisp compilers. There are Java and C# (I am avoiding
    them as they depend on a sizeable runtime and due to proprietary
    games played by the vendors).

    IME the big productivity boost comes from garbage collection. But
    nobody knows how to make cooperating garbage collectors. So
    each garbage-collected runtime forms its own island which has
    trouble reusing code from other garbage-collected environments.
    ATM Python is the biggest kind-of garbage-collected environment, so
    people are attracted to it to reuse existing code.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 24 22:46:04 2024
    Bart <bc@freeuk.com> wrote:
    On 22/11/2024 12:51, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    int main(void) {
        int a;
        int* p = 0;
        a = *p;
    }

    Here's what happens with my C compiler when told to interpret it:

    c:\cx>cc -i c
    Compiling c.c to c.(int)
    Error: Null ptr access

    Here's what happens with gcc:

    c:\cx>gcc c.c
    c:\cx>a
    <crashes>

    Is there some option to insert such a check with gcc? I've no idea; most >>> people don't.

    I would do

    gcc -g c.c
    gdb a.out
    run

    and gdb would show me the place with the bad access. Things like
    bounds-checking array accesses or overflow checking make a big
    difference. Null pointer access is reliably detected by hardware,
    so no big deal. Say what your 'cc' will do with the following function:

    int
    foo(int n) {
        int a[10];
        int i;
        int res = 0;
        for(i = 0; i <= 10; i++) {
            a[i] = n + i;
        }
        for(i = 0; i <= 10; i++) {
            res += a[i];
        }
        return res;
    }

    Here gcc at compile time says:

    foo.c: In function ‘foo’:
    foo.c:15:17: warning: iteration 10 invokes undefined behavior [-Waggressive-loop-optimizations]
    15 | res += a[i];
    | ~^~~
    foo.c:14:18: note: within this loop
    14 | for(i = 0; i <= 10; i++) {
    | ~~^~~~~

    My 'cc -i' wouldn't detect it. The -i tells it to run an interpreter on
    the intermediate code. Within the interpreter, some things are easily checked, but bounds info on arrays doesn't exist. (The IL supports only pointer operations, not high level array ops.)

    That would need intervention at an earlier stage, but even then, the
    design of C makes that difficult. First, because array types like
    int[10] decay to simple pointers, and ones represented by types like
    int* don't have bounds info at all. (I don't support int[n] params and
    few people use them anyway.)

    There is a well-known technique of "fat pointers": the pointer keeps
    info about the area + the actual pointer, so 3 machine words instead
    of 1. This has some trouble when you convert between pointers and
    integers, but the program above is not doing this.

    In the program above one could use simple compile-time checking:
    keep info about the array declaration (which you need anyway to
    implement 'sizeof'), and when the array "decays" to
    a pointer during an access, keep info about the bounds. Using VMT that could be
    extended to the whole program (of course it would fail when the
    user passes pointers in the traditional C way, but it would work
    for "well behaved" programs).

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 24 22:47:58 2024
    Bart <bc@freeuk.com> wrote:
    On 24/11/2024 01:36, Bart wrote:
    On 24/11/2024 00:24, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    And of course, there is a question why program with runtime that
    does not matter is written in a low level language?

    I mean it doesn't matter if it's half the speed. It might matter if it
    was 40 times slower.

    There's quite a gulf between even unoptimised native code and even a
    fast dynamic language interpreter.

    People seem to think that the only choices are the fastest possible C
    code at one end, and slow CPython at the other:

    gcc/O3-tcc-----------------------------------------------------CPython

    On this scale, gcc/O3 code and tcc code are practically the same!

    (I wasn't able to post results earlier because CPython hadn't finished.
    But for a JPEG decoder test on an 85Mpixel image, all using the same algorithm:

    gcc-O3      2.2 seconds
    mm6-opt     3.3 seconds   (My older compiler with the register optim.)
    mm7         5.7 seconds   (My unoptimising new one)
    cc          6.0 seconds   (Unoptimising)
    tcc         8.1 seconds
    PyPy         43 seconds   (Uses JIT to optimise hot loops to native code)
    CPython     386 seconds)

    That looks like an example of a program that should use an optimizing
    compiler.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Nov 24 23:18:39 2024
    Bart <bc@freeuk.com> wrote:
    On 23/11/2024 16:45, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    FYI, ATM I have a version compiling via Lisp; with bounds checking
    on it takes 0.58s, with bounds checking off it takes 0.43s
    on my machine. The reason to look at a C version is to do better.
    Taken together, your and my timings indicate that your 'cc' will
    give me less speed than going via Lisp. 'mcc -opt' probably would
    give an improvement, but not compared to 'gcc'. BTW, below are times
    on a slower machine (a 5-year-old cheap laptop):

    gcc -O3 -march=native 1722910us
    gcc -O3 1720884us
    gcc -O 1642328us
    tcc 7661992us

    via Lisp, checking 5.29s
    via Lisp, no checking 4.27s

    With -O3 gcc vectorizes inner loops, but apparently on this machine
    it backfires and execution time is longer than without vectorization.

    In both cases 'tcc' gives slower code than going via Lisp with
    array bounds checking on, so ATM using 'tcc' for this application
    is rather unattractive.

    Lisp is a rather mysterious language which can apparently be and do anything: it can be interpreted or compiled.

    If a parser generates a parse tree, then you can use it as input
    to an actual compiler. Or you can interpret it. That applies
    to almost any language.

    Statically typed or
    dynamic.

    Normal Lisp data is tagged, so one can use dynamic typing. But
    Lisp also has type declarations which basically say to the compiler
    "trust me, this will always be an integer" (when needed, replace
    integer by some other type). Lisp has a subset which is similar
    to Fortran 77: there are arrays, conditionals, loops etc. Arrays
    may be specialized, say so that they can keep only machine integers
    or doubles. Lisp declarations work in a similar way to Fortran 77
    declarations: they tell the compiler to use machine instructions
    for the specified type. The difference is that, lacking declarations,
    Lisp will use dynamic typing. Anyway, it is possible to translate
    Fortran 77 into Lisp, and the speed of the resulting code depends mainly
    on the quality of the code generator.

    This approach could be used for a lot of different languages;
    it is just that many language implementations do not bother to provide
    a compiler, and then declarations have little effect.

    Imperative or functional.

    It can also apparently be implemented in a few dozen lines or code.

    A few dozen lines is a minimal old Lisp implemented in Lisp. The smallest
    implementation in C is very minimal and has about 500 lines.
    There is a Lisp standard; to implement it you probably need about
    20000 lines. Anyway, modern Lisp implementations are much larger
    than your languages.

    - most of code is portable, but for timing we need timer with
    sufficient resolution, so I use Unix 'gettimeofday'.

    Why? Just make the task take long enough.

    Well, Windows 'clock' looks OK, but some old-style timing routines
    have really low resolution and would lead to excessive run
    time (I need to run a rather large number of tests).

    I've tried all sorts, from Windows' high performance routines, down to the
    x64 RDTSC instruction. They all gave unreliable, variable results. Now
    I just use 'clock', but might turn off all other apps for extra consistency.

    On Linux 'gettimeofday' reliably gives real time with good
    resolution. There is one problem: CPUs now have a variable-frequency
    clock; they use a slow clock when load is low and switch
    to the fast clock only under heavier load. One way to solve it
    is to use a utility which pins the clock to a specific frequency.
    Another is to run long enough to switch to the higher frequency,
    so that the lower-frequency part does not matter much.
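
    [A minimal sketch of the kind of timing harness being discussed, using the
    POSIX 'gettimeofday' mentioned above (on Windows, 'clock' or
    QueryPerformanceCounter would take its place); the workload here is a
    made-up placeholder:

    #include <stdio.h>
    #include <sys/time.h>               /* POSIX gettimeofday */

    static long long usec_now(void) {   /* wall-clock time in microseconds */
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (long long)tv.tv_sec * 1000000 + tv.tv_usec;
    }

    int main(void) {
        volatile double s = 0.0;
        long long t0 = usec_now();
        for (int i = 0; i < 50000000; i++)  /* run long enough that the CPU
                                               leaves its idle clock speed */
            s += i * 0.5;
        long long t1 = usec_now();
        printf("%lld us (s = %g)\n", t1 - t0, (double)s);
        return 0;
    }
    ]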

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sun Nov 24 23:20:20 2024
    On 24/11/2024 05:03, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    gcc is just a lumbering giant, an 870MB installation, while tcc is 2.5MB.

    Yes, but the exact size depends on which version you install and how you
    install it. I installed version 6.5 and removed debugging info from the
    executables. The result is 177MB, large but significantly smaller
    than what you have.

    Most of a gcc installation is hundreds of header and archive (.a) files
    for various libraries. There might be 32-bit and 64-bit versions. I
    understand that. But it also makes it hard to isolate the core compiler.

    I might try copying the directory tree to a pen-drive, but if there was
    one essential file missing out of 1000s, I wouldn't know. Test-running
    it from the pen-drive wouldn't work as, on Windows, it will likely be
    picking up those files from the original installation.

    In fact, it's quite hard to run two or more gcc versions on Windows,
    since they use the OS's list of search paths to look for support files (cc1.exe etc), rather than use a path relative to the location of the
    gcc.exe that was launched.

    A single-file compiler doesn't have that problem, as there are no
    auxiliary files!

    As for sizes:

    c:\c>dir hello.exe
    24/11/2024 00:44 2,048 hello.exe

    c:\c>dir a.exe
    24/11/2024 00:44 91,635 a.exe (48K with -s)

    (At least that's one good thing of gcc writing out that weird a.exe each
    time; I can compare both exes!)

    AFAICS this is one-time Windows overhead + default layout rules for
    the linker. On Linux I get 15952 bytes by default, 14472 after
    stripping. However, the actual code + data size is 1904, and even
    of this most is crap needed to support extra features of the C library.

    In other words, this is mostly irrelevant, as people who want to
    get size down can link with different options to get a smaller
    executable. The actual hello world code size is 99 bytes when compiled
    by gcc (default options) and 64 bytes by tcc.

    I get a size of 3KB for tcc compiling hello.c under WSL.

    On Windows, my cc compiler has the option of generating my private
    binary format called 'MX':

    c:\c>cc -mx hello
    Compiling hello.c to hello.mx

    c:\c>dir hello.mx
    24/11/2024 11:58 194 hello.mx

    Then the size is 194 bytes (most of that is a big header and list of
    default DLL files to import). However that requires a one-off launcher
    (12KB compiled as C) to run it:

    c:\c>runmx hello
    Hello, World!

    (In practice, MX files are bigger than equivalent EXEs since they
    contain more reloc info. I developed the format before I had options for PIC/relocatable code, which is necessary for OBJ/DLL formats.)



    I know none of this will cut any ice; for various reasons you don't want
    to use tcc.

    Well, I tried to use tcc when it first appeared.

    Up until 0.9.26 it was quite poor. That was the time I started my C
    compiler project (2017). At one point, I had a program (a lexing
    benchmark) which ran slightly faster under my dynamic language
    interpreter than using tcc-compiled native code!

    This was because of its poor implementation of 'switch' which figured
    heavily in my test.

    But later that year, 0.9.27 came out, which fixed such issues, and was a
    much better, complete and conforming C compiler all round than my product.


    There is a question of trust: when what I reported remained unfixed
    I lost faith in the quality of tcc. I still need to check if it is
    fixed now, but at least now tcc seems to have some development.

    One of them being that your build process involves N slow stages so
    speeding up just one makes little difference.

    Yes.

    This however is very similar to my argument about optimisation; a
    running app consists of lots of parts which take up execution time, not
    all of which can be speeded up by a factor of 9. The net benefit will be
    a lot less, just like your reduced build time.

    If I do not have good reasons to write a program in C, then I will likely
    write it in some higher-level language. One good reason
    to use C is to code performance-critical routines.

    It can also do manipulations that are harder in a 'softer', safer HLL.
    (My scripting language however can still do most of those underhand things.)




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Nov 25 02:00:17 2024
    Bart <bc@freeuk.com> wrote:
    On 24/11/2024 05:03, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    As for sizes:

    c:\c>dir hello.exe
    24/11/2024 00:44 2,048 hello.exe

    c:\c>dir a.exe
    24/11/2024 00:44 91,635 a.exe (48K with -s)

    (At least that's one good thing of gcc writing out that weird a.exe each time; I can compare both exes!)

    AFAICS this is one-time Windows overhead + default layout rules for
    the linker. On Linux I get 15952 bytes by defauls, 14472 after
    striping. However, the actual code + data size is 1904 and even
    in this most is crap needed to support extra features of C library.

    In other words, this is mostly irrelevant, as people who want to
    get size down can link it with different options to get smaller
    size down. Actual hello world code size is 99 bytes when compiled
    by gcc (default options) and 64 bytes by tcc.

    I get a size of 3KB for tcc compiling hello.c under WSL.

    That more or less agrees with the file size that I reported. I
    prefer to look at what 'size' reports and at the .o
    files, as this is more relevant when scaling to larger
    programs. Simply, 10000 programs with 16kB overhead each
    is 160MB of overhead. When it matters, I am likely to
    have much less than 10000 executables; 100 executables
    of 10MB each are more likely. Note that there is an old Unix
    trick of putting multiple programs into a single file
    (executable). The executable appears in the filesystem under,
    say, 100 names and performs different things depending on
    the name. There is dispatching code, something like 40
    bytes per name, so there is overhead, but much lower than
    having independent executables. So, per-program
    overhead can be quite small.
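
    [A minimal sketch of that trick; the applet names and bodies are made up
    for illustration (busybox is the best-known real example):

    #include <stdio.h>
    #include <string.h>

    static int tool_hello(void) { puts("hello"); return 0; }
    static int tool_bye(void)   { puts("bye");   return 0; }

    int main(int argc, char **argv) {
        (void)argc;
        /* dispatch on the name we were invoked under (strip any path;
           a Windows build would also strip '\\' and ".exe") */
        const char *name = strrchr(argv[0], '/');
        name = name ? name + 1 : argv[0];

        if (strcmp(name, "hello") == 0) return tool_hello();
        if (strcmp(name, "bye")   == 0) return tool_bye();
        fprintf(stderr, "%s: unknown applet\n", name);
        return 1;
    }

    Installed once and hard-linked under each applet name (ln box hello;
    ln box bye), each extra name costs only its dispatch line, roughly the
    kind of per-name overhead mentioned above.]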

    The larger size (16kB) is due to page alignment of program parts,
    which has some benefits. So there are trade-offs, and when
    size matters there are ways to save disc space. OTOH if the
    actual code takes a lot of space, then there is no easy
    solution.

    On Windows, my cc compiler has the option of generating my private
    binary format called 'MX':

    c:\c>cc -mx hello
    Compiling hello.c to hello.mx

    c:\c>dir hello.mx
    24/11/2024 11:58 194 hello.mx

    Then the size is 194 bytes (most of that is a big header and list of
    default DLL files to import). However that requires a one-off launcher
    (12KB compiled as C) to run it:

    c:\c>runmx hello
    Hello, World!

    (In practice, MX files are bigger than equivalent EXEs since they
    contain more reloc info. I developed the format before I had options for PIC/relocatable code, which is necessary for OBJ/DLL formats.)

    In Linux the typical filesystem block size is 4kB, so anything bigger
    than 0 takes at least 4kB. So super-small executables (I think the
    record is below 200 bytes) do not really save space. And they
    actually need more RAM, as the system first loads the program file into
    buffers. If the program is properly organized, it can be executed
    directly from the file buffer. But a super-small executable needs an extra
    copy, to put its parts in separate pages. So there is a compromise
    between memory use and disc space, and usually a moderate increase
    in disc use is considered worth the lower memory use.

    If I do not have good reasons to write program in C, then likely I
    will write it in some higher-level language. One good reason
    to use C is to code performance-critical routines.

    It can also do manipulations that are harder in a 'softer', safer HLL.
    (My scripting language however can still do most of those underhand things.)

    Anything computational can be done in a HLL. You may wish to
    play tricks to save time. Or possibly some packing tricks to
    save memory. But packing tricks can be done in a HLL (say by
    treating the whole memory as a big array of u64), so this really
    boils down to speed.
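
    [A minimal sketch of that packing trick, written here in C but doable in
    any language that exposes a flat array of 64-bit integers:

    #include <stdint.h>
    #include <stdio.h>

    /* pack two 32-bit values into one u64 slot and pull them back out */
    static uint64_t pack2(uint32_t lo, uint32_t hi) { return ((uint64_t)hi << 32) | lo; }
    static uint32_t lo32(uint64_t w) { return (uint32_t)w; }
    static uint32_t hi32(uint64_t w) { return (uint32_t)(w >> 32); }

    int main(void) {
        uint64_t mem[4] = {0};        /* "whole memory" as an array of u64 */
        mem[0] = pack2(123, 456);
        printf("%u %u\n", (unsigned)lo32(mem[0]), (unsigned)hi32(mem[0]));
        return 0;
    }
    ]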

    You may wish to write an OS or to interact with hardware, but
    here I usually want optimization. Maybe not as aggressive
    as modern gcc, but at least of the order of gcc-1 (which
    would probably have compile times tens of times lower than modern
    gcc).

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 25 04:50:36 2024
    On 24/11/2024 15:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 24/11/2024 05:03, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    As for sizes:

    c:\c>dir hello.exe
    24/11/2024 00:44 2,048 hello.exe

    c:\c>dir a.exe
    24/11/2024 00:44 91,635 a.exe (48K with -s)

    (At least that's one good thing of gcc writing out that weird a.exe each time; I can compare both exes!)

    AFAICS this is one-time Windows overhead + default layout rules for
    the linker. On Linux I get 15952 bytes by defauls, 14472 after
    striping. However, the actual code + data size is 1904 and even
    in this most is crap needed to support extra features of C library.

    In other words, this is mostly irrelevant, as people who want to
    get size down can link it with different options to get smaller
    size down. Actual hello world code size is 99 bytes when compiled
    by gcc (default options) and 64 bytes by tcc.

    I get a size of 3KB for tcc compiling hello.c under WSL.

    That more or less agrees with file size that I reported. I
    prefer to look at what 'size' reports and at looking at .o
    files,

    Oh, I thought you were reporting sizes of 99 and 64 bytes, in response
    to tcc's 2048 bytes.

    So I'm not sure what you mean by 'actual' size, unless it is the same as
    that reported by my product here (comments added):

    c:\cx>cc -v hello
    Compiling hello.c to hello.exe
    Code size: 34 bytes # .text
    Idata size: 15 # .data
    Code+Idata: 49
    Zdata size: 0 # .bss
    EXE size: 2,560

    So at 49 bytes, I guess I win! But in terms of actual file-size, since
    both tcc/cc can run programs from source, then all that's needed is
    hello.c, 53 bytes minimum.


    It can also do manipulations that are harder in a 'softer', safer HLL.
    (My scripting language however can still do most of those underhand things.)

    Anything computational can be done in a HLL. You may wish to
    play tricks to save time. Or possible some packing tricks to
    save memory. But packing tricks can be done in HLL (say by
    treating whole memory as a big array of u64), so this really
    boils down to speed.

    I'm sure that with Python, say, pretty much anything can be done given
    enough effort. Even if it means cheating by using external add-on
    modules to get around language limitations, like using the ctypes module,
    which you will likely find uses C code.

    This is different from having things as part of the core language so they
    become effortless and natural.

    But, everything you've said seems to have backed up my remark that
    people only seem to consider two possibilities:

    * Either a scripting language where it doesn't matter that it's 1-2
    magnitudes slower than native code

    * Or a compiled language where it absolutely MUST be at least as fast as gcc/clang-O3. Only 20 times faster than CPython is not enough!

    (In my JPEG timings I posted earlier today, CPython was 175 times slower
    than gcc-O3, and 48-64 times slower than unoptimised C.

    Applying the simplest optimisation (which I can tell you adds only 10% to
    compilation time) made native code over 100 times faster than CPython,
    and only 50 times slower than gcc-O3. This was on a deliberately large input.

    Basically, if you are generating even the worst native code, then it
    will already wipe the floor with any scripting language, when comparing
    them both executing the same algorithm.)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Nov 25 06:35:37 2024
    Bart <bc@freeuk.com> wrote:
    On 24/11/2024 15:00, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 24/11/2024 05:03, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    As for sizes:

    c:\c>dir hello.exe
    24/11/2024 00:44 2,048 hello.exe

    c:\c>dir a.exe
    24/11/2024 00:44 91,635 a.exe (48K with -s)

    (At least that's one good thing of gcc writing out that weird a.exe each time; I can compare both exes!)

    AFAICS this is one-time Windows overhead + default layout rules for
    the linker. On Linux I get 15952 bytes by defauls, 14472 after
    striping. However, the actual code + data size is 1904 and even
    in this most is crap needed to support extra features of C library.

    In other words, this is mostly irrelevant, as people who want to
    get size down can link it with different options to get smaller
    size down. Actual hello world code size is 99 bytes when compiled
    by gcc (default options) and 64 bytes by tcc.

    I get a size of 3KB for tcc compiling hello.c under WSL.

    That more or less agrees with file size that I reported. I
    prefer to look at what 'size' reports and at looking at .o
    files,

    Oh, I thought you were reporting sizes of 99 and 64 bytes, in response
    to tcc's 2048 bytes.

    So I'm not sure what you mean by 'actual' size, unless it is the same as this reported by my product here (comments added):

    c:\cx>cc -v hello
    Compiling hello.c to hello.exe
    Code size: 34 bytes # .text
    Idata size: 15 # .data
    Code+Idata: 49
    Zdata size: 0 # .bss
    EXE size: 2,560

    So at 49 bytes, I guess I win!

    It looks so. Yes, I mean code + data size; if you have multiple
    functions this adds up, while the constant overhead remains constant.
    On Linux each program is supposed to have a header, and that
    puts an absolute lower bound on the size of the program (no header =>
    the OS considers it invalid). In modern programs you are
    supposed to have a separate code area, read-only data area and
    mutable data area. In a running program each of them consists
    of an integral number of pages. If you arrange them so that the OS
    can load them most easily, you get something like 12kB or 16kB
    (actually a bit smaller, as normally the file will not contain
    the unused part of the last page). But if you add more code or
    data the size will grow only slightly or not at all: you
    will see growth on the last page, and when one of the inner pages
    overflows you need to start a new page.
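
    [The binutils 'size' tool is the usual way to see that code + data figure;
    a small illustration, with the numbers omitted since they vary by system
    and C library:

    $ gcc hello.c -o hello
    $ size hello
       text    data     bss     dec     hex filename

    'text' is code, 'data' is initialised data, 'bss' is zero-initialised
    data; the file on disc is larger because of headers and page alignment.]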

    It can also do manipulations that are harder in a 'softer', safer HLL.
    (My scripting language however can still do most of those underhand things.)

    Anything computational can be done in a HLL. You may wish to
    play tricks to save time. Or possible some packing tricks to
    save memory. But packing tricks can be done in HLL (say by
    treating whole memory as a big array of u64), so this really
    boils down to speed.

    I'm sure that with Python, say, pretty much anything can be done given enough effort. Even if it means cheating by using external add-on
    modules to get around language limitations, like using Ctypes module,
    which you will likely find uses C code.

    I did not look at how Python does its things. In one system that
    I use there is a rather general routine written in assembler which
    can call routines using the C calling convention. The assembler routine
    performs simple data conversions, like removing tags, so that C
    sees raw machine integers or floats. It also knows which arguments
    are supposed to go on the stack and which should be in registers.
    There is a less complete routine which allows callbacks from C;
    this one abuses C (it is invalid C which happens to work OK in
    all C compilers used to compile the system). There is a bunch of
    other assembler support routines, like access to arbitrary
    bitstrings, byte copy (used to copy arrays when needed), etc.

    The rest is in the language itself: the code generator knows about
    references to bitstrings, and in simple cases generates inline
    code and passes the general case to assembler support. There are
    language-defined data structures to represent external pointers
    and functions. At a higher level there is a parser for C
    declarations which can generate code to repack data structures
    from the C version to the internal one and back.

    Concerning cheating, of course Python is cheating a lot. It has
    several routines which work on sizeable pieces of data. Those
    routines are coded in C or C++, so you get optimized C speed
    when you call them.

    This is different from having things part of the core language so they become effortless and natural.

    But, everything you've said seems to have backed up my remark that
    people only seem to consider two possibilities:

    * Either a scripting language where it doesn't matter that it's 1-2 magnitudes slower than native code

    * Or a compiled language where it absolutely MUST be at least as fast as gcc/clang-O3. Only 20 times faster than CPython is not enough!

    You ignored what I wrote about compiled higher-level languages:
    they exist, have speed competitive with your low-level language,
    and some people use them. The majority seem to go with interpreted
    languages. Note that interpreted languages frequently have a
    large library of performance-critical routines written in a
    lower-level language. Do not be surprised that people want an
    optimizing compiler for those routines.

    (In my JPEG timings I posted earlier today, CPython was 175 times slower than gcc-O3. and 48-64 times slower than unoptimised C.

    Applying the simplest optimsation, which I can tell you adds only 10% to compilation time) made native code over 100 times faster than CPython,
    and only 50 slower than gcc-O3. This was on a deliberately large input

    Basically, if you are generating even the worst native code, then it
    will already wipe the floor with any scripting language, when comparing
    them both executing the same algorithm.)

    But the competition is not fair; the other side is cheating. Note
    that using a low-level language the coding effort will be comparable
    to C. You may save some time if you get better diagnostics.
    There were studies claiming that stronger type checking reduces the
    effort needed to write a correct program. But the main increase in
    productivity comes from higher-level constructs. Actually,
    probably the biggest gain is when you can reuse existing code, which
    means that popular languages have a very big advantage over less
    popular ones. You need rather strong advantages to
    overcome the popularity advantage of another language. Faster
    compilation, while nice, has a limited effect. And people have
    ways to mitigate long compile times. So the normal justification
    for using a low-level language is "I need runtime speed". And
    in such a case it is natural to use the compiler giving the fastest
    runtime speed.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Mon Nov 25 07:01:43 2024
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]

    That doesn't agree with my observations.

    Of course most of the headers and libraries are not part of gcc itself.
    As usual, you refer to the entire implementation as "gcc".

    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
    installing each into a new directory.

    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
    the executables.

    The glibc installation (libraries and headers) is about 199 MB, a small fraction of the size of the gcc installation.

    Of course there are other libraries that can be used with gcc, and they
    could take a lot of space -- but they're not part of gcc.

    These sizes might differ on Windows.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 25 07:52:54 2024
    On 24/11/2024 20:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]

    That doesn't agree with my observations.

    Of course most of the headers and libraries are not part of gcc itself.
    As usual, you refer to the entire implementation as "gcc".

    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5, installing each into a new directory.

    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
    the executables.

    That's even huger than mine! So, what are those 3.7GB full of? What does
    the 1.9GB of executables do?


    The glibc installation (libraries and headers) is about 199 MB, a small fraction of the size of the gcc intallation.

    Is that included in one of those two divisions above?


    Of course there are other libraries that can be used with gcc, and they
    could take a lot of space -- but they're not part of gcc.

    So, what /is/ gcc? What's the minimum installation that can compile
    hello.c to hello.s for example?

    I've done that experiment on my TDM version, and the answer appears to
    be about 40MB in this directory structure:

    Directory of c:\tdm\bin
    24/07/2024 10:21 1,926,670 gcc.exe
    24/07/2024 10:21 2,279,503 libisl-23.dll
    24/07/2024 10:22 164,512 libmpc-3.dll
    24/07/2024 10:22 702,852 libmpfr-6.dll

    Directory of c:\tdm\libexec\gcc\x86_64-w64-mingw32\14.1.0
    24/07/2024 10:24 34,224,654 cc1.exe

    Directory of c:\tdm\x86_64-w64-mingw32\include
    17/01/2021 17:33 368 stddef.h
    27/03/2021 20:07 2,924 stdio.h

    7 File(s) 39,301,483 bytes

    Here I cheated a little and used the minimum std headers from my
    compiler, otherwise I could have spent an hour chasing down dozens of
    obscure nested headers that gcc's stdio.h likes to make use of.

    Is /this/ gcc then? Will you agree that it is by no means clear what
    'gcc' includes, or what to call the part of a gcc installed bundle that
    is not technically gcc?

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.

    With clang, it is easier: apparently everything needed to do the above,
    other than header files, is contained with a 120MB executable clang.exe.

    However the full 2.8GB llvm/clang installation doesn't provide any
    headers, nor a linker. At least it doesn't use the provided 88MB (!)
    lld.exe; it expects to work on top of MSVC, which it has never managed
    to do.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Nov 25 08:45:39 2024
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]

    That doesn't agree with my observations.

    Of course most of the headers and libraries are not part of gcc itself.
    As usual, you refer to the entire implementation as "gcc".

    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5, installing each into a new directory.

    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
    the executables.

    That is much larger than what I got. On Debian 12.7 I used
    '--disable-multilib --enable-languages=c,c++,objc,obj-c++,fortran,ada,m2,go'.

    IIRC it was something like 2.4G originally and 1012176k after
    stripping. AFAICS with earlier versions the ARM compiler was much
    bigger than the x86_64 one, mainly because ARM had libraries for
    several variants of the architecture. Header files are not
    that big (but still several megabytes), but the libraries seem to
    be quite large (I did not check, but it is possible that the
    libraries still contain debug info).

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Mon Nov 25 08:45:55 2024
    Bart <bc@freeuk.com> writes:
    On 24/11/2024 20:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]
    That doesn't agree with my observations.
    Of course most of the headers and libraries are not part of gcc
    itself.
    As usual, you refer to the entire implementation as "gcc".
    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
    installing each into a new directory.
    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I
    strip
    the executables.

    That's even huger than mine! So, that are those 3.7GB full of? What
    does the 1.9GB of executables do?

    I installed compilers for multiple languages. A more typical
    installation likely won't include compilers for Ada, Go, Fortran,
    Modula-2, and Rust. There are a number of hard links to other files;
    for example c++, g++, x86_64-pc-linux-gnu-c++, and
    x86_64-pc-linux-gnu-g++ are all the same file. Apparently `du` is
    clever enough to count them only once.

    Here's the output of `ls -s` on the bin directory (sizes are in units of
    1024 bytes) :

    total 611908
    8828 c++ 8960 gm2 8828 x86_64-pc-linux-gnu-c++
    8820 cpp 8264 gnat 8828 x86_64-pc-linux-gnu-g++
    8828 g++ 13092 gnatbind 8820 x86_64-pc-linux-gnu-gcc
    8820 gcc 9556 gnatchop 8820 x86_64-pc-linux-gnu-gcc-14.2.0
    156 gcc-ar 12564 gnatclean 156 x86_64-pc-linux-gnu-gcc-ar
    156 gcc-nm 7864 gnatkr 156 x86_64-pc-linux-gnu-gcc-nm
    152 gcc-ranlib 8564 gnatlink 152 x86_64-pc-linux-gnu-gcc-ranlib
    8828 gccgo 12764 gnatls 8828 x86_64-pc-linux-gnu-gccgo
    8820 gccrs 13584 gnatmake 8820 x86_64-pc-linux-gnu-gccrs
    7784 gcov 12236 gnatname 8828 x86_64-pc-linux-gnu-gdc
    6324 gcov-dump 12308 gnatprep 8824 x86_64-pc-linux-gnu-gfortran
    6468 gcov-tool 11136 go 8960 x86_64-pc-linux-gnu-gm2
    8828 gdc 620 gofmt
    8824 gfortran 308740 lto-dump

    The glibc installation (libraries and headers) is about 199 MB, a small
    fraction of the size of the gcc intallation.

    Is that included in one of those two divisions above?

    Of course not. glibc is not part of gcc.

    Of course there are other libraries that can be used with gcc, and they
    could take a lot of space -- but they're not part of gcc.

    So, what /is/ gcc? What's the minimum installation that can compile
    hello.c to hello.s for example?

    Those are two separate questions. gcc by itself can't compile hello.c
    to hello.s. But it's always installed along with other tools that allow
    it to do so, as part of what the C standard calls an "implementation".

    You can't compile hello.c to hello.s without an OS kernel, but I presume
    you'd agree that the kernel isn't part of gcc. And hello.s isn't useful without an assembler, which is not treated as part of gcc.

    gcc is a compiler, or rather a compiler collection. (The "gcc" command
    is the C compiler component of the "gcc" compiler collection.) Since
    gcc does not provide <stdio.h>, I presume that a standalone gcc would
    not be able to compile hello.c without depending on a library, whether
    that library is installed separately or as part of a package like
    tdm-gcc (there's nothing wrong with either approach).

    I should also acknowledge that the "gcc" package, whether it's provided
    as source code or as binaries, provides some files that are not part of
    the compiler itself, for example library files that are closely tied to
    the compiler. Installable software packages don't have to follow any particular division between compiler, library, and other components.

    When I install gcc, binutils, and glibc from the Ubuntu package manager,
    the binaries are installed in common directories (/usr/bin, /usr/lib, et
    al). There's no "gcc directory" or "glibc directory". But the system
    keeps track of which files were installed from which packages.

    Perhaps you don't care what is or isn't part of "gcc". If that's the
    case, that's fine, but it would help if you'd stop referring to things
    as "gcc" without knowing what that means. You're using "gcc-tdm"; just
    call it that.

    I've done that experiment on my TDM version, and the answer appears to
    be about 40MB in this directory structure:

    Directory of c:\tdm\bin
    24/07/2024 10:21 1,926,670 gcc.exe
    24/07/2024 10:21 2,279,503 libisl-23.dll
    24/07/2024 10:22 164,512 libmpc-3.dll
    24/07/2024 10:22 702,852 libmpfr-6.dll

    Directory of c:\tdm\libexec\gcc\x86_64-w64-mingw32\14.1.0
    24/07/2024 10:24 34,224,654 cc1.exe

    Directory of c:\tdm\x86_64-w64-mingw32\include
    17/01/2021 17:33 368 stddef.h
    27/03/2021 20:07 2,924 stdio.h

    7 File(s) 39,301,483 bytes

    Here I cheated a little and used the minimum std headers from my
    compiler, otherwise I could have spent an hour chasing down dozens of
    obscure nested headers that gcc's stdio.h likes to make use of.

    Is /this/ gcc then? Will you agree that it is by no means clear what
    'gcc' includes, or what to call the part of a gcc installed bundle
    that is not technically gcc?

    It's not entirely clear, but it's much clearer than you make it out to
    be.

    One thing that should be obvious by now is that stdio.h is not part of
    "gcc", though it's probably part of "gcc-tdm". On my system, stddef.h
    is provided by libgcc-11-dev, which is closely associated with gcc. I'm
    not entirely sure why gcc-11 and libgcc-11-dev (the Ubuntu binary
    packages) are separate -- nor do I have to care, since the package
    management system is clever enough to recognize the dependencies and
    keep them in sync.

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the standard library.

    Sure, those are all part of a C implementation, though they're not part
    of gcc.

    With clang, it is easier: apparently everything needed to do the
    above, other than header files, is contained with a 120MB executable clang.exe.

    That may be true for the "clang.exe" on your system. I'm fairly sure
    it's not true for the "/usr/bin/clang" on my system. Perhaps you
    installed some Windows package that provides the clang compiler and
    other components of a C implementation, similar to the way gcc-tdm
    provides gcc and other components.

    However the full 2.8GB llvm/clang installation doesn't provide any
    headers, nor a linker. At least it doesn't use the provided 88MB (!)
    lld.exe; it expects to work on top of MSVC, which it has never managed
    to do.

    I suspect others have managed it, but I haven't tried (I don't use
    llvm/clang on Windows other than via Cygwin and WSL). But apparently MS
    Visual Studio can be configured to use clang as its compiler.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Nov 25 09:21:59 2024
    Bart <bc@freeuk.com> wrote:
    On 24/11/2024 20:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]

    That doesn't agree with my observations.

    Of course most of the headers and libraries are not part of gcc itself.
    As usual, you refer to the entire implementation as "gcc".

    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
    installing each into a new directory.

    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
    the executables.

    That's even huger than mine! So, that are those 3.7GB full of? What does
    the 1.9GB of executables do?

    The 3.7GB is debug info which Keith removed. gcc is now written in
    C++, and when you compile with debug info on, about 90% of the executable
    is debug info.

    Of course there are other libraries that can be used with gcc, and they
    could take a lot of space -- but they're not part of gcc.

    So, what /is/ gcc? What's the minimum installation that can compile
    hello.c to hello.s for example?

    I've done that experiment on my TDM version, and the answer appears to
    be about 40MB in this directory structure:

    Directory of c:\tdm\bin
    24/07/2024 10:21 1,926,670 gcc.exe
    24/07/2024 10:21 2,279,503 libisl-23.dll
    24/07/2024 10:22 164,512 libmpc-3.dll
    24/07/2024 10:22 702,852 libmpfr-6.dll

    Directory of c:\tdm\libexec\gcc\x86_64-w64-mingw32\14.1.0
    24/07/2024 10:24 34,224,654 cc1.exe

    That is a reasonably good approximation to the compiler proper.
    More precisely, to compile you need 'cc1.exe' and the libraries
    it uses. On Linux I get:
    ldd /sklad0/p0/kompi/gcc_pp/usr_14.2.0/libexec/gcc/x86_64-pc-linux-gnu/14.2.0/cc1
    linux-vdso.so.1 (0x00007ffc8a8f2000)
    libmpc.so.3 => /lib/x86_64-linux-gnu/libmpc.so.3 (0x00007fa55e071000)
    libmpfr.so.6 => /lib/x86_64-linux-gnu/libmpfr.so.6 (0x00007fa55dfb7000)
    libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007fa55df36000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa55de57000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa55dc76000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa55e0a9000)

    This is the list of libraries needed by 'cc1'. /lib64/ld-linux-x86-64.so.2,
    libc.so.6 and libm.so.6 are system libraries needed by almost all
    things. linux-vdso.so.1 is a virtual thing; IIUC there is nothing
    corresponding to it on the disc.

    Directory of c:\tdm\x86_64-w64-mingw32\include
    17/01/2021 17:33 368 stddef.h
    27/03/2021 20:07 2,924 stdio.h

    7 File(s) 39,301,483 bytes

    Here I cheated a little and used the minimum std headers from my
    compiler, otherwise I could have spent an hour chasing down dozens of obscure nested headers that gcc's stdio.h likes to make use of.

    Yes, besides the compiler proper you also need the headers used by the C
    file.

    Is /this/ gcc then? Will you agree that it is by no means clear what
    'gcc' includes, or what to call the part of a gcc installed bundle that
    is not technically gcc?

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the standard library.

    Debian splits gcc into several packages. One of them is 'cpp-12'
    and this one gives you 'cc1' (that is, the compiler proper). There
    is 'gcc-12' which mainly provides extra features like
    lto (link-time optimization) and the sanitizers. It also provides
    things like 'collect2' (a wrapper around the linker to add extra
    features) and 'x86_64-linux-gnu-gcc-ar-12' (I do not know why
    this is needed). 'gcc-12' pulls in several dependencies:

    cpp-12, gcc-12-base, libcc1-0, binutils, libgcc-12-dev,
    libc6, libgcc-s1, libgmp10, libisl23, libmpc3, libmpfr6,
    libstdc++6, libzstd1, zlib1g

    binutils gives you the assembler and linker, libgcc-s1 is the shared
    support library (needed to run dynamically linked programs),
    libgcc-12-dev contains the startup files (needed to link any program)
    and a bunch of libraries and headers supporting extra features,
    and libgmp10, libmpc3, libmpfr6 (and of course libc6) are needed
    to run the compiler. I am not sure about libisl23, libstdc++6,
    libzstd1 and zlib1g.

    To get standard header files you need to install 'libc6-dev'.

    With clang, it is easier: apparently everything needed to do the above, other than header files, is contained with a 120MB executable clang.exe.

    Probably you mean things needed to run the compiler. clang-compiled executables need libraries too; on Debian these are shared with gcc.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 25 11:00:59 2024
    On 24/11/2024 21:45, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    On 24/11/2024 20:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]
    That doesn't agree with my observations.
    Of course most of the headers and libraries are not part of gcc
    itself.
    As usual, you refer to the entire implementation as "gcc".
    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
    installing each into a new directory.
    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I
    strip
    the executables.

    That's even huger than mine! So, that are those 3.7GB full of? What
    does the 1.9GB of executables do?

    I installed compilers for multiple languages. A more typical
    installation likely won't include compilers for Ada, Go, Fortran,
    Modula-2, and Rust. There are a number of hard links to other files;
    for example c++, g++, x86_64-pc-linux-gnu-c++, and
    x86_64-pc-linux-gnu-g++ are all the same file. Apparently `du` is
    clever enough to count them only once.

    Here's the output of `ls -s` on the bin directory (sizes are in units of
    1024 bytes) :

    total 611908
    8828 c++ 8960 gm2 8828 x86_64-pc-linux-gnu-c++
    8820 cpp 8264 gnat 8828 x86_64-pc-linux-gnu-g++
    8828 g++ 13092 gnatbind 8820 x86_64-pc-linux-gnu-gcc
    8820 gcc 9556 gnatchop 8820 x86_64-pc-linux-gnu-gcc-14.2.0
    156 gcc-ar 12564 gnatclean 156 x86_64-pc-linux-gnu-gcc-ar
    156 gcc-nm 7864 gnatkr 156 x86_64-pc-linux-gnu-gcc-nm
    152 gcc-ranlib 8564 gnatlink 152 x86_64-pc-linux-gnu-gcc-ranlib
    8828 gccgo 12764 gnatls 8828 x86_64-pc-linux-gnu-gccgo
    8820 gccrs 13584 gnatmake 8820 x86_64-pc-linux-gnu-gccrs
    7784 gcov 12236 gnatname 8828 x86_64-pc-linux-gnu-gdc
    6324 gcov-dump 12308 gnatprep 8824 x86_64-pc-linux-gnu-gfortran
    6468 gcov-tool 11136 go 8960 x86_64-pc-linux-gnu-gm2
    8828 gdc 620 gofmt
    8824 gfortran 308740 lto-dump

    The glibc installation (libraries and headers) is about 199 MB, a small
    fraction of the size of the gcc intallation.

    Is that included in one of those two divisions above?

    Of course not. glibc is not part of gcc.

    Of course there are other libraries that can be used with gcc, and they
    could take a lot of space -- but they're not part of gcc.

    So, what /is/ gcc? What's the minimum installation that can compile
    hello.c to hello.s for example?

    Those are two separate questions. gcc by itself can't compile hello.c
    to hello.s. But it's always installed along with other tools that allow
    it to do so, as part of what the C standard calls an "implementation".

    You can't compile hello.c to hello.s without an OS kernel, but I presume you'd agree that the kernel isn't part of gcc. And hello.s isn't useful without an assembler, which is not treated as part of gcc.

    gcc is a compiler, or rather a compiler collection. (The "gcc" command
    is the C compiler component of the "gcc" compiler collection.) Since
    gcc does not provide <stdio.h>, I presume that a standalone gcc would
    not be able to compile hello.c without depending on a library, whether
    that library is installed separately or as part of a package like
    tdm-gcc (there's nothing wrong with either approach).

    I should also acknowledge that the "gcc" package, whether it's provided
    as source code or as binaries, provides some files that are not part of
    the compiler itself, for example library files that are closely tied to
    the compiler. Installable software packages don't have to follow any particular division between compiler, library, and other components.

    When I install gcc, binutils, and glibc from the Ubuntu package manager,
    the binaries are installed in common directories (/usr/bin, /usr/lib, et
    al). There's no "gcc directory" or "glibc directory". But the system
    keeps track of which files were install from which packages.

    Perhaps you don't care what is or isn't part of "gcc". If that's the
    case, that's fine, but it would help if you'd stop referring to things
    as "gcc" without knowing what that means. You're using "gcc-tdm"; just
    call it that.

    I've done that experiment on my TDM version, and the answer appears to
    be about 40MB in this directory structure:

    Directory of c:\tdm\bin
    24/07/2024 10:21 1,926,670 gcc.exe
    24/07/2024 10:21 2,279,503 libisl-23.dll
    24/07/2024 10:22 164,512 libmpc-3.dll
    24/07/2024 10:22 702,852 libmpfr-6.dll

    Directory of c:\tdm\libexec\gcc\x86_64-w64-mingw32\14.1.0
    24/07/2024 10:24 34,224,654 cc1.exe

    Directory of c:\tdm\x86_64-w64-mingw32\include
    17/01/2021 17:33 368 stddef.h
    27/03/2021 20:07 2,924 stdio.h

    7 File(s) 39,301,483 bytes

    Here I cheated a little and used the minimum std headers from my
    compiler, otherwise I could have spent an hour chasing down dozens of
    obscure nested headers that gcc's stdio.h likes to make use of.

    Is /this/ gcc then? Will you agree that it is by no means clear what
    'gcc' includes, or what to call the part of a gcc installed bundle
    that is not technically gcc?

    It's not entirely clear, but it's much clearer than you make it out to
    be.

    One thing that should be obvious by now is that stdio.h is not part of
    "gcc", though it's probably part of "gcc-tdm". On my system, stddef.h
    is provided by libgcc-11-dev, which is closely associated with gcc. I'm
    not entirely sure why gcc-11 and libgcc-11-dev (the Ubuntu binary
    packages) are separate -- nor do I have to care, since the package
    management system is clever enough to recognize the dependencies and
    keep them in sync.

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.

    Sure, those are all part of a C implementation, though they're not part
    of gcc.


    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities. On Windows, C compilers tend to be self-contained (except for
    Clang which appears to be parasitical: it used to piggy-back onto gcc,
    then it switched to MSVC).

    I'm not sure what the utility to compile C programs is called, if it is
    not 'gcc'. But this is a C group, I would expect people to know it is a
    C compiler, or the front end of one.

    However I use 'gcc' in other forums and everyone knows what I mean.

    What do /you/ call the C compiler that is invoked by gcc?




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 25 11:19:14 2024
    On 24/11/2024 22:21, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:




    With clang, it is easier: apparently everything needed to do the above,
    other than header files, is contained with a 120MB executable clang.exe.

    Probably you means things needed to run the compiler. clang compiled executable need libraries too, on Debian this is shared with gcc.

    No, this was a standalone 119MB clang.exe. I had to give it a tweaked
    hello.c without stdio.h, and it produced only hello.s.

    My cc.exe is 1/400th the size (99.75% smaller) and it can convert
    hello.c (/with/ stdio.h) to hello.exe, or any of half-dozen options
    within the same package (eg. interpret or run).

    (cc.exe concentrates on single-file programs. To compile multi-module programs, it needs a 200-line script, and an extra 0.1MB utility, an assembler-linker. Then outputs are limited to EXE/DLL/OBJ/MX.

    Most of my needs for a C compiler however are for programs contained
    within one file.)

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Mon Nov 25 14:17:11 2024
    Bart <bc@freeuk.com> writes:
    On 24/11/2024 21:45, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.
    Sure, those are all part of a C implementation, though they're not
    part of gcc.

    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    I'm not sure what you mean by "provided by the OS". Linux-based
    systems tend to be very modular, with almost everything provided by
    some installable binary package. Some of those packages have to
    be provided by default, for example any dynamic libraries relied
    on by most executables. Files that are needed for development,
    such as header files, compilers, and associated tools such as
    assemblers and linkers, may be optional.

    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities.

    On my system (Ubuntu), the as and ld commands are provided by the
    binutils package ("binutils-x86-64-linux-gnu"). Some distributions
    may install these by default. Others do not, but they're easy
    to install.

    On Windows, C compilers tend to be self-contained (except
    for Clang which appears to be parasitical: it used to piggy-back onto
    gcc, then it switched to MSVC).

    I don't know what you mean by "piggy-back onto gcc".

    I'm not sure what the utility to compile C programs is called, if it
    is not 'gcc'. But this is a C group, I would expect people to know it
    is a C compiler, or the front end of one.

    However I use 'gcc' in other forums and everyone knows what I mean.

    What do /you/ call the C compiler that is invoked by gcc?

    I call it gcc.

    "gcc" is the name for several things. It's the "GNU Compiler
    Collection". It's the command invoked as the driver for any of
    several compilers that are part of the GNU Compiler Collection.
    It can refer specifically to the C compiler. It's mildly confusing
    for historical reasons, but most people don't have much of a
    problem with it, and don't pretend that it's more confusing than
    it really is.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Nov 25 20:30:46 2024
    On Sun, 24 Nov 2024 13:45:55 -0800
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    Bart <bc@freeuk.com> writes:
    On 24/11/2024 20:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate
    the core compiler.
    [...]
    That doesn't agree with my observations.
    Of course most of the headers and libraries are not part of gcc
    itself.
    As usual, you refer to the entire implementation as "gcc".
    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
    installing each into a new directory.
    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I
    strip
    the executables.

    That's even huger than mine! So, that are those 3.7GB full of? What
    does the 1.9GB of executables do?

    I installed compilers for multiple languages. A more typical
    installation likely won't include compilers for Ada, Go, Fortran,
    Modula-2, and Rust. There are a number of hard links to other files;
    for example c++, g++, x86_64-pc-linux-gnu-c++, and
    x86_64-pc-linux-gnu-g++ are all the same file. Apparently `du` is
    clever enough to count them only once.

    Here's the output of `ls -s` on the bin directory (sizes are in units
    of 1024 bytes) :

    total 611908
    8828 c++ 8960 gm2 8828 x86_64-pc-linux-gnu-c++
    8820 cpp 8264 gnat 8828 x86_64-pc-linux-gnu-g++
    8828 g++ 13092 gnatbind 8820 x86_64-pc-linux-gnu-gcc
    8820 gcc 9556 gnatchop 8820 x86_64-pc-linux-gnu-gcc-14.2.0
    156 gcc-ar 12564 gnatclean 156 x86_64-pc-linux-gnu-gcc-ar
    156 gcc-nm 7864 gnatkr 156 x86_64-pc-linux-gnu-gcc-nm
    152 gcc-ranlib 8564 gnatlink 152 x86_64-pc-linux-gnu-gcc-ranlib
    8828 gccgo 12764 gnatls 8828 x86_64-pc-linux-gnu-gccgo
    8820 gccrs 13584 gnatmake 8820 x86_64-pc-linux-gnu-gccrs
    7784 gcov 12236 gnatname 8828 x86_64-pc-linux-gnu-gdc
    6324 gcov-dump 12308 gnatprep 8824 x86_64-pc-linux-gnu-gfortran
    6468 gcov-tool 11136 go 8960 x86_64-pc-linux-gnu-gm2
    8828 gdc 620 gofmt
    8824 gfortran 308740 lto-dump


    67% of the bin directory of the i386 gcc13 compiler that I compiled from source
    on msys2 a few months ago is a single huge executable: i386-elf-lto-dump.exe, 410,230,002 bytes with symbols, 28,347,904 bytes stripped.
    Copying such file is not instant, even on SSD. Certainly takes time
    over internet.

    It does not look like I have any use for it, stripped or not. When I
    want a dump, I use a smaller utility, i386-elf-objdump.exe (14,740,647
    bytes with symbols, 2,242,048 bytes stripped), which already does more
    than I would know how to use.

    The Arm gcc12 compiler for small embedded targets (arm-none-eabi-gcc) in
    the same msys2 environment, which I did not compile from source, also
    contains arm-none-eabi-lto-dump.exe, and it is also the biggest exe by
    far, but at least it is stripped and only 23,728,128 bytes.







    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Nov 25 22:21:23 2024
    Bart <bc@freeuk.com> wrote:

    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities. On Windows, C compilers tend to be self-contained (except for Clang which appears to be parasitical: it used to piggy-back onto gcc,
    then it switched to MSVC).

    You know that at source level there are separate projects: gcc proper,
    binutils and libc. libc provides the C library; however, the headers should
    be matched to the library, so libc also provides the headers.

    Linux has distributions, which besides the bare OS provide a lot of packages.
    The binary C library is used by almost all programs, so it is provided even
    in a minimal install. Linux has package managers, so everything you
    install may be split into small packages, but for the user it is just
    a matter of knowing a few crucial names; the package manager will install all
    dependencies.

    AFAIK Windows alone does not have a package manager, and you apparently
    reject package managers provided by third parties. So the only
    viable approach is to install a big bundle (a "self-contained compiler").
    There is also a commercial aspect: even if it is a free download, a
    commercial entity normally does not want to pass "sales" to other
    parties. OTOH open-source projects cooperate and acknowledge the
    existence of other projects.

    I'm not sure what the utility to compile C programs is called, if it is
    not 'gcc'. But this is a C group, I would expect people to know it is a
    C compiler, or the front end of one.

    However I use 'gcc' in other forums and everyone knows what I mean.

    What do /you/ call the C compiler that is invoked by gcc?

    If you want to be technical you could say 'cc1'. But usually
    people know what you mean when you say 'gcc'.
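    If you are curious, a quick way to see the split on a typical Linux
    install is to ask the driver itself (the path below is only an
    example; it differs per distribution and gcc version):

        $ gcc -print-prog-name=cc1
        /usr/lib/gcc/x86_64-linux-gnu/12/cc1
        $ gcc -v hello.c
        ... (prints the cc1, as and collect2/ld commands the driver runs)

    The 'gcc' command is only the driver; cc1 is the program that does
    the actual C compilation.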

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 25 22:35:20 2024
    On 25/11/2024 03:17, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    On 24/11/2024 21:45, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.
    Sure, those are all part of a C implementation, though they're not
    part of gcc.

    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    I'm not sure what you mean by "provided by the OS". Linux-based
    systems tend to be very modular, with almost everything provided by
    some installable binary package. Some of those packages have to
    be provided by default, for example any dynamic libraries relied
    on by most executables. Files that are needed for development,
    such as header files, compilers, and associated tools such as
    assemblers and linkers, may be optional.


    Well, does a C compiler for Linux come with its own stdio.h, or does it
    share /usr/include/stdio.h along with other compilers?

    C compilers for Windows tend to be self-contained. Except for clang (see below). So each has its own stdio.h.

    The only thing the OS provides is msvcrt.dll, a library of C standard functions, one which probably started out for internal use but too many programs now rely on it.



    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities.

    On my system (Ubuntu), the as and ld commands are provided by the
    binutils package ("binutils-x86-64-linux-gnu"). Some distributions
    may install these by default. Others do not, but they're easy
    to install.

    On Windows, C compilers tend to be self-contained (except
    for Clang which appears to be parasitical: it used to piggy-back onto
    gcc, then it switched to MSVC).

    I don't know what you mean by "piggy-back onto gcc".

    It relies on an existing gcc installation for things like header files,
    linkers and libraries.

    I used clang for 18 months before I realised this.

    Then they changed over to relying on MSVC for those facilities. This is
    when it started having trouble finding and syncing to MSVC, even when I
    had a working CL compiler.


    I'm not sure what the utility to compile C programs is called, if it
    is not 'gcc'. But this is a C group, I would expect people to know it
    is a C compiler, or the front end of one.

    However I use 'gcc' in other forums and everyone knows what I mean.

    What do /you/ call the C compiler that is invoked by gcc?

    I call it gcc.

    "gcc" is the name for several things. It's the "GNU Compiler
    Collection". It's the command invoked as the driver for any of
    several compilers that are part of the GNU Compiler Collection.
    It can refer specifically to the C compiler. It's mildly confusing
    for historical reasons, but most people don't have much of a
    problem with it, and don't pretend that it's more confusing than
    it really is.

    But you seem to like pointing out that gcc doesn't include header files, assemblers, linkers and libraries. And previously you claimed that:

    "gcc by itself can't compile hello.c to hello.s."

    So, what does it need? Is your point that it invokes a separate program
    like 'cc1.exe'? (Plus those 3 other binaries I listed in the case of tdm.)

    You also said:

    "You can't compile hello.c to hello.s without an OS kernel"

    I guess you mean that it needs an OS to provide a file system, a means to
    launch an executable like gcc.exe in the first place, and a display for messages?

    That's a rather silly one. (However my first compiler did run on bare
    metal!)

    And hello.s isn't useful without an assembler, which is not treated
    as part of gcc

    I deliberately stopped at the assembly file because I knew you would
    leap at that.

    I assume that turning a .c file into .s/.asm is the very definition of
    what a baseline C compiler is expected to do.
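    (To make that baseline concrete - a minimal sketch, nothing
    Windows- or gcc-specific about it - given a trivial source file:

        /* hello.c - the smallest useful test case */
        #include <stdio.h>

        int main(void) {
            printf("hello, world\n");
            return 0;
        }

    then "gcc -S hello.c" stops after compilation proper and leaves
    hello.s, while "gcc hello.c" carries on through assembling and
    linking to produce an executable.)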

    It seems you are still disputing this and causing confusion. C compilers
    for Windows such as lccwin32, Pelles C, DMC, tcc, and my mcc/cc are all self-contained and can turn hello.c all the way to hello.exe.

    It is gcc that is always the exception, in every way (like generating
    a.exe files by default, or thinking that HELLO.C must be a C++ file).



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Nov 25 23:06:36 2024
    On 24/11/2024 21:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]

    That doesn't agree with my observations.

    Of course most of the headers and libraries are not part of gcc itself.
    As usual, you refer to the entire implementation as "gcc".

    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5, installing each into a new directory.

    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
    the executables.

    That sounds like a /very/ large size. A quick check of the pre-built
    Debian package for gcc-14 is about 90 MB installed. (That is for the C compiler - not binutils, or libraries.) C++ adds another 50% to that.
    Are you including the build directories with all the object files too?

    For a full gcc-based toolchain, I have lots of these for
    cross-compilation, each in individual directories. (Contrary to Bart's imagination, this is entirely possible - even on Windows. All it needs
    is appropriate configuration when building the toolchain.) A typical
    ARM toolchain is about 1 GB or so, including all the libraries, headers, debuggers, C and C++ support, binutils, gdb, documentation, and so on.
    Of that, maybe 250 MB is executable files and 650 MB is pre-built libraries optimised for 20+ device families.


    The glibc installation (libraries and headers) is about 199 MB, a small fraction of the size of the gcc installation.

    Of course there are other libraries that can be used with gcc, and they
    could take a lot of space -- but they're not part of gcc.

    These sizes might differ on Windows.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Nov 25 23:17:13 2024
    On 25/11/2024 11:21, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities. On Windows, C compilers tend to be self-contained (except for
    Clang which appears to be parasitical: it used to piggy-back onto gcc,
    then it switched to MSVC).

    You know that at source level there are separate projects: gcc proper, binutils and libc.

    Actually, no I don't. I said more on this in my reply to Keith a short
    while ago.

    My experience of C compilers on Windows is that they provide a means to
    turn .c files into executable files. Such a compiler on Windows
    generally has to be self-contained, since very little is provided by the OS.

    How the source code is structured, or how it's organised internally, is
    of little concern to me. My source code for cc.exe is also structured
    into different components, but I don't expect users to know or care
    about that.

    Those terms are simply how Linux (and Unix I guess) has decided a C
    compiler should be organised.

    So from my point of view, gcc is the outlier.

    (See: https://github.com/sal55/langs/blob/master/CompilerSuite.md

    This describes my current set of tools. Each .exe file is
    self-contained; no other program and no other file is needed to get from
    the input to any of the outputs.

    Processing some outputs may need one of the other programs or an
    external tool, but that is by choice. Both mm.exe/cc.exe can go straight
    to EXE without any help.

    The only thing not included in cc.exe is windows.h, because it is so
    massive.)

    libc provides the C library; however, the headers should
    be matched to the library, so libc also provides the headers.

    There is no header that I can see for Windows' msvcrt.dll C runtime.
    (There was/is a Windows SDK, but that is a massive product mostly to do
    with WinAPI.)


    Linux has distributions, which besides the bare OS provide a lot of packages. The binary C library is used by almost all programs, so it is provided even
    in a minimal install. Linux has package managers, so everything you
    install may be split into small packages, but for the user it is just
    a matter of knowing a few crucial names; the package manager will install all
    dependencies.

    AFAIK Windows alone does not have a package manager, and you apparently
    reject package managers provided by third parties. So the only
    viable approach is to install a big bundle (a "self-contained compiler").

    Other C compilers I've used on Windows (excluding monsters like gcc,
    clang, msvc) either have their own install routine or the process is
    trivial, such as extracting files from a ZIP file.

    Mine doesn't even need installing: you just run the EXE from anywhere!




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Nov 25 23:45:28 2024
    On 25/11/2024 10:30, Michael S wrote:
    On Sun, 24 Nov 2024 13:45:55 -0800
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    Bart <bc@freeuk.com> writes:
    On 24/11/2024 20:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate
    the core compiler.
    [...]
    That doesn't agree with my observations.
    Of course most of the headers and libraries are not part of gcc
    itself.
    As usual, you refer to the entire implementation as "gcc".
    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
    installing each into a new directory.
    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
    the executables.

    That's even huger than mine! So, what are those 3.7GB full of? What
    does the 1.9GB of executables do?

    I installed compilers for multiple languages. A more typical
    installation likely won't include compilers for Ada, Go, Fortran,
    Modula-2, and Rust. There are a number of hard links to other files;
    for example c++, g++, x86_64-pc-linux-gnu-c++, and
    x86_64-pc-linux-gnu-g++ are all the same file. Apparently `du` is
    clever enough to count them only once.

    Here's the output of `ls -s` on the bin directory (sizes are in units
    of 1024 bytes) :

    total 611908
      8828 c++             8960 gm2            8828 x86_64-pc-linux-gnu-c++
      8820 cpp             8264 gnat           8828 x86_64-pc-linux-gnu-g++
      8828 g++            13092 gnatbind       8820 x86_64-pc-linux-gnu-gcc
      8820 gcc             9556 gnatchop       8820 x86_64-pc-linux-gnu-gcc-14.2.0
       156 gcc-ar         12564 gnatclean       156 x86_64-pc-linux-gnu-gcc-ar
       156 gcc-nm          7864 gnatkr          156 x86_64-pc-linux-gnu-gcc-nm
       152 gcc-ranlib      8564 gnatlink        152 x86_64-pc-linux-gnu-gcc-ranlib
      8828 gccgo          12764 gnatls         8828 x86_64-pc-linux-gnu-gccgo
      8820 gccrs          13584 gnatmake       8820 x86_64-pc-linux-gnu-gccrs
      7784 gcov           12236 gnatname       8828 x86_64-pc-linux-gnu-gdc
      6324 gcov-dump      12308 gnatprep       8824 x86_64-pc-linux-gnu-gfortran
      6468 gcov-tool      11136 go             8960 x86_64-pc-linux-gnu-gm2
      8828 gdc              620 gofmt
      8824 gfortran      308740 lto-dump


    67% of the bin directory of the i386 gcc13 compiler that I compiled from source
    on msys2 a few months ago is a single huge executable: i386-elf-lto-dump.exe, 410,230,002 bytes with symbols, 28,347,904 bytes stripped.
    Copying such file is not instant, even on SSD. Certainly takes time
    over internet.

    It does not look like I have any use for it, stripped or not. When I
    want a dump, I use a smaller utility, i386-elf-objdump.exe (14,740,647
    bytes with symbols, 2,242,048 bytes stripped), which already does more
    than I would know how to use.


    LTO object files are vastly different beasts from normal object files,
    so it does not surprise me that the dump utility is so much bigger. If
    you don't use LTO, then presumably you will not need the lto-dump
    utility. (It is not a tool I have ever looked at myself.)

    When people build gcc themselves, it is not uncommon that they want
    binaries with symbols for debugging, testing, profiling, objdumping, or whatever - after all, most users use pre-built binaries. So it is not unreasonable to have at least some symbols with the binaries. But it
    seems here that you have built them with full debugging information, not
    just symbols. That is only really useful if you intend to run gcc
    itself under gdb. Stripping the binaries isn't going to make them any
    faster (at least, not under Linux - maybe in Windows the whole file is loaded), but it would make copying the files faster.


    The Arm gcc12 compiler for small embedded targets (arm-none-eabi-gcc) in
    the same msys2 environment, which I did not compile from source, also
    contains arm-none-eabi-lto-dump.exe, and it is also the biggest exe by
    far, but at least it is stripped and only 23,728,128 bytes.





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Tue Nov 26 02:55:09 2024
    On Mon, 25 Nov 2024 13:45:28 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    LTO object files are vastly different beasts from normal object
    files, so it does not surprise me that the dump utility is so much
    bigger. If you don't use LTO, then presumably you will not need the
    lto-dump utility. (It is not a tool I have ever looked at myself.)


    I am pretty sure that even if I ever want to use LTO with gcc I'd still
    have no need for lto-dump. What would matter for me in this case
    would be a final result (exe) rather than object files. And in order to
    look at exe I'd still use a normal objdump.
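    For the record, the sort of use I mean is nothing exotic - roughly
    this (flag spellings are the usual binutils objdump ones):

        objdump -d prog.elf > prog.lst       # disassemble the executable sections
        objdump -d -S prog.elf > prog.lst    # same, interleaved with source if built with -g

    i.e. just disassembly of the final binary, not of intermediate
    object files.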

    The situation is not purely hypothetical. I regularly use LTCG with
    Microsoft tools. Never have I wanted to disassemble .obj files after
    LTCG compilation. When I occasionally wanted to look at asm after LTCG,
    it was always an exe.







    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Tue Nov 26 03:27:46 2024
    Bart <bc@freeuk.com> writes:
    On 25/11/2024 11:21, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities. On Windows, C compilers tend to be self-contained (except for Clang which appears to be parasitical: it used to piggy-back onto gcc,
    then it switched to MSVC).
    You know that at source level there are separate projects: gcc
    proper, binutils and libc.

    Actually, no I don't. I said more on this in my reply to Keith a short
    while ago.

    You don't know that after it's been explained to you dozens of times?

    My experience of C compilers on Windows is that they provide a means
    to turn .c files into executable files. Such a compiler on Windows
    generally has to be self-contained, since very little is provided by
    the OS.

    Bart, can you explain the difference between a C compiler and a C implementation? Or do you believe they're the same thing? (Hint:
    They're not.)

    [...]

    So from my point of view, gcc is the outlier.

    And what's wrong with that?

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Tue Nov 26 03:32:06 2024
    David Brown <david.brown@hesbynett.no> writes:
    On 24/11/2024 21:01, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    [...]
    Most of a gcc installation is hundreds of header and archive (.a)
    files for various libraries. There might be 32-bit and 64-bit
    versions. I understand that. But it also makes it hard to isolate the
    core compiler.
    [...]
    That doesn't agree with my observations.
    Of course most of the headers and libraries are not part of gcc
    itself.
    As usual, you refer to the entire implementation as "gcc".
    I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
    installing each into a new directory.
    The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I
    strip the executables.

    That sounds like a /very/ large size. A quick check of the pre-build
    Debian package for gcc-14 is about 90 MB installed. (That is for the
    C compiler - not binutils, or libraries.) C++ adds another 50% to
    that. Are you including the build directories with all the object
    files too?

    It is very large, partly because the executables are not stripped
    (that's the default when building from source), and partly because I
    configured it for multiple languages. No cross-compilers.

    No, I'm not including the build directories, just the directory
    specified with "./configure --prefix=...".

    I might try doing a stripped installation for C only, just to see how
    big it is.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 26 04:25:42 2024
    On 25/11/2024 16:27, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    On 25/11/2024 11:21, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities. On Windows, C compilers tend to be self-contained (except for Clang which appears to be parasitical: it used to piggy-back onto gcc, then it switched to MSVC).
    You know that at source level there are separate projects: gcc
    proper, binutils and libc.

    Actually, no I don't. I said more on this in my reply to Keith a short
    while ago.

    You don't know that after it's been explained to you dozens of times?

    My experience of C compilers on Windows is that they provide a means
    to turn .c files into executable files. Such a compiler on Windows
    generally has to be self-contained, since very little is provided by
    the OS.

    Bart, can you explain the difference between a C compiler and a C implementation? Or do you believe they're the same thing? (Hint:
    They're not.)

    Well, I write language implementations, and I consider them largely the
    same thing.

    So who's right? Just because a C compiler works in a certain peculiar way
    on one OS doesn't mean that is the only way.

    Have a look at the 'CC' product described here, about half way down:

    https://github.com/sal55/langs/blob/master/CompilerSuite.md

    It is a single file that can turn source into native code, or it
    can run it directly, or it can interpret it.

    I call this 0.3MB program a 'compiler'. I also call it a C
    implementation (technically, a C subset). (What would /you/ call what
    this program does?)

    All it lacks that you might quibble over is an implementation of the C standard library. I use a library that is part of Windows, and also use
    that same library from two other languages, neither of which is C.

    Technically, a 'C' compiler only needs to turn C source into some
    next-level representation. Beyond that it's pretty much a compiler like
    any other, not specific to C. A compiler may consider its job done when
    it gets to IR, or ASM source, or it may continue all the way to a
    running binary, like mine do.

    As to what gcc does and how it's classified, I'm past caring. Does it eventually produce a binary? Then that's all that matters.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Tue Nov 26 04:30:10 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 24/11/2024 21:45, Keith Thompson wrote:

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.

    Sure, those are all part of a C implementation, though they're not part
    of gcc.


    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    Actually, no. The OS provides the dynamic linker and some os-specific
    header files. Pretty much everything else comes from various
    third-party packages.


    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities.

    None of those come from the OS. They come from separate packages
    produced by third parties (some, like gcc, binutils, etc come from
    the FSF, other libraries come from other sources).


    On Windows, C compilers tend to be self-contained (except for

    Leaving aside the fact that Windows has always been a toy
    environment, all the tools you complain about were developed
    on, and primarily for UNIX and derivatives. Not Windows.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 26 04:50:10 2024
    On 25/11/2024 18:30, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 24/11/2024 21:45, Keith Thompson wrote:

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.

    Sure, those are all part of a C implementation, though they're not part
    of gcc.


    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    Actually, no. The OS provides the dynamic linker and some os-specific
    header files. Pretty much everything else comes from various
    third-party packages.


    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities.

    None of those come from the OS. They come from separate packages
    produced by third parties (some, like gcc, binutils, etc come from
    the FSF, other libraries come from other sources).


    And of course there are different standard C libraries available, as
    well as different C compilers, and you can mix and match - gcc with
    musl, clang with glibc, icc with newlib, etc. There has to be a certain degree of cooperation and compatibility for a compiler and a library to
    work together, but they can be (and usually are) separate projects from separate groups.


    On Windows, C compilers tend to be self-contained (except for

    Leaving aside the fact that Windows has always been a toy
    environment, all the tools you complain about were developed
    on, and primarily for UNIX and derivatives. Not Windows.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Nov 26 04:54:29 2024
    On 25/11/2024 16:55, Michael S wrote:
    On Mon, 25 Nov 2024 13:45:28 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    LTO object files are vastly different beasts from normal object
    files, so it does not surprise me that the dump utility is so much
    bigger. If you don't use LTO, then presumably you will not need the
    lto-dump utility. (It is not a tool I have ever looked at myself.)


    I am pretty sure that even if I ever want to use LTO with gcc I'd still
    have no need for lto-dump.

    That is quite plausible. I only occasionally have use for objdump, and
    I suspect many programmers never use it at all. I doubt if I'd use the lto-dump version much if and when I start using LTO seriously.

    What would matter for me in this case
    would be a final result (exe) rather than object files. And in order to
    look at exe I'd still use a normal objdump.


    Again, I don't doubt you are correct.

    All I am saying is that it does not surprise me that the lto-dump
    program is significantly bigger than objdump. And presumably some
    people do find it useful.

    The situation is not purely hypothetical. I regularly use LTCG with
    Microsoft tools. Never have I wanted to disassemble .obj files after
    LTCG compilation. When I occasionally wanted to look at asm after LTCG,
    it was always an exe.





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Tue Nov 26 05:49:27 2024
    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Tue Nov 26 05:50:25 2024
    Bart <bc@freeuk.com> writes:

    On 25/11/2024 16:27, Keith Thompson wrote:

    Bart, can you explain the difference between a C compiler and a C
    implementation? Or do you believe they're the same thing? (Hint:
    They're not.)

    Well, I write language implementations, and I consider them largely
    the same thing.

    So who's right?

    In comp.lang.c, the C standard is right.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 26 06:46:45 2024
    On 25/11/2024 17:30, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 24/11/2024 21:45, Keith Thompson wrote:

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.

    Sure, those are all part of a C implementation, though they're not part
    of gcc.


    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    Actually, no. The OS provides the dynamic linker and some os-specific
    header files. Pretty much everything else comes from various
    third-party packages.


    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities.

    None of those come from the OS.

    So, if I install 5 distinct C compilers on Linux, will they each come
    with their own stdio.h, or will they use the common one in /usr/include?



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 26 07:19:04 2024
    On 25/11/2024 18:49, Tim Rentsch wrote:
    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours


    I'm trying to think of some computationally intensive app that would run non-stop for several hours without interaction.

    If you dig back through the thread, you will see that I am not against

    For such a task as your example might do, you would spend some time
    testing on shorter examples and getting the best algorithm. Once you
    feel it's the best, /then/ you can think about getting it optimised. It doesn't even matter how long it takes, if it's going to take hours anyway.


    I thought of one artificial example: a C program to display the
    Fibonacci sequence 1 to 100 using the recursive function for each fib(i).

    I compiled it with gcc-O3 and set it going. While it was doing that, I set
    up the same test using my interpreted language. It was much slower
    obviously. So I added memoisation. Now it showed all 100 values
    instantly (the C version meanwhile is in the low 50s).

    I noticed however that it overflowed the 64-bit range at around fib(93)
    (as the C version might do eventually). So I tweaked my 'slow' version
    to use bignum values. Then I tweaked it again to show the first 10,000
    values.
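    (For anyone who wants to reproduce the memoised version in plain C
    rather than in my language, a rough sketch - not my actual code, and
    only up to where signed 64-bit arithmetic runs out, as noted above:

        /* memofib.c - naive recursion plus a lookup table */
        #include <stdio.h>

        #define N 93                /* fib(93) already overflows signed 64 bits */
        static long long memo[N];

        static long long fib(int n) {
            if (n < 2) return n;
            if (memo[n] == 0)       /* 0 doubles as "not yet computed" here */
                memo[n] = fib(n - 1) + fib(n - 2);
            return memo[n];
        }

        int main(void) {
            for (int i = 1; i < N; i++)
                printf("fib(%d) = %lld\n", i, fib(i));
            return 0;
        }

    Without the memo table the same function is exponential in n, which
    is why the straight recursive C version takes so long.)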

    At this point, the optimised C was still in the mid 50s.

    The point is, for such a task as this, you do as much as you can to
    bring down the runtime, which could reduce it by a magnitude or two with
    the right choices.

    Adding -O3 at the end is a nice bonus speedup, but that's all it is.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Tue Nov 26 07:32:01 2024
    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Bart <bc@freeuk.com> writes:
    On 25/11/2024 16:27, Keith Thompson wrote:
    Bart, can you explain the difference between a C compiler and a C
    implementation? Or do you believe they're the same thing? (Hint:
    They're not.)

    Well, I write language implementations, and I consider them largely
    the same thing.

    So who's right?

    In comp.lang.c, the C standard is right.

    Agreed, but the C standard doesn't define the word "compiler",
    and uses it only in non-normative text (I searched N3096).

    What I consider to be a "compiler" is the program or programs that
    implement translation phases 1 through 7. (The 8th and final phase
    is linking.)

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Tue Nov 26 07:51:48 2024
    Bart <bc@freeuk.com> writes:
    [...]
    So, if I install 5 distinct C compilers on Linux, will they each come
    with their own stdio.h, or will they use the common one in
    /usr/include?

    History does not suggest that you actually care about the answer,
    but I'll give you one anyway.

    It depends on how each compiler is configured. On my system,
    gcc, clang, and tcc all use /usr/include/stdio.h, but musl-gcc (a
    wrapper that invokes gcc with options to use musl, an alternative
    C library implementation) does not; it uses musl's own headers. Or I
    can invoke any of those compilers with options to use some other
    library implementation.

    Remember that typical Linux-based systems are very modular, with system
    files provided via the package manager. The files that make up a C implementation are provided by multiple different packages. Package dependencies are managed in such a way that installing a full C
    implementation is reasonably straightforward.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Tue Nov 26 08:29:48 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 25/11/2024 18:49, Tim Rentsch wrote:


    I'm trying to think of some computationally intensive app that would run non-stop for several hours without interaction.

    I can think of several - HDL simulators (vcs, et al), system simulators
    like Simh, Qemu, Synopsys Virtualizer, SIMICS, most HPC codes (e.g. fluid dynamics),
    Machine Learning training, et alia.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 26 10:20:04 2024
    On 25/11/2024 21:29, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 25/11/2024 18:49, Tim Rentsch wrote:


    I'm trying to think of some computationally intensive app that would run
    non-stop for several hours without interaction.

    I can think of several - HDL simulators (vcs, et al), system simulators
    like Simh, Qemu, Synopsys Virtualizer, SIMICS, most HPC codes (e.g. fluid dynamics),
    Machine Learning training, et alia.

    OK, good.

    So the only preparation you have to do to get those running at maximum
    speed is just to use -O3 on your compilers instead of -O0.

    Understood. You don't need to worry about anything else.


    However, I assume that has already been done when building products like
    LLVM (which apparently takes somewhat longer than one second to build), yet I keep seeing comments about it like this:

    "I think the biggest complaint is compile time."

    "but if you want fast compile times or just "O1" instead of "O3" level performance, it can feel like overkill."

    "ah this seems like two very different use cases. Stating the obvious:
    when debugging I want as fast builds as possible. When shipping I want
    as fast software as possible."

    Apparently this is not obvious to anybody here except me!



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Tue Nov 26 12:09:54 2024
    Reply-To: slp53@pacbell.net

    Bart <bc@freeuk.com> writes:
    On 25/11/2024 21:29, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 25/11/2024 18:49, Tim Rentsch wrote:


    I'm trying to think of some computationally intensive app that would run non-stop for several hours without interaction.

    I can think of several - HDL simulators (vcs, et al), system simulators
    like Simh, Qemu, Synopsys Virtualizer, SIMICS, most HPC codes (e.g. fluid dynamics),
    Machine Learning training, et alia.

    OK, good.

    So the only preparation you have to do to get those running at maximum
    speed is just to use -O3 on your compilers instead of -O0.

    That appears to be your opinion. It is not shared by myself
    nor any programmer I've ever met.


    Understood. You don't need to worry about anything else.

    How do you conclude that based on a simple list of applications?

    Everything from the initial design proposal to the selection of
    implementation language to the characteristics of the data structures
    to the algorithms chosen are part of the process of creating a real-world application. The actual compiler flags are in the noise, for the
    most part.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Nov 26 12:28:47 2024
    On 26/11/2024 01:09, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 25/11/2024 21:29, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 25/11/2024 18:49, Tim Rentsch wrote:


    I'm trying to think of some computationally intensive app that would run non-stop for several hours without interaction.

    I can think of several - HDL simulators (vcs, et al), system simulators
    like Simh, Qemu, Synopsys Virtualizer, SIMICS, most HPC codes (e.g. fluid dynamics),
    Machine Learning training, et alia.

    OK, good.

    So the only preparation you have to do to get those running at maximum
    speed is just to use -O3 on your compilers instead of -O0.

    That appears to be your opinion. It is not shared by myself
    nor any programmer I've ever met.


    Understood. You don't need to worry about anything else.

    How do you conclude that based on a simple list of applications?

    Everything from the initial design proposal to the selection of implementation language to the characteristics of the data structures
    to the algorithms chosen are part of the process of creating a real-world application. The actual compiler flags are in the noise, for the
    most part.

    That's my point.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Tue Nov 26 23:29:55 2024
    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that would
    run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Wed Nov 27 00:31:30 2024
    On 26/11/2024 12:29, Tim Rentsch wrote:
    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that would
    run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on a single source file called sql.c for example, that's how long it takes for gcc to create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Testing it might only take a second:

    c:\cx>type input
    select 2+2;

    c:\cx>sql <input
    4

    With a different compiler, the edit-run cycle can be a lot nippier:

    c:\cx>tm cc sql
    Compiling sql.c to sql.exe
    TM: 0.27

    If that is still onerous, I can try interpreting:

    c:\cx>tm cc -i sql <input
    Compiling sql.c to sql.(int)
    4
    TM: 0.19

    So compiling to IL, then interpreting that IL (which is 40 times slower
    than native code), /and/ running my test, takes 1/5th of a second in all.

    That's 40 times faster than the equivalent with gcc-O0 (despite the interpreted part being 40 times slower!):

    c:\cx\tm test.bat
    c:\cx>gcc sql.c -osql.exe && sql 0<input
    4
    TM: 7.74

    And 200 times faster than gcc-O2 which everyone here seems to be
    recommending:

    c:\cx>tm test.bat
    c:\cx>gcc sql.c -O2 -osql.exe && sql 0<input
    4
    TM: 38.60

    Some might advise not working with such a large single source module at
    all, but that is the task here. If trying to investigate why my cc
    product is failing, I might put tracing statements into the source,
    and compile and run with both compilers to compare the outputs. For such
    a purpose, -O2 or -O3 is utterly pointless.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Wed Nov 27 21:47:11 2024
    Bart <bc@freeuk.com> wrote:
    On 25/11/2024 17:30, Scott Lurndal wrote:
    Bart <bc@freeuk.com> writes:
    On 24/11/2024 21:45, Keith Thompson wrote:

    A more useful installation would of course need more standard headers,
    an assembler, linker, and whatever .a files are needed to provide the
    standard library.

    Sure, those are all part of a C implementation, though they're not part of gcc.


    This seems to be a thing with Linux, where a big chunk of a C
    implementation is provided by the OS.

    Actually, no. The OS provides the dynamic linker and some os-specific
    header files. Pretty much everything else comes from various
    third-party packages.


    That is, standard headers, libraries, possibly even 'as' and 'ld'
    utilities.

    None of those come from the OS.

    So, if I install 5 distinct C compilers on Linux, will they each come
    with their own stdio.h, or will they use the common one in /usr/include?

    It depends on the compiler. IIUC your compiler has its own stdio.h.
    There was the 'Tendra C compiler' (tcc for short) which had its own
    handling of headers. Basically, there was internal compiler
    magic to activate headers. I do not remember if the "real" headers were
    just part of the compiler executable or were kept in files. But the
    real header data were in a compiler-specific format. You could
    not look at stdio.h to see function declarations; I do not
    remember if stdio.h was present as a real file, but if it were
    present it would contain only some compiler-specific magic
    to activate the declarations. In other words, Tendra did not
    use system headers and its headers were unusable for other
    compilers.

    If you ask why, the reason was portability and standard compliance.
    Tendra was supposed to give you the same results on a wide
    selection of machines, provided that the machines supported
    the appropriate APIs. I do not remember how/if they handled the
    32 versus 64 bit issue, but their headers were claimed to be
    100% standard compliant, as opposed to vendor headers which
    often had various incompatibilities. They also provided
    wrapper libraries so that when you called their wrapper
    you got standard-specified behaviour (vendor libraries
    frequently violated standards). Concerning APIs, they
    went quite a bit beyond standard C and provided several
    industry standards.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Thu Nov 28 10:23:32 2024
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 16:27, Keith Thompson wrote:

    Bart, can you explain the difference between a C compiler and a C
    implementation? Or do you believe they're the same thing? (Hint:
    They're not.)

    Well, I write language implementations, and I consider them largely
    the same thing.

    So who's right?

    In comp.lang.c, the C standard is right.

    Agreed, but the C standard doesn't define the word "compiler",
    and uses it only in non-normative text (I searched N3096).

    That makes no difference to my point, which is about word
    usage, not about what is or isn't C. It is clear that the
    C standard considers a compiler and an implementation to be
    two different things.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Thu Nov 28 16:18:09 2024
    Bart <bc@freeuk.com> writes:

    On 26/11/2024 12:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that would
    run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on single source
    file called sql.c for example, that's how long it takes for gcc to
    create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed. And of course you picked the farthest-most
    outlier as your example, grossly misrepresenting any sort of
    average or typical case.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Thu Nov 28 23:37:15 2024
    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Bart <bc@freeuk.com> writes:

    On 26/11/2024 12:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the
    2:1 speedup you might get by optimising it is vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that
    would run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on single source
    file called sql.c for example, that's how long it takes for gcc to
    create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine-generated code, 250Kloc is not too much.
    I would think that in the field of compiled-code HDL simulation, people are interested in compiling sources as big as they can afford.

    And of course you picked the farthest-most
    outlier as your example, grossly misrepresenting any sort of
    average or typical case.

    I remember having a much shorter file (the core of a 3rd-party TCP protocol
    implementation) where compilation with gcc took several seconds.
    Looked at it now - only 22 Klocs.
    Text size in .o - 34KB.
    Compilation time on a much newer computer than the one I remembered, with
    a good SATA SSD and a 4 GHz Intel Haswell CPU - a little over 1 sec. That
    was with gcc 4.7.3. I would guess that if I tried gcc13 it would be 1.5 to 2
    times longer.
    So, in terms of Kloc/sec it seems to me that the time reported by Bart
    is not outrageous. Indeed, gcc is very slow when compiling any source
    several times above average size.
    In this particular case I cannot compare gcc to an alternative, because
    for a given target (Altera Nios2) there are no alternatives.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 29 01:27:25 2024
    On 28/11/2024 05:18, Tim Rentsch wrote:
    Bart <bc@freeuk.com> writes:

    On 26/11/2024 12:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of compilers
    (which can vary by 100:1), but for the generated programs, the 2:1
    speedup you might get by optimising it is vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that would
    run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on single source
    file called sql.c for example, that's how long it takes for gcc to
    create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed. And of course you picked the farthest-most
    outlier as your example, grossly misrepresenting any sort of
    average or typical case.

    It's not atypical for me! I explained why I might use such a file.

    And for me, used to decades of sub-one-second response times, 7 seconds
    seems like forever. Especially when there is no feedback at all from gcc.

    When my tools had to compile multiple modules they would show a progress report as each one was processed.

    gcc says nothing (unless you use --verbose, and then it spews reams of junk
    for every file). Maybe after a few seconds it's 90% done, or maybe 10%;
    who knows?

    Also, you haven't really explained why someone should wait an extra 7
    seconds for a task that can clearly be accomplished in a fraction of a
    second, given that gcc-O0 generates equally poor code.

    Nor why a production version of gcc needs to be itself built with -O3
    anyway. Since it sounds like an unoptimised version would only ever take
    an extra second or two on any of your tiny inputs!

    (And with David Brown's projects where apparently the gcc compiler is
    either never invoked, or always finishes in milliseconds, it would make
    no measurable difference at all.)

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 29 02:25:48 2024
    On 28/11/2024 12:37, Michael S wrote:
    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine generated code 250Kloc is not too much.

    This file mostly comprises sqlite3.c which is a machine-generated
    amalgamation of some 100 actual C files.

    You wouldn't normally do development with that version, but in my
    scenario, where I was trying to find out why the version built with my compiler was buggy, I might try adding debug info to it and then building
    with a working compiler (e.g. gcc) to compare with.

    But, yes, when I used to do more transpilation to C, then the generated
    code would be a single C source file. That one could also require
    frequent recompiles as C, if there were bugs in the process.

    Then the differences in compile-time of the C are clear; here,
    generating qc.c from the original sources took 0.09 seconds:

    c:\qx>tm gcc qc.c GCC -O0
    TM: 2.28

    c:\qx>tm tc qc TCC from a script as it's messy
    c:\qx>tcc qc.c c:\windows\system32\user32.dll -luser32 c:\windows\system32\kernel32.dll -fdollars-in-identifiers
    TM: 0.23

    c:\qx>tm cc qc Using my C compiler
    Compiling qc.c to qc.exe
    TM: 0.11

    c:\qx>tm mm qc Compile original source to EXE
    Compiling qc.m to qc.exe
    TM: 0.09

    c:\qx>tm gcc -O2 qc.c GCC -O2
    TM: 11.02

    Usually tcc is faster than my product, but something about the generated
    C (maybe long, messy identifiers) is slowing it down. But it is still 10
    times faster than gcc-O0.

    The last timing is gcc generating optimised code; usually the only
    reason why gcc would be used. Then it takes 120 times longer to create
    the executable than my normal native build process.

    Tim isn't asking the right questions (or any questions!). WHY does gcc
    take so long to generate indifferent code when the task can clearly be
    done at least a magnitude faster?

    Whatever it is it's doing, why isn't there an option to skip that for a streamlined build? (Maybe you accidentally deleted the EXE and need to recreate it; it doesn't need the same analysis.)

    I've several times suggested that gcc should have an -O-1 option that
    runs a secretly bundled version of Tiny C.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Fri Nov 29 02:46:31 2024
    On Thu, 28 Nov 2024 15:25:48 +0000
    Bart <bc@freeuk.com> wrote:


    I've several times suggested that gcc should have an -O-1 option that
    runs a secretly bundled version of Tiny C.


    Hopefully, you are not serious about it.
    The differences between gcc and tcc go well beyond code
    analysis warnings or code generation. tcc does not even support full
    C99, although it is very close to it, much less the newer versions of
    the C standard. Also, while tcc supports a few gnu extensions, it
    certainly does not support all of them.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Fri Nov 29 04:28:06 2024
    On 28.11.2024 15:27, Bart wrote:
    [ compilation times ]

    And for me, used to decades of sub-one-second response times, 7 seconds
    seems like for ever. [...]

    Sub-seconds is very important in response times of interactive tools;
    I recall we've measured, e.g. for GUI applications, the exact timing,
    and we've taken into account results of psychological sciences. The
    accepted response times for our applications were somewhere around
    0.20 seconds, and even 0.50 seconds was by far unacceptable.

    But we're speaking about compilation times. And I'm a bit astonished
    about a sub-second requirement or necessity. I'm typically compiling
    source code after I've edited it, where the latter is by far the most dominating step. And before the editing there's usually the analysis
    of code, that requires even more time than the simple but interactive
    editing process. When I start the compile all the major time demanding
    tasks that are necessary to create the software fix have already been
    done, and I certainly don't need a sub-second response from compiler.

    Though I observed a certain behavior of programmers who use tools with
    a fast response time. Since it doesn't cost anything they just make a
    single change and compile to see whether it works, and, rinse repeat,
    do that for every _single_ change *multiple* times. My own programming
    habits got also somewhat influenced by that, though I still try to fix
    things in brain before I ask the compiler what it thinks of my change.
    This is certainly influenced by the mainframe days where I designed my algorithms on paper, punched my program on a stack of punch cards, and
    examined and fixed the errors all at once. The technical situation has
    changed (mostly improved) during the decades, but the habits (how often
    you start a compiler in the development process cycle) have, I think,
    also changed, but not necessarily improved.

    Yes, I understand that it seems to you that 7 seconds is like forever
    if you see the compiler as an instant-responder interactive tool.

    BTW; it may be worthwhile (for those who compile often, probably more
    often than necessary, and want the compilation results instantly) to
    consider tools that compile in parallel while editing their code.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Fri Nov 29 05:25:59 2024
    On 28/11/2024 17:28, Janis Papanagnou wrote:
    On 28.11.2024 15:27, Bart wrote:
    [ compilation times ]

    And for me, used to decades of sub-one-second response times, 7 seconds
    seems like for ever. [...]

    Sub-seconds is very important in response times of interactive tools;
    I recall we've measured, e.g. for GUI applications, the exact timing,
    and we've taken into account results of psychological sciences. The
    accepted response times for our applications were somewhere around
    0.20 seconds, and even 0.50 seconds was by far unacceptable.

    But we're speaking about compilation times. And I'm a bit astonished
    about a sub-second requirement or necessity. I'm typically compiling
    source code after I've edited it, where the latter is by far the most dominating step. And before the editing there's usually the analysis
    of code, that requires even more time than the simple but interactive
    editing process.

    You can make a similar argument about turning on the light switch when entering a room. Flicking light switches is not something you need to do
    every few seconds, but if the light took 5 seconds to come on (or even
    one second), it would be incredibly annoying.

    It would stop the fluency of whatever you were planning to do. You might
    even forget why you needed to go into the room in the first place.

    When I start the compile all the major time demanding
    tasks that are necessary to create the software fix have already been
    done, and I certainly don't need a sub-second response from compiler.

    Though I observed a certain behavior of programmers who use tools with
    a fast response time. Since it doesn't cost anything they just make a
    single change and compile to see whether it works, and, rinse repeat,
    do that for every _single_ change *multiple* times.

    Well, what's wrong with that? It's how lots of things already work, by
    doing things incrementally.

    If recompiling an entire program of any size really was instant, would
    you still work exactly the same way?

    People find scripting languages productive, partly because there is no discrete build step.

    My own programming
    habits got also somewhat influenced by that, though I still try to fix
    things in brain before I ask the compiler what it thinks of my change.
    This is certainly influenced by the mainframe days where I designed my algorithms on paper, punched my program on a stack of punch cards, and examined and fixed the errors all at once.

    I also remember using punched cards at college. But generally it was
    using an interactive terminal. Compiling and linking were still big
    deals when using mini- and mainframe computers.

    Oddly, it was only using tiny, underpowered microprocessor systems,
    that I realised how fast language tools really could be. At least the
    ones I wrote.

    Those ported from bigger computers would take minutes for the simplest program, as I later found. Mine took seconds, or a fraction of a second.
    Part of that was down to using a resident compile/IDE that kept things
    in memory as much as possible, since floppy disks were slow.

    Here's a test: how many times can you twiddle your thumbs while waiting
    for something to build? (That is, put your hands together with
    interlocked fingers, and rotate your thumbs around each other).

    I can only manage 3-4 - if building an artificial 1Mloc benchmark.
    Otherwise it's impossible to even put my hands together.

    In 7 seconds I can do nearly 25 twiddles. That's a really useful use of
    my time!



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Fri Nov 29 20:36:03 2024
    On 28.11.2024 19:25, Bart wrote:
    On 28/11/2024 17:28, Janis Papanagnou wrote:
    On 28.11.2024 15:27, Bart wrote:
    [ compilation times ]

    And for me, used to decades of sub-one-second response times, 7 seconds
    seems like for ever. [...]

    Sub-seconds is very important in response times of interactive tools;
    I recall we've measured, e.g. for GUI applications, the exact timing,
    and we've taken into account results of psychological sciences. The
    accepted response times for our applications were somewhere around
    0.20 seconds, and even 0.50 seconds was by far unacceptable.

    But we're speaking about compilation times. And I'm a bit astonished
    about a sub-second requirement or necessity. I'm typically compiling
    source code after I've edited it, where the latter is by far the most
    dominating step. And before the editing there's usually the analysis
    of code, that requires even more time than the simple but interactive
    editing process.

    You can make a similar argument about turning on the light switch when entering a room. Flicking light switches is not something you need to do every few seconds, but if the light took 5 seconds to come on (or even
    one second), it would be incredibly annoying.

    It is. (It was with flickering fluorescent lamps in the past and is
    with the contemporary energy saving lamps nowadays that need time to
    radiate in full glory.) - But I'm not making comparisons/parables;
    I made a concrete argument and coupled it with behavioral patterns
    and work processes in the context we were speaking about, compiling.


    It would stop the fluency of whatever you were planning to do. You might
    even forget why you needed to go into the room in the first place.

    When I start the compile all the major time demanding
    tasks that are necessary to create the software fix have already been
    done, and I certainly don't need a sub-second response from compiler.

    Though I observed a certain behavior of programmers who use tools with
    a fast response time. Since it doesn't cost anything they just make a
    single change and compile to see whether it works, and, rinse repeat,
    do that for every _single_ change *multiple* times.

    Well, what's wrong with that? It's how lots of things already work, by
    doing things incrementally.

    There's nothing "wrong" with it. (I just consider it non-ergonomic
    in the edit-compile-loop context I described.) You can (and should)
    do what you prefer and what works for you - unless you work and
    operate in a larger project context where efficient processes may
    (or may not) conflict with your habits.


    If recompiling an entire program of any size really was instant, would
    you still work exactly the same way?

    (I addressed that in my previous post.)


    People find scripting languages productive, partly because there is no discrete build step.

    (There are many reasons for using scripting languages; at least for
    those that I use. And there are reasons to not use them.)

    And there are reasons for using compiled and strongly typed languages.
    One I already mentioned in my previous post; you see all errors at
    once and can fix them in one iteration. - I seem to recall that you
    are somewhat familiar with Algol 68; its error messages foster an
    efficient error correction process.

    The point was and still is that it's inefficient to save seconds in
    compiling and spend much more time in your edit-compile iterations.

    The rest can be re-read if you missed that I wrote "I understand"
    your edit-compile habits as an effect of being used to instant
    responsive compilers [for the sort of code you are doing, in the
    project context you are working, with the software organization
    you have, and the development processes you apply].


    My own programming
    habits got also somewhat influenced by that, though I still try to fix
    things in brain before I ask the compiler what it thinks of my change.
    This is certainly influenced by the mainframe days where I designed my
    algorithms on paper, punched my program on a stack of punch cards, and
    examined and fixed the errors all at once.

    I also remember using punched cards at college. But generally it was
    using an interactive terminal. Compiling and linking were still big
    deals when using mini- and mainframe computers.

    I have (and also heard of) different experiences. (Like hitting the
    Enter key on an interactive terminal to start a job and instantly
    getting the prompt back.) Myself I worked with punch cards only on
    mechanical punch terminals and then put the stack of cards in a
    batch queue that got processed (with other jobs) at occasion; the
    build times, compiles/links, were not an issue anyway with those
    mainframes (TR, CDC, 360-clone). When we switched to interactive (non-mainframe) systems the processes got slower, much more time
    consuming.


    Oddly, it was only using tiny, underpowered microprocessor systems, that
    I realised how fast language tools really could be. At least the ones I wrote.

    Sure.

    Janis

    [...]



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sat Nov 30 10:29:07 2024
    Bart <bc@freeuk.com> writes:

    On 28/11/2024 17:28, Janis Papanagnou wrote:

    On 28.11.2024 15:27, Bart wrote:

    [ compilation times ]

    And for me, used to decades of sub-one-second response times, 7
    seconds seems like for ever. [...]

    Sub-seconds is very important in response times of interactive
    tools; I recall we've measured, e.g. for GUI applications, the
    exact timing, and we've taken into account results of psychological
    sciences. The accepted response times for our applications were
    somewhere around 0.20 seconds, and even 0.50 seconds was by far
    unacceptable.

    But we're speaking about compilation times. And I'm a bit
    astonished about a sub-second requirement or necessity. I'm
    typically compiling source code after I've edited it, where the
    latter is by far the most dominating step. And before the editing
    there's usually the analysis of code, that requires even more time
    than the simple but interactive editing process.

    You can make a similar argument about turning on the light switch
    when entering a room. Flicking light switches is not something you
    need to do every few seconds, but if the light took 5 seconds to
    come on (or even one second), it would be incredibly annoying.

    This analogy sounds like something a defense attorney would say who
    has a client that everyone knows is guilty.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 30 13:46:18 2024
    On 30.11.2024 00:29, Tim Rentsch wrote:
    Bart <bc@freeuk.com> writes:
    On 28/11/2024 17:28, Janis Papanagnou wrote:

    But we're speaking about compilation times. [...]

    You can make a similar argument about turning on the light switch
    when entering a room. Flicking light switches is not something you
    need to do every few seconds, but if the light took 5 seconds to
    come on (or even one second), it would be incredibly annoying.

    This analogy sounds like something a defense attorney would say who
    has a client that everyone knows is guilty.

    Intentionally or not; it's funny to respond to an analogy with an
    analogy. :-}

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sat Nov 30 15:40:11 2024
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    On 30.11.2024 00:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 28/11/2024 17:28, Janis Papanagnou wrote:

    But we're speaking about compilation times. [...]

    You can make a similar argument about turning on the light switch
    when entering a room. Flicking light switches is not something you
    need to do every few seconds, but if the light took 5 seconds to
    come on (or even one second), it would be incredibly annoying.

    This analogy sounds like something a defense attorney would say who
    has a client that everyone knows is guilty.

    Intentionally or not; it's funny to respond to an analogy with an
    analogy. :-}

    My statement was not an analogy. Similar is not the same as
    analogous.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sat Nov 30 16:03:17 2024
    Bart <bc@freeuk.com> writes:

    On 28/11/2024 05:18, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 26/11/2024 12:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of
    compilers (which can vary by 100:1), but for the generated
    programs, the 2:1 speedup you might get by optimising it is
    vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that
    would run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on single
    source file called sql.c for example, that's how long it takes for
    gcc to create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed. And of course you picked the farthest-most
    outlier as your example, grossly misrepresenting any sort of
    average or typical case.

    It's not atypical for me! [...]

    I can easily accept that it might be typical for you. My
    point is that it is not typical for almost everyone else.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sat Nov 30 16:25:15 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Bart <bc@freeuk.com> writes:

    On 26/11/2024 12:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of
    compilers (which can vary by 100:1), but for the generated
    programs, the 2:1 speedup you might get by optimising it is
    vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that
    would run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on single
    source file called sql.c for example, that's how long it takes for
    gcc to create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine generated code 250Kloc is not too much. I would think
    that in field of compiled-code HDL simulation people are interested
    in compilation of as big sources as they can afford.

    Sure. But Bart is implicitly saying that such cases make up the
    bulk of C compilations, whereas in fact the reverse is true. People
    don't care about Bart's complaint because the circumstances of his
    examples almost never apply to them. And he must know this, even
    though he tries to pretend he doesn't.

    And of course you picked the farthest-most
    outlier as your example, grossly misrepresenting any sort of
    average or typical case.

    I remember having much shorter file (core of 3rd-party TCP protocol implementation) where compilation with gcc took several seconds.

    Looked at it now - only 22 Klocs.
    Text size in .o - 34KB.
    Compilation time on much newer computer than the one I remembered, with
    good SATA SSD and 4 GHz Intel Haswell CPU - a little over 1 sec. That
    with gcc 4.7.3. I would guess that if I try gcc13 it would be 1.5 to 2
    times longer.
    So, in terms of Kloc/sec it seems to me that the time reported by Bart
    is not outrageous. Indeed, gcc is very slow when compiling any source several times above average size.
    In this particular case I can not compare gcc to alternative, because
    for a given target (Altera Nios2) there are no alternatives.

    I'm not disputing his ratios on compilation speeds. I implicitly
    agreed to them in my earlier remarks. The point is that the
    absolute times are so small that most people don't care. For
    some reason I can't fathom Bart does care, and apparently cannot
    understand why most other people do not care. My conclusion is
    that Bart is either quite immature or a narcissist. I have tried
    to explain to him why other people think differently than he does,
    but it seems he isn't really interested in having it explained.
    Oh well, not my problem.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Nov 30 21:00:30 2024
    On 30.11.2024 05:40, Tim Rentsch wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    On 30.11.2024 00:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 28/11/2024 17:28, Janis Papanagnou wrote:

    But we're speaking about compilation times. [...]

    You can make a similar argument about turning on the light switch
    when entering a room. Flicking light switches is not something you
    need to do every few seconds, but if the light took 5 seconds to
    come on (or even one second), it would be incredibly annoying.

    This analogy sounds like something a defense attorney would say who
    has a client that everyone knows is guilty.

    Intentionally or not; it's funny to respond to an analogy with an
    analogy. :-}

    My statement was not an analogy. Similar is not the same as
    analogous.

    It's of course (and obviously) not the same; it's just a
    similar term where the semantics of both terms have an overlap.

    (Not sure why you even bothered to reply and nit-pick here.
    But with your habit you seem to have just missed the point;
    the comparison of your reply-type with Bart's argumentation.)

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Nov 30 22:26:41 2024
    On 30/11/2024 05:25, Tim Rentsch wrote:
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Bart <bc@freeuk.com> writes:

    On 26/11/2024 12:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of
    compilers (which can vary by 100:1), but for the generated
    programs, the 2:1 speedup you might get by optimising it is
    vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above):

    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that
    would run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on single
    source file called sql.c for example, that's how long it takes for
    gcc to create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine generated code 250Kloc is not too much. I would think
    that in field of compiled-code HDL simulation people are interested
    in compilation of as big sources as they can afford.

    Sure. But Bart is implicitly saying that such cases make up the
    bulk of C compilations, whereas in fact the reverse is true. People
    don't care about Bart's complaint because the circumstances of his
    examples almost never apply to them. And he must know this, even
    though he tries to pretend he doesn't.

    And of course you picked the farthest-most
    outlier as your example, grossly misrepresenting any sort of
    average or typical case.

    I remember having much shorter file (core of 3rd-party TCP protocol
    implementation) where compilation with gcc took several seconds.

    Looked at it now - only 22 Klocs.
    Text size in .o - 34KB.
    Compilation time on much newer computer than the one I remembered, with
    good SATA SSD and 4 GHz Intel Haswell CPU - a little over 1 sec. That
    with gcc 4.7.3. I would guess that if I try gcc13 it would be 1.5 to 2
    times longer.
    So, in terms of Kloc/sec it seems to me that the time reported by Bart
    is not outrageous. Indeed, gcc is very slow when compiling any source
    several times above average size.
    In this particular case I can not compare gcc to alternative, because
    for a given target (Altera Nios2) there are no alternatives.

    I'm not disputing his ratios on compilation speeds. I implicitly
    agreed to them in my earlier remarks. The point is that the
    absolute times are so small that most people don't care. For
    some reason I can't fathom Bart does care, and apparently cannot
    understand why most other people do not care. My conclusion is
    that Bart is either quite immature or a narcissist. I have tried
    to explain to him why other people think differently than he does,
    but it seems he isn't really interested in having it explained.
    Oh well, not my problem.

    EVERYBODY cares about compilation speeds. Except in this newsgroup where people try to pretend that it's irrelevant.

    But then at the same time, they strive to keep those compile-times small:

    * By using tools that have themselves been optimised to reduce their
    runtimes, and where considerable resources have been expended to get the
    best possible code, which naturally also benefits the tool

    * By using the fastest possible hardware

    * By trying to do parallel builds across multiple cores

    * By organising source code into artificially small modules so that recompilation of just one module is quicker. So, relying on independent compilation.

    * By going to considerable trouble to define inter-dependencies between modules, so that a make system can AVOID recompiling modules. (Why on
    earth would it need to? Oh, because it would be slower!)

    * By using development techniques involving thinking deeply about what
    to change, to avoid a costly rebuild.

    Etc.

    All instead of relying on raw compilation speed and a lot of those
    points become less relevant.

    My conclusion is
    that Bart is either quite immature or a narcissist.

    I'd never bothered much about compile-speed in the past, except to
    ensure that an edit-run cycle was usually a fraction of second, except
    when I had to compile all modules of a project then it might have been a
    few seconds.

    My tools were naturally fast, even though unoptimised, through being
    small and simple. It's only recently that I took advantage of that
    through developing whole-program compilers.

    This normally needs language support (eg. a decent module scheme).
    Applying it to C is harder (if 50 modules of a project each use some
    huge, 0.5Mloc header, then it means processing it 50 times).

    I think it is possible without changing the language, but decided it
    wasn't worth the effort. I don't use it enough myself, and nobody else
    seems to care.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Rosario19@3:633/280.2 to All on Sun Dec 1 03:35:42 2024
    On Wed, 20 Nov 2024 12:31:35 -0000 (UTC), Dan Purgert wrote:

    On 2024-11-16, Stefan Ram wrote:
    Dan Purgert <dan@djph.net> wrote or quoted:
    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    above should be equivalent to this

    for(;n>=0&&n<5;++n) printf ("n: %u\n",n);
    printf ("all if completed, n=%u\n",n);


    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    I honestly lost the plot ages ago; not sure if it was either!


    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Segfaults? :D


    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }

    oooh, that's way better at making a point of the hazard than mine was.

    ... almost needed to engage my rubber duckie, before I realized I was mentally auto-correcting the 'english()' function while reading it.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sun Dec 1 09:07:49 2024
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    On 16.11.2024 16:14, James Kuyper wrote:

    On 11/16/24 04:42, Stefan Ram wrote:
    ...

    [...]

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    That's indeed a nice example. Where you get fooled by the treacherous
    "trustiness" of formatting[*]. - In syntax we trust! [**]

    Misleading formatting is the lesser of two problems. A more
    significant bad design choice is writing in an imperative
    style rather than a functional style.
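
    One way to read that in C - an editor's sketch, not code from the
    thread - is to produce the value with a single conditional expression,
    so there is no half-assigned 'result' left to fall through a missing
    'else':

    const char * english( int const n )
    { return n == 0 ? "zero"
           : n == 1 ? "one"
           : n == 2 ? "two"
           : n == 3 ? "three"
           :          "four"; }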

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sun Dec 1 23:41:03 2024
    Stefan Ram <ram@zedat.fu-berlin.de> wrote:

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }


    That breaks two rules:
    - instructions conditioned by 'if' should have braces,
    - when we have the result we should return it immediately.

    Once those are fixed, the code works as expected...
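
    For instance, a minimal sketch applying those two rules to the function
    above (the editor's illustration: same strings, braces added, and each
    branch returning immediately):

    const char * english( int const n )
    {
        if( n == 0 ){ return "zero"; }
        if( n == 1 ){ return "one"; }
        if( n == 2 ){ return "two"; }
        if( n == 3 ){ return "three"; }
        return "four";
    }

    Written that way, a forgotten 'else' can no longer silently overwrite
    an earlier result.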

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Dec 2 00:04:30 2024
    Bart <bc@freeuk.com> wrote:
    On 28/11/2024 12:37, Michael S wrote:
    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine generated code 250Kloc is not too much.

    This file mostly comprises sqlite3.c which is a machine-generated amalgamation of some 100 actual C files.

    You wouldn't normally do development with that version, but in my
    scenario, where I was trying to find out why the version built with my compiler was buggy, I might try adding debug info to it then building
    with a working compiler (eg. gcc) to compare with.

    Even in the context of developing a compiler I would not blindly run
    many compilations of a large file. At the first stage I would debug the
    compiled program, to find out what is wrong with it. That normally
    involves several runs of the same executable. A possible trick is
    to compile each file separately and link the files in various
    combinations, some compiled by gcc, some by my compiler.
    Normally that would locate the error to a single file.

    After that I would try to minimize the testcase, removing code which
    does not contribute to the bug. That involves several compilations
    of files with quickly decreasing sizes.

    Tim isn't asking the right questions (or any questions!). WHY does gcc
    take so long to generate indifferent code when the task can clearly be
    done at least a magnitude faster?

    The simple answer is: users tolerate long compile times. If users
    abandoned 'gcc' for some other compiler due to long compile times,
    then 'gcc' developers would notice. But the opposite has happened:
    'llvm' was significantly smaller and faster but produced slower code.
    'llvm' developers improved optimizations, in the process making
    their compiler bigger and slower.

    You need to improve your propaganda for faster C compilers...

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Dec 2 00:19:54 2024
    Bart <bc@freeuk.com> wrote:
    On 30/11/2024 05:25, Tim Rentsch wrote:
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Bart <bc@freeuk.com> writes:

    On 26/11/2024 12:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 25/11/2024 18:49, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    It's funny how nobody seems to care about the speed of
    compilers (which can vary by 100:1), but for the generated
    programs, the 2:1 speedup you might get by optimising it is
    vital!

    I think most people would rather take this path (these times
    are actual measured times of a recently written program):

    compile time: 1 second
    program run time: ~7 hours

    than this path (extrapolated using the ratios mentioned above): >>>>>>>>
    compile time: 0.01 second
    program run time: ~14 hours

    I'm trying to think of some computationally intensive app that
    would run non-stop for several hours without interaction.

    The conclusion is the same whether the program run time
    is 7 hours, 7 minutes, or 7 seconds.

    Funny you should mention 7 seconds. If I'm working on single
    source file called sql.c for example, that's how long it takes for
    gcc to create an unoptimised executable:

    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine generated code 250Kloc is not too much. I would think
    that in field of compiled-code HDL simulation people are interested
    in compilation of as big sources as they can afford.

    Sure. But Bart is implicitly saying that such cases make up the
    bulk of C compilations, whereas in fact the reverse is true. People
    don't care about Bart's complaint because the circumstances of his
    examples almost never apply to them. And he must know this, even
    though he tries to pretend he doesn't.

    And of course you picked the farthest-most
    outlier as your example, grossly misrepresenting any sort of
    average or typical case.

    I remember having much shorter file (core of 3rd-party TCP protocol
    implementation) where compilation with gcc took several seconds.

    Looked at it now - only 22 Klocs.
    Text size in .o - 34KB.
    Compilation time on much newer computer than the one I remembered, with
    good SATA SSD and 4 GHz Intel Haswell CPU - a little over 1 sec. That
    with gcc 4.7.3. I would guess that if I try gcc13 it would be 1.5 to 2
    times longer.
    So, in terms of Kloc/sec it seems to me that the time reported by Bart
    is not outrageous. Indeed, gcc is very slow when compiling any source
    several times above average size.
    In this particular case I can not compare gcc to alternative, because
    for a given target (Altera Nios2) there are no alternatives.

    I'm not disputing his ratios on compilation speeds. I implicitly
    agreed to them in my earlier remarks. The point is that the
    absolute times are so small that most people don't care. For
    some reason I can't fathom Bart does care, and apparently cannot
    understand why most other people do not care. My conclusion is
    that Bart is either quite immature or a narcissist. I have tried
    to explain to him why other people think differently than he does,
    but it seems he isn't really interested in having it explained.
    Oh well, not my problem.

    EVERYBODY cares about compilation speeds. Except in this newsgroup where people try to pretend that it's irrelevant.

    But then at the same time, they strive to keep those compile-times small:

    * By using tools that have themselves been optimised to reduce their runtimes, and where considerable resources have been expended to get the best possible code, which naturally also benefits the tool

    * By using the fastest possible hardware

    * By trying to do parallel builds across multiple cores

    * By organising source code into artificially small modules so that recompilation of just one module is quicker. So, relying on independent compilation.

    * By going to considerable trouble to define inter-dependencies between modules, so that a make system can AVOID recompiling modules. (Why on
    earth would it need to? Oh, because it would be slower!)

    * By using development techniques involving thinking deeply about what
    to change, to avoid a costly rebuild.

    Etc.

    Those methods are effective and work. And one gets optimized
    binaries as a result.

    All instead of relying on raw compilation speed and a lot of those
    points become less relevant.

    If all other factors were the same, then using a "better" compiler
    would be nice. But other factors are not equal. You basically
    advocate that people give up features that they want/need in order
    to allow for simpler compilers; this is not going to happen.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Mon Dec 2 02:13:35 2024
    On 01/12/2024 13:04, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 28/11/2024 12:37, Michael S wrote:
    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine generated code 250Kloc is not too much.

    This file mostly comprises sqlite3.c which is a machine-generated
    amalgamation of some 100 actual C files.

    You wouldn't normally do development with that version, but in my
    scenario, where I was trying to find out why the version built with my
    compiler was buggy, I might try adding debug info to it then building
    with a working compiler (eg. gcc) to compare with.

    Even in the context of developing a compiler I would not blindly run
    many compilations of a large file.

    Difficult bugs always occur in larger codebases, but with C, these are in a
    language that I can't navigate, and for programs which are not mine, and
    which tend to be badly written, bristling with typedefs and macros.

    It could take a week to track down where the error might be ...

    At the first stage I would debug the
    compiled program, to find out what is wrong with it.

    .... within the C program. Except there's nothing wrong with the C
    program! It works fine with a working compiler.

    The problem will be in the generated code, so in an entirely different
    program. So normal debugging tools are less useful when several sets of
    source code are involved, in different languages, or when the error occurs
    in the second-generation version of either the self-hosted tool, or of the program under test if it is to do with languages.

    (For example, I got tcc.c working at one point. My generated tcc.exe
    could compile tcc.c, but that second-generation tcc.c didn't work.)


    After that I would try to minimize the testcase, removing code which
    do not contribute to the bug.

    Again, there is nothing wrong with the C program, but in the code
    generated for it. The bug can be very subtle, but it usually turns out
    to be something silly.

    Removing code from 10s of 1000s of lines (or 250Kloc for sql) is not practical. Yet the aim is to isolate some code which can be used to recreate the issue in a smaller program.

    Debugging can involve comparing two versions, one working, the other
    not, looking for differences. And here there may be tracking statements
    added.

    If the only working version is via gcc, then that's bad news because it
    makes the process even more of a PITA.

    I added an interpreter mode to my IL, because I assumed that would give a solid, reliable reference implementation to compare against.

    It turned out to be even more buggy than the generated native code!

    (One problem was to do with my stdarg.h header which implements VARARGS
    used in function definitions. It assumes the stack grows downwards. In
    my interpreter, it grows upwards!)
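
    For illustration, a naive varargs scheme really does bake in a direction
    assumption. This is a hypothetical, generic sketch by the editor
    (my_va_start/my_va_arg are made-up names, not the actual header): it
    steps upward in memory from the last named parameter, which only works
    on targets where later arguments sit at higher addresses.

    /* 'ap' is a plain char * cursor; alignment and register passing
       are ignored for the sake of the sketch */
    #define my_va_start(ap, last) ((ap) = (char *)&(last) + sizeof(last))
    #define my_va_arg(ap, type) \
        (*(type *)(((ap) += sizeof(type)) - sizeof(type)))

    If the argument area grows the other way, as in the interpreter case
    described above, the same macros walk off in the wrong direction.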

    That involves several compilations
    of files with quickly decreasing sizes.

    Tim isn't asking the right questions (or any questions!). WHY does gcc
    take so long to generate indifferent code when the task can clearly be
    done at least a magnitude faster?

    The simple answer is: users tolerate long compile times. If users
    abandoned 'gcc' for some other compiler due to long compile times,
    then 'gcc' developers would notice.

    People use gcc. They come to depend on its features, or they might use (perhaps unknowingly) some extensions. On Windows, gcc includes some
    headers and libraries that belong to Linux, but other compilers don't
    provide them.

    The result is that if they were to switch to a smaller, faster compiler,
    their program may not work.

    They'd have to use it from the start. But then they may want to use
    libraries which only work with gcc ...


    You need to improve your propaganda for faster C compilers...

    I actually don't know why I care. I get the benefit of my fast tools
    every day; they're a joy to use. So I'm not bothered that other people
    are that tolerant of slow, cumbersome build systems.

    But then, people in this group do like to belittle small, fast products
    (tcc for example as well as my stuff), and that's where it gets annoying.

    So, how long to build LLVM again? It used to be hours. Here's my take on
    it being built from scratch:

    c:\px>tm mm pc
    Compiling pc.m to pc.exe
    TM: 0.08

    This standalone program takes a source file containing an IL program
    rendered as text. It can create an EXE, or run it, or interpret it.

    Let's try it out:

    c:\cx>cc -p lua # compile a C program to IL
    Compiling lua.c to lua.pcl

    c:\cx>\px\pc -r lua fib.lua # Now compile and run it in-memory
    Processing lua.pcl to lua.(run)
    Running: fib.lua
    1 1
    2 1
    3 2
    4 3
    5 5
    6 8
    7 13
    ...

    Or I can interpret it:

    c:\cx>\px\pc -i lua fib.lua
    Processing lua.pcl to lua.(int)
    Running: fib.lua
    1 1
    ...

    All that from a product that took 80ms to build and comprises a
    self-contained 180KB executable.

    If nobody here can appreciate the benefits of having such a baseline
    product, then there's nothing I can do about that.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Mon Dec 2 02:34:24 2024
    On 01.12.2024 13:41, Waldek Hebisch wrote:
    Stefan Ram <ram@zedat.fu-berlin.de> wrote:

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }


    That breaks two rules:
    - instructions conditioned by 'if' should have braces,

    I suppose you don't mean

    if (n == value) { result = string; }
    else { result = other; }

    which I'd think doesn't change anything. - So what is it?

    Actually, you should just add explicit 'else' to fix the problem.
    (Here there's no need to fiddle with spurious braces, I'd say.)

    - when we have the result we should return it immediately.

    This would suffice to fix it, wouldn't it?


    Once those are fixed code works as expected...

    I find this answer - not wrong, but - problematic for two reasons.
    There's no accepted "general rules" that could get "broken"; it's
    just rules that serve in given languages and application contexts.
    And they may conflict with other "rules" that have been set up to
    streamline code, make it safer, or whatever.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Mon Dec 2 03:14:36 2024
    Reply-To: slp53@pacbell.net

    antispam@fricas.org (Waldek Hebisch) writes:
    Stefan Ram <ram@zedat.fu-berlin.de> wrote:

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }


    That breaks two rules:
    - instructions conditioned by 'if' should have braces,
    - when we have the result we should return it immediately.

    Three rules
    - don't do something at runtime if you can do it at compile time.

    const static char *english_numbers[] =
        { "zero", "one", "two", "three", "four" };
    const static size_t num_english_numbers =
        sizeof(english_numbers)/sizeof(english_numbers[0]);

    const char *english(const int n)
    {
        return (n < num_english_numbers) ? english_numbers[n] : "Out-of-range";
    }

    I was doing a code review just last week where a junior programmer had
    to convert a small integer (0..5) to a text label, so the programmer
    created a function to return the corresponding label. That function
    creates a std::map and initializes it with the set of text labels each
    time the function is called, just to discard the map after looking up
    the argument.

    Needless to say, it didn't pass review.
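
    A rough C analogue of that review comment (an editor's sketch; the
    reviewed code was C++ with std::map, and the names here are made up):

    /* rejected shape: the table is rebuilt on every call */
    const char *label_slow(int n)
    {
        const char *t[6];
        t[0] = "zero";  t[1] = "one";  t[2] = "two";
        t[3] = "three"; t[4] = "four"; t[5] = "five";
        return (n >= 0 && n < 6) ? t[n] : "out-of-range";
    }

    /* preferred shape: the table exists once, fixed at compile time */
    static const char *const label_table[] =
        { "zero", "one", "two", "three", "four", "five" };

    const char *label_fast(int n)
    {
        return (n >= 0 && n < 6) ? label_table[n] : "out-of-range";
    }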




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Mon Dec 2 09:23:55 2024
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 01.12.2024 13:41, Waldek Hebisch wrote:
    Stefan Ram <ram@zedat.fu-berlin.de> wrote:

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }


    That breaks two rules:
    - instructions conditioned by 'if' should have braces,

    I suppose you don't mean

    if (n == value) { result = string; }
    else { result = other; }

    which I'd think doesn't change anything. - So what is it?

    Actually, you should just add explicit 'else' to fix the problem.
    (Here there's no need to fiddle with spurious braces, I'd say.)

    Lack of braces is a smokescreen hiding the second problem.
    Or to put it differently, due to the lack of braces the code
    immediately smells bad.

    - when we have the result we should return it immediately.

    This would suffice to fix it, wouldn't it?

    Yes (but see above).

    Once those are fixed code works as expected...

    I find this answer - not wrong, but - problematic for two reasons.
    There's no accepted "general rules" that could get "broken"; it's
    just rules that serve in given languages and application contexts.
    And they may conflict with other "rules" that have been set up to
    streamline code, make it safer, or whatever.

    No general rules, yes. But every sane programmer has _some_ rules.
    My point was that if you adopt reasonable rules, then whole classes
    of potential problems go away.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Mon Dec 2 18:29:40 2024
    On 01.12.2024 23:23, Waldek Hebisch wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 01.12.2024 13:41, Waldek Hebisch wrote:
    Stefan Ram <ram@zedat.fu-berlin.de> wrote:

    My bad if the following instruction structure's already been hashed
    out in this thread, but I haven't been following the whole convo!

    In my C 101 classes, after we've covered "if" and "else",
    I always throw this program up on the screen and hit the newbies
    with this curveball: "What's this bad boy going to spit out?".

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }


    That breaks two rules:
    - instructions conditioned by 'if' should have braces,

    I suppose you don't mean

    if (n == value) { result = string; }
    else { result = other; }

    which I'd think doesn't change anything. - So what is it?

    Actually, you should just add explicit 'else' to fix the problem.
    (Here there's no need to fiddle with spurious braces, I'd say.)

    Lack of braces is a smokescreen hiding the second problem.
    Or to put it differently, due to lack of braces the code
    immediately smells bad.

    I know what you mean. Though in the given example it's not
    the braces that correct the code, and I also think that adding the
    braces doesn't remove the "bad smell" (here). - YMMV, of course. -
    For me the smell stems from the use of sequences of 'if' (instead
    of 'switch'), and the lacking 'else' keywords. - Note that the OP's
    original code *had* braces; it nevertheless had a "bad smell", IMO.

    Spurious braces may even make the code less readable; so it depends.
    And thus a "brace rule" can (IME) only be a "rule of thumb" and any
    "codified rule" (see below) should reflect that.


    - when we have the result we should return it immediately.

    This would suffice to fix it, wouldn't it?

    Yes (but see above).

    Once those are fixed the code works as expected...

    I find this answer - not wrong, but - problematic for two reasons.
    There's no accepted "general rules" that could get "broken"; it's
    just rules that serve in given languages and application contexts.
    And they may conflict with other "rules" that have been set up to
    streamline code, make it safer, or whatever.

    No general rules, yes. But every sane programmer has _some_ rules.
    My point was that if you adopt reasonable rules, then whole classes
    of potential problems go away.

    I associated the term "rule" with formal coding standards, so that
    I wouldn't call personal coding habits "rules" but rather "rules of
    thumb" (formal coding standards have both). But personal projects
    (and programmers' habits) are anyway not my major concern, while
    coding standards actually are. When you formulate coding standards
    (and I've done that for a couple languages) you often have to walk
    on the edge of what's possible and what's sensible.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Tue Dec 3 01:09:27 2024
    Bart <bc@freeuk.com> writes:

    On 30/11/2024 05:25, Tim Rentsch wrote:

    Michael S <already5chosen@yahoo.com> writes:
    [...]
    I remember having much shorter file (core of 3rd-party TCP protocol
    implementation) where compilation with gcc took several seconds.

    Looked at it now - only 22 Klocs.
    Text size in .o - 34KB.
    Compilation time on much newer computer than the one I remembered, with
    good SATA SSD and 4 GHz Intel Haswell CPU - a little over 1 sec. That
    with gcc 4.7.3. I would guess that if I try gcc13 it would be 1.5 to 2
    times longer.
    So, in terms of Kloc/sec it seems to me that time reported by Bart
    is not outrageous. Indeed, gcc is very slow when compiling any source
    several times above average size.
    In this particular case I can not compare gcc to alternative, because
    for a given target (Altera Nios2) there are no alternatives.

    I'm not disputing his ratios on compilation speeds. I implicitly
    agreed to them in my earlier remarks. The point is that the
    absolute times are so small that most people don't care. For
    some reason I can't fathom Bart does care, and apparently cannot
    understand why most other people do not care. My conclusion is
    that Bart is either quite immature or a narcissist. I have tried
    to explain to him why other people think differently than he does,
    but it seems he isn't really interested in having it explained.
    Oh well, not my problem.

    EVERYBODY cares about compilation speeds. [...]

    No, they don't. I accept that you care about compiler speed. What
    most people care about is not speed but compilation times, and as
    long as the times are small enough they don't worry about it.

    Another difference may be relevant here. Based on other comments of
    yours I have the impression that you frequently invoke compilations interactively. A lot of people never do that (or do it only very
    rarely). In a project I am working on now I do builds often,
    including full builds where every .c file is recompiled. But all
    the compilation times together are only a small fraction of the
    total, because doing a build includes lots of other steps, including
    running regression tests. Even if the total compilation time were
    zero the build process wouldn't be appreciably shorter.

    I understand that you care about compiler speed, and that's fine
    with me; more power to you. Why do you find it so hard to accept
    that lots of other people have different views than you do, and
    those people are not all stupid? Do you really consider yourself
    the only smart person in the room?

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Dec 3 01:44:46 2024
    On 02/12/2024 14:09, Tim Rentsch wrote:
    Bart <bc@freeuk.com> writes:

    On 30/11/2024 05:25, Tim Rentsch wrote:

    EVERYBODY cares about compilation speeds. [...]

    No, they don't. I accept that you care about compiler speed. What
    most people care about is not speed but compilation times, and as
    long as the times are small enough they don't worry about it.

    Another difference may be relevant here. Based on other comments of
    yours I have the impression that you frequently invoke compilations interactively. A lot of people never do that (or do it only very
    rarely). In a project I am working on now I do builds often,
    including full builds where every .c file is recompiled. But all
    the compilation times together are only a small fraction of the
    total, because doing a build includes lots of other steps, including
    running regression tests. Even if the total compilation time were
    zero the build process wouldn't be appreciably shorter.

    But it might be appreciably longer if the compilers you used were a lot slower! Or needed to be invoked more. Then even you might start to care
    about it.

    You don't care because in your case it is not the bottleneck, and enough
    work has been put into those compilers to ensure they are not even slower.

    (I don't know why regression tests need to feature in every single build.)


    I understand that you care about compiler speed, and that's fine
    with me; more power to you. Why do you find it so hard to accept
    that lots of other people have different views than you do, and
    those people are not all stupid?

    You might also accept that for many, compilation /is/ a bottleneck in
    their work, or at least it introduces an annoying delay.

    Or are you suggesting that the scenario portrayed here:

    https://xkcd.com/303/

    is a complete fantasy?

    Do you really consider yourself
    the only smart person in the room?

    Perhaps the most impatient.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Tue Dec 3 05:19:48 2024
    On 02.12.2024 15:44, Bart wrote:
    On 02/12/2024 14:09, Tim Rentsch wrote:
    Bart <bc@freeuk.com> writes:
    On 30/11/2024 05:25, Tim Rentsch wrote:

    EVERYBODY cares about compilation speeds. [...]

    No, they don't. I accept that you care about compiler speed. What
    most people care about is not speed but compilation times, and as
    long as the times are small enough they don't worry about it.

    Another difference may be relevant here. Based on other comments of
    yours I have the impression that you frequently invoke compilations
    interactively. A lot of people never do that (or do it only very
    rarely). In a project I am working on now I do builds often,
    including full builds where every .c file is recompiled. But all
    the compilation times together are only a small fraction of the
    total, because doing a build includes lots of other steps, including
    running regression tests. Even if the total compilation time were
    zero the build process wouldn't be appreciably shorter.

    Yes, a compiler is not an interactive tool. (Even if some, or Bart, use it
    that way.) I've also mentioned that upthread already.

    I want to add that there are also other factors in professional projects
    that make absolute compilation times not the primary issue. Usually
    we organize our code in modules, components, subsystems, etc.

    The 'make' (or similar tools) will work on small subsets; results will (automatically) be part of a regression at unit-test level. Full builds
    will require more time, but the results will be part of a higher-level
    test (requiring yet more time).

    It just makes little sense to only compile (a single file or a whole
    project) if you don't at least test it.

    But also if you omit the tests, the compile's results are typically
    instantly available, since there are usually only a few unit instances
    compiled, each comparably small. If one compiles mostly monolithic
    software one gets worse response characteristics, of course.

    Multiple compiles for the same thing, as Bart seems to employ, make
    sense to fix compile-time (coding) errors after a significant amount
    of code has changed. That's where habits get relevant; Bart said that
    he likes the (IMO costly) piecewise incremental fix/compile cycles[*],
    and I understand that this way of working (with 'make' or triggered by
    hand) will lead to observable delays. Since Bart will likely not change
    his habits (or his code organization), the speed of a single compilation
    is relevant to him. - There's thus nothing we have left to discuss.

    [*] Where I (for example) prefer to fix, if not all, at least a larger
    set of errors in one go.


    But it might be appreciably longer if the compilers you used were a lot slower! Or needed to be invoked more. Then even you might start to care
    about it.

    You don't care because in your case it is not the bottleneck, and enough
    work has been put into those compilers to ensure they are not even slower.

    (I don't know why regression tests need to feature in every single build.)

    Tests are optional; they don't need to be done "every time".

    If all you want is to _sequentially_ process each single error in
    a source file you don't need a test; all you need is to get the
    error message, to start the editor, edit, and reiterate the compile
    (to get the next error message, and so on). - Very time consuming.

    But as soon as the errors are [all] fixed in a module... - what
    do you do with it? - ...you should test that what you've changed
    or implemented has been done correctly.

    So edit/compile-iterating a single source is more time-consuming
    than fixing it in, let's call it, "batch-mode". And once it's
    error-free the compile times are negligible in the whole process.


    I understand that you care about compiler speed, and that's fine
    with me; more power to you. Why do you find it so hard to accept
    that lots of other people have different views than you do, and
    those people are not all stupid?

    You might also accept that for many, compilation /is/ a bottleneck in
    their work, or at least it introduces an annoying delay.

    And there are various ways to address that.


    Or are you suggesting that the scenario portrayed here:

    https://xkcd.com/303/

    is a complete fantasy?

    It is a comic. - So, yes, it's fantasy. It's worth a scribbling
    on a WC wall but not suited as a sensible base for discussions.


    Do you really consider yourself
    the only smart person in the room?

    Perhaps the most impatient.

    Don't count on that.

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Tue Dec 3 05:48:14 2024
    On 02/12/2024 18:19, Janis Papanagnou wrote:
    On 02.12.2024 15:44, Bart wrote:


    If all you want is to _sequentially_ process each single error in
    a source file you don't need a test; all you need is to get the
    error message, to start the editor, edit, and reiterate the compile
    (to get the next error message, and so on). - Very time consuming.

    But as soon as the errors are [all] fixed in a module... - what
    do you do with it? - ...you should test that what you've changed
    or implemented has been done correctly.

    So edit/compile-iterating a single source is more time-consuming
    than fixing it in, let's call it, "batch-mode". And once it's
    error-free the compile times are negligible in the whole process.

    I've struggled to find a suitable real-life analogy.

    All I can suggest is that people have gone to some lengths to justify
    having a car that can only travel at 3 mph around town, rather than 30
    mph (i.e. 5 vs 50 kph).

    Maybe their town is only a village, so the net difference is negligible.
    Or they rarely drive, or avoid doing so, another way to downplay the inconvenience of such slow wheels.

    The fact is that driving at 3 mph on a clear road is incredibly
    frustrating even when you're not in a hurry to get anywhere!

    Or are you suggesting that the scenario portrayed here:

    https://xkcd.com/303/

    is a complete fantasy?

    It is a comic. - So, yes, it's fantasy. It's worth a scribbling
    on a WC wall but not suited as a sensible base for discussions.

    I would disagree. The reason those work is that people can identify with
    them from their own experience, even if exaggerated for comic effect.

    Otherwise no one would get them.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Thu Dec 5 12:34:59 2024
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    On 30.11.2024 05:40, Tim Rentsch wrote:

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    On 30.11.2024 00:29, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 28/11/2024 17:28, Janis Papanagnou wrote:

    But we're speaking about compilation times. [...]

    You can make a similar argument about turning on the light switch
    when entering a room. Flicking light switches is not something you
    need to do every few seconds, but if the light took 5 seconds to
    come on (or even one second), it would be incredibly annoying.

    This analogy sounds like something a defense attorney would say who
    has a client that everyone knows is guilty.

    Intentionally or not; it's funny to respond to an analogy with an
    analogy. :-}

    My statement was not an analogy. Similar is not the same as
    analogous.

    It's of course (and obviously) not the same; it's just a
    similar term where the semantics of both terms have an overlap.

    (Not sure why you even bothered to reply and nit-pick here.

    It's because you thought it was just a nit-pick that I bothered
    to reply.

    But with your habit you seem to have just missed the point;
    the comparison of your reply-type with Bart's argumentation.)

    If you think they are the same then it is you who has missed the
    point.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Dan Purgert@3:633/280.2 to All on Thu Dec 5 21:51:51 2024
    On 2024-11-30, Rosario19 wrote:
    On Wed, 20 Nov 2024 12:31:35 -0000 (UTC), Dan Purgert wrote:

    On 2024-11-16, Stefan Ram wrote:
    Dan Purgert <dan@djph.net> wrote or quoted:
    if (n==0) { printf ("n: %u\n",n); n++;}
    if (n==1) { printf ("n: %u\n",n); n++;}
    if (n==2) { printf ("n: %u\n",n); n++;}
    if (n==3) { printf ("n: %u\n",n); n++;}
    if (n==4) { printf ("n: %u\n",n); n++;}
    printf ("all if completed, n=%u\n",n);

    above should be equivalent to this

    for(;n>=0&&n<5;++n) printf ("n: %u\n",n);
    printf ("all if completed, n=%u\n",n);

    Sure, but fir's original posting in
    MID <3deb64c5b0ee344acd9fbaea1002baf7302c1e8f@i2pn2.org>

    was a contrived sequence to the effect of
    if (n==0) { //do something }
    if (n==1) { //do something }
    if (n==2) { //do something }
    if (n==3) { //do something }
    if (n==4) { //do something }

    So, I merely took the contrived sequence, and made "do something" trip
    each condition.
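
    (For contrast, a minimal sketch of the same chain with 'else' added; the
    cascade then stops after the first match, so starting from n=0 it prints
    only "n: 0" followed by "all if completed, n=1":)

    if (n==0) { printf ("n: %u\n",n); n++; }
    else if (n==1) { printf ("n: %u\n",n); n++; }
    else if (n==2) { printf ("n: %u\n",n); n++; }
    else if (n==3) { printf ("n: %u\n",n); n++; }
    else if (n==4) { printf ("n: %u\n",n); n++; }
    printf ("all if completed, n=%u\n",n);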

    Stefan's example from a few posts back is better:

    Well, it's a blue moon when someone nails it. Most of them fall
    for my little gotcha hook, line, and sinker.

    #include <stdio.h>

    const char * english( int const n )
    { const char * result;
    if( n == 0 )result = "zero";
    if( n == 1 )result = "one";
    if( n == 2 )result = "two";
    if( n == 3 )result = "three";
    else result = "four";
    return result; }

    void print_english( int const n )
    { printf( "%s\n", english( n )); }

    int main( void )
    { print_english( 0 );
    print_english( 1 );
    print_english( 2 );
    print_english( 3 );
    print_english( 4 ); }

    --
    |_|O|_|
    |_|_|O| Github: https://github.com/dpurgert
    |O|O|O| PGP: DDAB 23FB 19FA 7D85 1CC1 E067 6D65 70E5 4CE7 2860

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Fri Dec 6 00:21:41 2024
    On 05.12.2024 02:34, Tim Rentsch wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 30.11.2024 05:40, Tim Rentsch wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 30.11.2024 00:29, Tim Rentsch wrote:
    Bart <bc@freeuk.com> writes:
    On 28/11/2024 17:28, Janis Papanagnou wrote:

    But we're speaking about compilation times. [...]

    You can make a similar argument about turning on the light switch
    when entering a room. Flicking light switches is not something you >>>>>> need to do every few seconds, but if the light took 5 seconds to
    come on (or even one second), it would be incredibly annoying.

    This analogy sounds like something a defense attorney would say who
    has a client that everyone knows is guilty.

    Intentionally or not; it's funny to respond to an analogy with an
    analogy. :-}

    My statement was not an analogy. Similar is not the same as
    analogous.

    It's of course (and obviously) not the same; it's just a
    similar term where the semantics of both terms have an overlap.

    (Not sure why you even bothered to reply and nit-pick here.

    It's because you thought it was just a nit-pick that I bothered
    to reply.

    But with your habit you seem to have just missed the point;
    the comparison of your reply-type with Bart's argumentation.)

    If you think they are the same then it is you who has missed the
    point.

    (After the nit-pick level you seem to have now reached the
    Kindergarten niveau of communication. - And no substance as so
    often in contexts where you cannot copy/paste a "C" standard
    text passage.)

    The point was; you were both making comparisons by expressing
    similarities - "a similar argument" [Bart] and "sounds like"
    [Tim]; you both expressed an opinion and backed that up by
    formulating similarities; Bart (unnecessarily leaving the well
    disputable IT context) by his light bulbs, and you (more on a
    personal behavior level, unsurprisingly) comparing his habits
    with [also a prejudice] other professions' habits (attorneys).

    (Again, I wondered why you even bothered to reply. My original
    reply wasn't even meant disrespectful; I was just amused. -
    But meanwhile, given your response habits, I better ignore you
    again, especially since you don't want to contribute but prefer
    playing the troll.)

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Fri Dec 6 00:41:49 2024
    On 02.12.2024 19:48, Bart wrote:
    On 02/12/2024 18:19, Janis Papanagnou wrote:
    On 02.12.2024 15:44, Bart wrote:


    If all you want is to _sequentially_ process each single error in
    a source file you don't need a test; all you need is to get the
    error message, to start the editor, edit, and reiterate the compile
    (to get the next error message, and so on). - Very time consuming.

    But as soon as the errors are [all] fixed in a module... - what
    do you do with it? - ...you should test that what you've changed
    or implemented has been done correctly.

    So edit/compile-iterating a single source is more time-consuming
    than fixing it in, let's call it, "batch-mode". And once it's
    error-free the compile times are negligible in the whole process.

    I've struggled to find a suitable real-life analogy.

    To argue in the topical domain is always better than making up
    (typically non-fitting) real-life analogies.

    (The same with your light-bulb analogy; I was inclined to answer
    on that level, and could have even affirmed my point by it, but
    decided that it's not the appropriate way to discuss the simple
    processual issue that I tried to explain to you.)


    All I can suggest is that people have gone to some lengths to justify
    having a car that can only travel at 3 mph around town, rather than 30
    mph (i.e. 5 vs 50 kph).

    (You certainly meant km/h.)

    Since you like analogies, let me tell you that I recently got
    aware that on a city-highway(!) in my city they had introduced
    a speed limit of 30 km/h (about 20mph); for reasons.


    Maybe their town is only a village, so the net difference is negligible.
    Or they rarely drive, or avoid doing so, another way to downplay the inconvenience of such slow wheels.

    The fact is that driving at 3 mph on a clear road is incredibly
    frustrating even when you're not in a hurry to get anywhere!

    There are many more factors than frustration to be considered;
    safety, pollution, noise, and optimal throughput, for example.
    Similar as with development processes; if you have just one
    factor (speed?) on your scale you might miss the overall goals.

    (If you want to quickly get anywhere within the metropolitan
    boundaries you just take the bicycle or the public transport
    facilities. Just BTW. In other countries' cities there may be
    other situations, preconditions and regulations.)

    Janis

    [...]



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Fri Dec 6 10:51:54 2024
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 02.12.2024 19:48, Bart wrote:
    [...]
    All I can suggest is that people have gone to some lengths to justify
    having a car that can only travel at 3 mph around town, rather than 30
    mph (i.e. 5 vs 50 kph).

    (You certainly meant km/h.)

    Both "kph" and "km/h" are common abbreviations for "kilometers per
    hour". Were you not familiar with "kph"?

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Fri Dec 6 11:24:10 2024
    Bart <bc@freeuk.com> writes:

    On 02/12/2024 14:09, Tim Rentsch wrote:

    Bart <bc@freeuk.com> writes:

    On 30/11/2024 05:25, Tim Rentsch wrote:

    EVERYBODY cares about compilation speeds. [...]

    No, they don't. I accept that you care about compiler speed.
    What most people care about is not speed but compilation times,
    and as long as the times are small enough they don't worry about
    it.

    Another difference may be relevant here. Based on other comments
    of yours I have the impression that you frequently invoke
    compilations interactively. A lot of people never do that (or do
    it only very rarely). In a project I am working on now I do
    builds often, including full builds where every .c file is
    recompiled. But all the compilation times together are only a
    small fraction of the total, because doing a build includes lots
    of other steps, including running regression tests. Even if the
    total compilation time were zero the build process wouldn't be
    appreciably shorter.

    But it might be appreciably longer if the compilers you used were
    a lot slower! Or needed to be invoked more. [...]

    I concede your point. If things were different they wouldn't
    be the same.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Waldek Hebisch@3:633/280.2 to All on Sat Dec 7 10:30:40 2024
    Bart <bc@freeuk.com> wrote:
    On 01/12/2024 13:04, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:
    On 28/11/2024 12:37, Michael S wrote:
    On Wed, 27 Nov 2024 21:18:09 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    c:\cx>tm gcc sql.c #250Kloc file
    TM: 7.38

    Your example illustrates my point. Even 250 thousand lines of
    source takes only a few seconds to compile. Only people nutty
    enough to have single source files over 25,000 lines or so --
    over 400 pages at 60 lines/page! -- are so obsessed about
    compilation speed.

    My impression was that Bart is talking about machine-generated code.
    For machine generated code 250Kloc is not too much.

    This file mostly comprises sqlite3.c which is a machine-generated
    amalgamation of some 100 actual C files.

    You wouldn't normally do development with that version, but in my
    scenario, where I was trying to find out why the version built with my
    compiler was buggy, I might try adding debug info to it then building
    with a working compiler (eg. gcc) to compare with.

    Even in the context of developing a compiler I would not blindly run
    many compilations of a large file.
    Difficult bugs always occur in larger codebases, but with C, these are in a language that I can't navigate, and in programs which are not mine, and which tend to be badly written, bristling with typedefs and macros.

    It could take a week to track down where the error might be ...

    It could be. You could declare that the program is hopeless or do
    what is needed. Which frequently means effectively using available
    debugging features. For example, I got strange crash. Looking at
    data in the debugger suggested that data is malformed. So I used
    data breakpoints to figure out which instruction initialized the data.
    That needed several runs of the program, in each run looking at what
    happened to the suspected memory location. At the end I localized the
    problem and the rest was easy.
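
    (A sketch of that workflow in gdb, with hypothetical names; 'watch -l'
    sets a data breakpoint on the location itself, and the debugger then
    stops each time that memory is written:)

    (gdb) run                      # reproduce the crash, note the suspect object
    (gdb) watch -l obj->field      # data breakpoint on the memory location
    (gdb) run
    (gdb) bt                       # when it triggers, see which code wrote the value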

    Some problems are easy, for example a significant percentage of
    segfaults: you have something which is not a valid address,
    and frequently you immediately see why the address is wrong and
    how to fix this. Still, finding this usually takes longer
    than compilation.

    At the first stage I would debug the
    compiled program, to find out what is wrong with it.

    ... within the C program. Except there's nothing wrong with the C
    program! It works fine with a working compiler.

    The problem will be in the generated code, so in an entirely different program.

    Of course the problem is in the generated code. But debug info (I had
    at least _some_ debug info; apparently you do not have it) shows you
    which part of the source is responsible for given machine code. And you
    can see the data, so you can see what is happening in the generated program.
    And you have the C source so you can see what should happen. Once
    you know the place where "what is happening" differs from "what should
    happen" you can normally produce a quite small reproducing example.

    So normal debugging tools are useful when several sets of
    source code are involved, in different languages, or the error occurs
    in the second generation version of either the self-hosted tool, or the program under test if it is to do with languages.

    (For example, I got tcc.c working at one point. My generated tcc.exe
    could compile tcc.c, but that second-generation tcc.c didn't work.)

    Clearly, you work in stages: first you find out what is wrong with the second-generation tcc.exe. Then you find the piece of tcc.c that was miscompiled by the first-generation tcc.exe (producing the wrong second-
    generation compiler). Then you find the piece of tcc.c which was
    responsible for this miscompilation. And finally you look at why
    your compiler miscompiled this piece of tcc.c.

    Tedious, yes. It is easier if you have a good testsuite, that is, a
    collection of small programs that exercise various constructs
    and potentially problematic combinations.

    Anyway, most of the work involves executing programs in the debugger
    and observing critical things. Re-creating executables is rare
    in comparison. The main point where compiler speed matters is the time
    to run the compiler testsuite.

    After that I would try to minimize the testcase, removing code which
    does not contribute to the bug.

    Again, there is nothing wrong with the C program, but in the code
    generated for it. The bug can be very subtle, but it usually turns out
    to be something silly.

    Removing code from 10s of 1000s of lines (or 250Kloc for sql) is not practical. Yet the aim is to isolate some code which can be used to recreate the issue in a smaller program.

    If you have "good" version (say one produced by 'gcc' or by earlier
    worong verion of your compiler), then you can isolate problem by
    linking parts produced by different compilers. Even if you have
    one huge file, typically you can split it into parts (if it is one
    huge function normally it is possible to split it into smaller
    ones). Yes, it is work but getting quality product needs work.

    Debugging can involve comparing two versions, one working, the other
    not, looking for differences. And here there may be tracking statements added.

    If the only working version is via gcc, then that's bad news because it makes the process even more of a PITA.

    Well, IME tracking statements frequently produce too much or too little
    data. When dealing with C code I tend to depend more on the debugger,
    setting breakpoints in crucial places and examining data there. Extra
    printing functions can help; for example gcc has printing functions
    for its main data structures. Such functions can be called from the
    debugger and give nicer output than generic debugger functions.
    But even if you need extra printing functions you can put them
    in a separate file, compile once and use multiple times.
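
    (A minimal sketch of such a helper, kept in its own file; the structure
    and field names here are made up for illustration:)

    /* dump.c - compile once, link into debug builds, then call from the
       debugger, e.g. in gdb: call dump_nodes(list_head) */
    #include <stdio.h>

    struct node { int kind; const char *name; struct node *next; };

    void dump_nodes(const struct node *n)
    { for ( ; n != NULL; n = n->next)
          fprintf(stderr, "node %p kind=%d name=%s\n",
                  (void *)n, n->kind, n->name ? n->name : "(null)"); }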

    I added an interpreter mode to my IL, because I assume that would give a solid, reliable reference implementation to compare against.

    It turned out to be even more buggy than the generated native code!

    (One problem was to do with my stdarg.h header which implements VARARGS
    used in function definitions. It assumes the stack grows downwards.

    This is true on most machines, but not all.

    In
    my interpreter, it grows downwards!)

    You probably meant upwards? And handling such things is natural
    when you have portability in mind: either you parametrise stdarg.h
    so that it works for both stack directions, or you make sure that
    the interpreter and compiler use the same direction (the latter seems
    to be much easier). Actually, I think the most natural way is to
    have the data structure layout in the interpreter be as close as
    possible to the compiler's data layout. Of course, there are some
    unavoidable differences; the interpreter needs registers for its
    operation, so some variables that could be in registers in compiled
    code will end up in the stack frame.
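
    (A very rough sketch of the parametrisation idea, assuming a naive ABI
    where every variadic argument is passed contiguously on the stack and
    nothing is passed in registers - real stdarg.h implementations are far
    more involved; all names here are made up:)

    typedef char *my_va_list;

    #if STACK_GROWS_UPWARD
      /* later arguments sit at lower addresses than the named ones */
      #define my_va_start(ap, last)  ((ap) = (char *)&(last))
      #define my_va_arg(ap, T)       (*(T *)((ap) -= sizeof(T)))
    #else
      /* usual downward-growing stack: later arguments at higher addresses */
      #define my_va_start(ap, last)  ((ap) = (char *)&(last) + sizeof(last))
      #define my_va_arg(ap, T)       (*(T *)(((ap) += sizeof(T)) - sizeof(T)))
    #endif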

    That involves several compilations
    of files with quickly decreasing sizes.

    Tim isn't asking the right questions (or any questions!). WHY does gcc
    take so long to generate indifferent code when the task can clearly be
    done at least a magnitude faster?

    The simple answer is: users tolerate long compile time. If users
    abandoned 'gcc' for some other compiler due to long compile time,
    then 'gcc' developers would notice.

    People use gcc. They come to depend on its features, or they might use (perhaps unknowingly) some extensions. On Windows, gcc includes some
    headers and libraries that belong to Linux, but other compilers don't provide them.

    The result is that if they were to switch to a smaller, faster compiler, their program may not work.

    They'd have to use it from the start. But then they may want to use libraries which only work with gcc ...

    Well, you see that there are reasons to use 'gcc'. Long ago I
    produced an image processing DLL for Windows. The first version was
    developed on Linux using 'gcc' and then compiled on Windows
    using Borland C. It turned out that in Borland C 'setjmp/longjmp'
    did not work, so I had to work around this. Not nice, but
    manageable. At that time the C standard did not include a function
    to round floats to integers, and that proved to be problematic.
    The C default, that is truncation, produced artifacts that were not
    acceptable. So I used an emulation of rounding based on 'floor',
    which worked OK but turned out to be slow (something like 70%
    of the runtime went into rounding). So I replaced this with assembler
    code. With Borland C I had to call a separate assembler routine,
    which had some overhead.

    The next version was cross-compiled on Linux using gcc. This version
    used inline assembly for rounding and was significantly faster
    than what Borland C produced. Note: the images to process were
    largish (think of, say, 12000 by 20000 pixels) and speed was an
    important factor. So using gcc-specific code was IMO justified
    (this code was used conditionally; other compilers would get the
    slow portable version using 'floor').
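
    (For illustration, the truncation vs. floor-based rounding mentioned
    above, as a small sketch; today one would normally reach for C99's
    lrint() rather than assembler:)

    #include <math.h>

    int round_trunc(double x) { return (int)x; }              /* truncates toward zero   */
    int round_half (double x) { return (int)floor(x + 0.5); } /* round-half-up via floor */
    /* C99 and later: (int)lrint(x) rounds using the current rounding mode */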

    You need to improve your propaganda for faster C compilers...

    I actually don't know why I care. I get the benefit of my fast tools
    every day; they're a joy to use. So I'm not bothered that other people
    are that tolerant of slow, cumbersome build systems.

    But then, people in this group do like to belittle small, fast products
    (tcc for example as well as my stuff), and that's where it gets annoying.

    I tried tcc compiling TeX. Long ago it did not work due to limitations
    of tcc. This time it worked. A small comparison on the main file (19062
    lines):

    Command          time (s)   code size   data size
    tcc -g           0.017      290521      1188
    tcc              0.015      290521      1188
    gcc -O0 -g       0.440      248467      14
    gcc -O0          0.413      248467      14
    gcc -O -g        1.385      167565      0
    gcc -O           1.151      167565      0
    gcc -Os -g       1.998      142336      0
    gcc -Os          1.724      142336      0
    gcc -O2 -g       2.683      207913      0
    gcc -O2          2.257      207913      0
    gcc -O3 -g       3.510      255909      0
    gcc -O3          2.934      255909      0
    clang -O0 -g     0.302      232755      14
    clang -O0        0.189      232755      14
    clang -O -g      1.996      223410      0
    clang -O         1.683      223410      0
    clang -Os -g     1.693      154421      0
    clang -Os        1.451      154421      0
    clang -O2 -g     2.774      259569      0
    clang -O2        2.359      259569      0
    clang -O3 -g     2.970      280235      0
    clang -O3        2.537      280235      0

    I have duly provided both the time when using '-g' and without.
    Both are supposed to produce the same code (so code and data sizes
    are also the same), but you can see that '-g' measurably increases
    compile time. AFAIK compiler data structures contain slots for debug
    info even if '-g' is not given and the compiler generates no debug
    info. So the actual cost of supporting '-g' is higher than the
    difference; you pay part of this cost even if you do not use the
    capability.

    ATM I do not have data handy to compare runtimes (TeX needs
    extra data to do useful work), so I provide code and data
    size as a proxy. As you can see, even at -O0 gcc and clang
    manage to put almost all data into instructions (actually
    in tex.c _all_ initialized data is constant), while tcc
    keeps it as data, which requires extra instructions to
    access. gcc at -O and -Os and clang at -Os produce code
    which is about half the size of the tcc result. Some part
    of it may be due to using smaller instructions, but most
    is likely because the gcc and clang results simply have far
    fewer instructions. At higher optimization levels the code
    size grows; this is probably due to inlining and code
    duplication. This usually gives some small speedup at the
    cost of bigger code, but one would have to measure
    (sometimes attempts at optimization backfire and lead
    to slower code).

    Anyway, 19062 lines is much larger than the typical file that
    I work with, and even for such a size the compile time is reasonable.
    Maybe less typical is the modest use of include files: tex.c
    uses a few standard C headers and 1613 lines of project-specific
    headers. Still, there are macros, and the macro-expanded result
    is significantly bigger than the source.

    In the past TeX execution time correlated reasonably well with
    Dhrystone. On Dhrystone, tcc-compiled code is about 4 times
    slower than gcc/clang, so one can expect tcc-compiled TeX to
    be significantly slower than one compiled by gcc or clang.

    --
    Waldek Hebisch

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: To protect and to server (3:633/280.2@fidonet)
  • From Janis Papanagnou@3:633/280.2 to All on Sat Dec 7 21:58:49 2024
    On 06.12.2024 00:51, Keith Thompson wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 02.12.2024 19:48, Bart wrote:
    [...]
    All I can suggest is that people have gone to some lengths to justify
    having a car that can only travel at 3 mph around town, rather than 30
    mph (i.e. 5 vs 50 kph).

    (You certainly meant km/h.)

    Both "kph" and "km/h" are common abbreviations for "kilometers per
    hour". Were you not familiar with "kph"?

    No. Must be a convention depending on cultural context of locality.
    ("kph", if anything, is "kilopond-hour, per standard.)

    So thanks for pointing that out. (I forget sometimes that in some
    countries there's a reluctance to use the [established] standards,
    and I certainly don't know about all the cultural peculiarities of
    the [many] existing countries, even if they are as dominating as
    the USA is [or other English speaking or influenced countries].)

    We're used to the SI units and metric form, although hereabouts
    some folks also (informally, but wrongly) pronounce it as "k-m-h".

    Janis


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bart@3:633/280.2 to All on Sat Dec 7 23:40:57 2024
    On 06/12/2024 23:30, Waldek Hebisch wrote:
    Bart <bc@freeuk.com> wrote:

    (For example, I got tcc.c working at one point. My generated tcc.exe
    could compile tcc.c, but that second-generation tcc.c didn't work.)

    Clearly, you work in stages: first you find out what is wrong with the second-generation tcc.exe.

    Ha, ha, ha!

    While C /can/ be written reasonably clearly, tcc sources are more typical:
    very dense, mixed-up lower and upper case everywhere, apparent over-use
    of macros, e.g.:

    for_each_elem(symtab_section, 1, sym, ElfW(Sym)) {
        if (sym->st_shndx == SHN_UNDEF) {
            name = (char *) symtab_section->link->data + sym->st_name;
            sym_index = find_elf_sym(s1->dynsymtab_section, name);

    If I was looking to develop this product then it might be worth spending
    days or weeks learning how it all works. But it's not worth mastering
    this codebase inside out just to discover I wrote 0 instead of 1
    somewhere in my compiler.

    I need whatever error it is to manifest itself in a simpler way. Or have
    two versions (eg. one interpreted, the other native code) that give
    different results. The problem with this app is that those different
    results appear too far down the line; I don't want to trace a billion instructions first.

    So, when I get back to it, I'll test other open source C code. (The
    annoying thing though is that either it won't compile for reasons I've
    lost interest in, or it works completely fine.)

    In
    my interpreter, it grows downwards!)

    You probably meant upwards?

    Yes.

    And handling such things is natural
    when you have portablity in mind, either you parametrise stdarg.h
    so that it works for both stack directions, or you make sure that
    interpreter and compiler use the same direction (the later seem to
    be much easier).

    This is quite a tricky one actually. There is currently conditional code
    in my stdarg.h that detects whether the compiler has set a flag saying
    the result will be interpreted. But it doesn't always know that.

    For example, the compiler might be told to do -E (preprocess) and the
    result compiled later. The stack direction is baked into the output.

    Or it will do -p (generate discrete IL), where it doesn't know whether
    that will be interpreted.

    But this is not a serious issue; the interpreted option is for either debugging or novelty uses.


    Actually, I think that most natural way is to
    have data structure layout in the interpreter to be as close as
    possible to compiler data layout.

    I don't want my hand forced in this. The point of interpreting is to be independent of hardware. A downward growing stack is unnatural.

    They'd have to use it from the start. But then they may want to use
    libraries which only work with gcc ...

    Well, you see that there are reasons to use 'gcc'.

    Self-perpetuating ones, which are the wrong reasons.


    Next version was cross-compiled on Linux using gcc. This version
    used inline assembly for rounding and was significantly faster
    than what Borland C produced. Note: images to process were
    largish (think of say 12000 by 20000 pixels) and speed was
    important factor. So using 'gcc' specific code was IMO justified
    (this code was used conditionally, other compilers would get
    slow portable version using 'floor').

    I have a little image editor written entirely in interpreted code. (It
    was supposed to be a project that was mixed-language, but that's some way off.)

    However it is just about usable. Eg. inverting the colours (negative to positive etc) of a 6Mpix colour image takes 1/8th of a second. Splitting
    into separate R,G,B 8-bit planes takes half a second. This is with
    bytecode working on a pixel at a time.
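
    (What "inverting" amounts to per pixel is roughly this - names are
    hypothetical; the interpreter's cost comes from running such a loop
    body in bytecode for each of the ~18 million channel bytes:)

    /* data: packed 8-bit R,G,B bytes; npixels: pixel count (both hypothetical) */
    for (size_t i = 0; i < npixels * 3; i++)
        data[i] = (unsigned char)(255 - data[i]);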

    It uses no optimised code in the interpreter. Only a mildly accelerated dispatcher.


    You need to improve your propaganda for faster C compilers...

    I actually don't know why I care. I get the benefit of my fast tools
    every day; they're a joy to use. So I'm not bothered that other people
    are that tolerant of slow, cumbersome build systems.

    But then, people in this group do like to belittle small, fast products
    (tcc for example as well as my stuff), and that's where it gets annoying.

    I tried tcc compiling TeX. Long ago it did not work due to limitations
    of tcc. This time it worked. Small comparison on main file (19062
    lines):

    Command time size code size data
    tcc -g 0.017 290521 1188
    tcc 0.015 290521 1188
    gcc -O0 -g 0.440 248467 14
    gcc -O0 0.413 248467 14

    This is demonstrating that tcc is translating C code at over 1 million
    lines per second, and generating binary code at 17MB per second. You're
    not impressed by that?

    Here are a couple of reasonably substantial one-file programs that can
    be run, both interpreters:

    https://github.com/sal55/langs/blob/master/lua.c

    This is a one-file Lua interpreter, which I modified to take input from
    a file. (For original, see comment at start.)

    On my machine, these are typical results:

                   compile-time   size     runtime
    gcc -s -O3     14 secs        378KB    3.0 secs
    gcc -s -O0     3.3 secs       372KB    10.0 secs
    tcc            0.12 secs      384KB    8.5 secs
    cc             0.14 secs      315KB    8.3 secs

    The runtime refers to running this Fibonacci test (fib.lua):

    function fibonacci(n)
        if n<3 then
            return 1
        else
            return fibonacci(n-1) + fibonacci(n-2)
        end
    end

    for n = 1, 36 do
        f=fibonacci(n)
        io.write(n," ",f, "\n")
    end

    The other is a version of my interpreter, minus ASM acceleration,
    transpiled to C, and for Linux:

    https://github.com/sal55/langs/blob/master/qc.c

    Compile using for example:

    gcc qc.c -oqc -fno-builtin -lm -ldl
    tcc qc.c -oqc -fdollars-in-identifiers -lm -ldl

    The input there can be (fib.q):

    func fib(n)=
        if n<3 then
            1
        else
            fib(n-1)+fib(n-2)
        fi
    end

    for i to 36 do
        println i,fib(i)
    od

    Run like this:

    ./qc -nosys fib

    On my Windows machine, the gcc-O3-compiled version takes 4.1 seconds, and tcc
    is 9.3 seconds. The gap is narrower than with the Lua version, which uses a C
    style that depends more on function inlining. (Note that being in one file
    allows gcc to do whole-program optimisations.)

    My cc-compiled version runs in 5.1 seconds, so only 25% slower than
    gcc-O3. It also produces a 360KB executable, compared with gcc's 467KB,
    even with -s. tcc's code is about the same as gcc-O3.

    (My cc-compiler doesn't yet have the optimising pass that makes code
    smaller. The original-source qc project builds to 266KB with that pass
    enabled, while gcc's -Os on qc.c manages 280KB.)

    But my 266KB version runs faster than gcc's 280KB! And accelerated code
    runs 5 times as fast. (6 secs vs 1.22 secs.)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)