fir <fir@grunge.pl> writes:
Tim Rentsch wrote:
fir <fir@grunge.pl> writes:
Bart wrote:
There are several clear patterns here: you're testing the same variable 'n'
against several mutually exclusive alternatives, which also happen
to be consecutive values.
C is short of ways to express this, if you want to keep those
'somethings' as inline code (otherwise arrays of function pointers
or even label pointers could be used).
so in short this group seems to have no conclusion but is tolerant
of various approaches, as it seems
imo the else ladder is like the most proper but i don't like it
optically; switch/case i also don't like (as far as i remember i
never use it in my code - for years i haven't used even one)
so i personally would use bare ifs and maybe elses occasionally
(and switch should be mended, but it's fully not clear how)
I think you should have confidence in your own opinion. All
you're getting from other people is their opinion about what is
easier to understand, or "clear", or "readable", etc. As long as
the code is logically correct you are free to choose either
style, and it's perfectly okay to choose the one that you find
more appealing.
There is a case where using 'else' is necessary, when there is a
catchall action for circumstances matching "none of the above".
Alternatively a 'break' or 'continue' or 'goto' or 'return' may
be used to bypass subsequent cases, but you get the idea.
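A minimal illustration of that bypassing (a sketch, not from the thread; names invented):

int classify(int n) {
    if (n == 1) return 10;   /* 'return' bypasses the remaining tests */
    if (n == 2) return 20;
    return 0;                /* catchall for "none of the above" */
}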
With the understanding that I am offering nothing more than my own opinion,
I can say that I might use any of the patterns mentioned, depending
on circumstances. I don't think any one approach is either always
right or always wrong.
maybe, but some may have some strong arguments (for using this and not
that) that i may overlook
I acknowledge the point, but you haven't gotten any arguments,
only opinions.
fir <fir@grunge.pl> writes:
Tim Rentsch wrote:
With the understanding that I am offering nothing more than my own opinion,
I can say that I might use any of the patterns mentioned, depending
on circumstances. I don't think any one approach is either always
right or always wrong.
maybe, but some may have some strong arguments (for using this and not
that) that i may overlook
I acknowledge the point, but you haven't gotten any arguments,
only opinions.
[...]
Here, the question was, can:
if (c1) s1;
else if (c2) s2;
always be rewritten as:
if (c1) s1;
if (c2) s2;
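A sketch of why the answer is "no" in general (invented example): if s1 changes what c2 tests, the two forms differ:

#include <stdio.h>
int main(void) {
    int n = 1;
    if (n == 1) n = 2;                 /* s1 modifies the tested variable */
    if (n == 2) puts("this runs too"); /* with 'else if' it would be skipped */
}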
[...]
On 04.11.2024 12:56, Bart wrote:
[...]
Here, the question was, can:
if (c1) s1;
else if (c2) s2;
always be rewritten as:
if (c1) s1;
if (c2) s2;
Erm, no. The question was even more specific.
It had (per the example) not only all the ci disjoint, but also
defined as a linear sequence of
natural numbers! - In other languages [than "C"] this may be more
important since [historically] there were specific constructs for
that case; see e.g. 'switch' definitions in Simula, or the 'case'
statement of Algol 68, both mapping elements onto an array[1..N];
labels in the first case, and expressions in the latter case. So
in "C" we could at least consider using something similar, like,
say, arrays of function pointers indexed by those 'n'.
(I'd suggested addressing that by just pointing it out.)
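A minimal C sketch of that function-pointer-table idea (all names here are invented, and 'n' is assumed validated to be in range):

typedef void action_fn(void);
static void do_a(void) { /* something 1 */ }
static void do_b(void) { /* something 2 */ }
static void do_c(void) { /* something 3 */ }
static action_fn *const actions[] = { do_a, do_b, do_c };

void dispatch(int n) {
    if (n >= 0 && n < 3)
        actions[n]();   /* selects and calls exactly one of N */
}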
I'm a bit astonished, BTW, about this huge emphasis on the topic
"opinions" in later posts of this thread. The OP asked (even in
the subject) about "practice", which actually invites, if not asks,
for providing opinions (besides practical experiences).
[...] As long as
the code is logically correct you are free to choose either
style, and it's perfectly okay to choose the one that you find
more appealing.
That the OP's example contained some clear patterns has already been
covered (I did so anyway).
On 04/11/2024 04:00, Tim Rentsch wrote:
fir <fir@grunge.pl> writes:
Tim Rentsch wrote:
With the understanding that I am offering nothing more than my own opinion,
I can say that I might use any of the patterns mentioned, depending
on circumstances. I don't think any one approach is either always
right or always wrong.
maybe, but some may have some strong arguments (for using this and not
that) that i may overlook
I acknowledge the point, but you haven't gotten any arguments,
only opinions.
Pretty much everything about PL design is somebody's opinion.
Bart wrote:
On 04/11/2024 04:00, Tim Rentsch wrote:
fir <fir@grunge.pl> writes:
Tim Rentsch wrote:
With the understanding that I am offering nothing more than my own opinion,
I can say that I might use any of the patterns mentioned, depending
on circumstances. I don't think any one approach is either always
right or always wrong.
maybe, but some may have some strong arguments (for using this and not
that) that i may overlook
I acknowledge the point, but you haven't gotten any arguments,
only opinions.
Pretty much everything about PL design is somebody's opinion.
overall when you think about and discuss such a thing some conclusions
may appear - and often some do for me, though they are not always very
clear or 'hard'
overall from this thread i noted that switch (which i already didn't
like) is bad.. note those two elements of switch, that is "switch"
and "case", are in a weird, not obvious relation in c (and how will it
work when you mix them etc)
what i concluded was that if you do the thing this way
a { }  //this is an analogon to case - a named block
b { }  //this is an analogon to case - a named block
n()    // here by "()" i noted a call of some variable that may yield a
       // 'call' to a, b, c, d, e, f (in that case n would be some enum
       // or pointer)
c( )   //this is an analogon to case - a named block
d( )   //this is an analogon to case - a named block
then everything is clear - this call just selects and calls a block, and
the blocks themselves are just definitions and are skipped in execution
until "called"
this is an example of some conclusion for me from this thread - and i
think code such as my own initial example should probably be done this
way (though it is not c, i know)
fir wrote:
Bart wrote:
[...]
note in fact both array usage like tab[5] and function call like foo()
are analogues to switch/case - as when you call functions the call is
like the switch and the function definition sets are the 'cases'
On 04/11/2024 15:06, fir wrote:
[...]
note in fact both array usage like tab[5] and function call like foo()
are analogues to switch/case - as when you call functions the call is
like the switch and the function definition sets are the 'cases'
Yes, switch could be implemented via a table of label pointers, but it
needs a GNU extension.
For example this switch:
#include <stdio.h>
int main(void) {
    for (int i=0; i<10; ++i) {
        switch(i) {
        case 7: case 2: puts("two or seven"); break;
        case 5: puts("five"); break;
        default: puts("other");
        }
    }
}
Could also be written like this:
#include <stdio.h>
int main(void) {
    void* table[] = {
        &&Lother, &&Lother, &&L27, &&Lother, &&Lother, &&L5,
        &&Lother, &&L27, &&Lother, &&Lother};

    for (int i=0; i<10; ++i) {
        goto *table[i];
        L27:    puts("two or seven"); goto Lend;
        L5:     puts("five"); goto Lend;
        Lother: puts("other");
        Lend:   ;
    }
}
(A compiler may generate something like this, although it will be
range-checked if needed. In practice, small numbers of cases, or cases
where the case values are too spread out, might be implemented as
if-else chains.)
On 03/11/2024 17:00, David Brown wrote:
On 02/11/2024 21:44, Bart wrote:
I would disagree on that definition, yes. A "multi-way selection"
would mean, to me, a selection of one of N possible things - nothing
more than that. It is far too general a phrase to say that it must
involve branching of some sort ("notional" or otherwise).
Not really. If the possible options involving actions written in-line,
and you only want one of those executed, then you need to branch around
the others!
And it is too general to say if you are selecting one of many things
to do, or doing many things and selecting one.
Sorry, but this is the key part. You are not evaluating N things and selecting one; you are evaluating ONLY one of N things.
For X, it builds a list by evaluating all the elements, and returns the value of the last. For Y, it evaluates only ONE element (using internal switch, so branching), which again is the last.
You don't seem keen on keeping these concepts distinct?
The whole construct may or may not return a value. If it does, then
one of the N paths must be a default path.
No, that is simply incorrect. For one thing, you can say that it is
perfectly fine for the selection construct to return a value sometimes
and not at other times.
How on earth is that going to satisfy the type system? You're saying
it's OK to have this:
    int x = if (randomfloat()<0.5) 42;
Or even this, which was discussed recently, and which is apparently
valid C:
    int F(void) {
        if (randomfloat()<0.5) return 42;
    }
In the first example, you could claim that no assignment takes place
with a false condition (so x contains garbage). In the second example,
what value does F return when the condition is false?
You can't hide behind your vast hyper-optimising compiler; the language needs to say something about it.
My language will not allow it. Most people would say that that's a good thing. You seem to want to take the perverse view that such code should
be allowed to return garbage values or have undefined behaviour.
After all, this is C! But please tell me, what would be the downside of
not allowing it?
It's fine if it never returns at all for some
cases. It's fine to give selection choices for all possible inputs.
It's fine to say that the input must be a value for which there is a
choice.
What I see here is that you don't like C's constructs (that may be for
good reasons, it may be from your many misunderstandings about C, or
it may be from your knee-jerk dislike of everything C related).
With justification. 0010 means 8 in C? Jesus.
It's hardly knee-jerk either since I first looked at it in 1982, when my
own language barely existed. My opinion has not improved.
You have some different selection constructs in your own language,
which you /do/ like. (It would be very strange for you to have
constructs that you don't like in your own personal one-man language.)
It's a one-man language but most of its constructs and features are universal. And therefore can be used for comparison.
One feature of my concept of 'multi-way select' is that there is one
or more controlling expressions which determine which path is followed.
Okay, that's fine for /your/ language's multi-way select construct.
But other people and other languages may do things differently.
FGS, /how/ different? To select WHICH path or which element requires
some input. That's the controlling expression.
Or maybe with your ideal language, you can select an element of an array without bothering to provide an index!
There are plenty of C programmers - including me - who would have
preferred to have "switch" be a more structured construct which could
not be intertwined with other constructs in this way. That does not
mean "switch" is not clearly defined - nor does it hinder almost every
real-world use of "switch" from being reasonably clear and structured.
It does, however, /allow/ people to use "switch" in more complex and
less clear ways.
Try and write a program which takes any arbitrary switch construct (that usually means written by someone else, because obviously all yours will
be sensible), and cleanly isolates all the branches including the
default branch.
Hint: the lack of 'break' in a non-empty span between two case labels
will blur the line. So will a conditional break (example below unless
it's been culled).
You are confusing "this makes it possible to write messy code" with a
belief that messy code is inevitable or required. And you are
forgetting that it is always possible to write messy or
incomprehensible code in any language, with any construct.
I can't write that randomfloat example in my language.
I can't leave out
a 'break' in a switch statement (it's not meaningful). It is impossible
to do the crazy things you can do with switch in C.
Yes, with most languages you can write nonsense programs, but that
doesn't give the language a licence to forget basic rules and common
sense, and just allow any old rubbish even if clearly wrong:
    int F() {
        F(1, 2.3, "four", F,F,F,F(),F(F()));
        F(42);
    }
This is apparently valid C. It is impossible to write this in my language.
You can't use such a statement as a solid basis for a multi-way
construct that returns a value, since it is, in general, impossible
to sensibly enumerate the N branches.
It is simple and obvious to enumerate the branches in almost all
real-world cases of switch statements. (And /please/ don't faff
around with cherry-picked examples you have found somewhere as if they
were representative of anything.)
Oh, right. I'm not allowed to use counter-examples to lend weight to my comments. In that case, perhaps you shouldn't be allowed to use your sensible examples either. After all we don't know what someone will feed
to a compiler.
But, suppose C was upgraded so that switch could return a value. For
that, you'd need the value at the end of each branch. OK, here's a
simple one:
    y = switch (x) {
        case 12:
            if (c) case 14: break;
            100;
        case 13:
            200;
            break;
        }
Any ideas? I will guess that x=12/c=false or x=13 will yield 200. What
about x=12/c=true, or x=14, or x = anything else?
So if I understand correctly, you are saying that chains of if/else,
an imaginary version of "switch", and the C tertiary operator all
evaluate the same things in the same way, while with C's switch you
have no idea what happens?
Yes. With C's switch, you can't /in general/ isolate things into
distinct blocks. You might have a stab at it if you stick to a subset of
C and follow various guidelines, in an effort to make 'switch' look
normal.
See the example above.
That is true, if you cherry-pick what you choose to ignore in each
case until it fits your pre-conceived ideas.
You're the one who's cherry-picking examples of C!
Here is my attempt at
converting the above switch into my syntax (using a tool derived from my
C compiler):
    switch x
    when 12 then
        if c then
        fi
        100
        fallthrough
    when 13 then
        200
    end switch
It doesn't attempt to deal with fallthrough, and misses out that
14-case, and that conditional break. It's not easy; I might have better
luck with assembly!
No, what you call "natural" is entirely subjective.ÿ You have looked
at a microscopic fraction of code written in a tiny proportion of
programming languages within a very narrow set of programming fields.
I've worked in systems programming and did A LOT in the 15 years up to
the mid 90s. That included pretty much everything involved in writing
graphical applications given only a text-based disk OS that provided
file-handling.
Plus of course devising and implementing everything needed to run my own
systems language. (After the mid 90s, Windows took over half the work.)
That's not criticism - few people have looked at anything more.
Very few people use their own languages, especially over such a long
period, also use them to write commercial applications, or create
languages for others to use.
What I /do/ criticise is your assumption that this almost
negligible experience gives you the right to decide what is "natural"
or "true", or how programming languages or tools "should" work.
So, in your opinion, 'switch' should work how it works in C? That is the most intuitive and natural way of implementing it?
You need to learn that other people have different ideas, needs,
opinions or preferences.
Most people haven't got a clue about devising PLs.
I'd question the whole idea of having a construct that can
evaluate to something of different types in the first place, whether
or not it returns a value, but that's your choice.
If the result of a multi-way execution doesn't yield a value to be
used, then the types don't matter.
Of course they do.
Of course they don't! Here, F, G and H return int, float and void* respectively:
        if (c1) F();
   else if (c2) G();
   else         H();
C will not complain that those branches yield different types. But you
say it should do? Why?
You're just being contradictory for the sake of it aren't you?!
This is just common sense; I don't know why you're questioning it.
(I'd quite like to see a language of your design!)
def foo(n) :
    if n == 1 : return 10
    if n == 2 : return 20
    if n == 3 : return
That's Python, quite happily having a multiple choice selection that
sometimes does not return a value.
Python /always/ returns some value. If one isn't provided, it returns
None. Which means checking that a function returns an explicit value
goes out the window. Delete the 10 and 20 (or the entire body), and it
still 'works'.
Yes, that is a dynamically typed language, not a statically typed
language.
std::optional<int> foo(int n) {
    if (n == 1) return 10;
    if (n == 2) return 20;
    if (n == 3) return {};
}
That's C++, a statically typed language, with a multiple choice
selection that sometimes does not return a value - the return type
supports values of type "int" and non-values.
So what happens when n is 4? Does it return garbage (so that's bad)?
Does it arrange to return some special value of 'optional' that means no
value?
In that case, the type still does matter, but the language is
providing that default path for you.
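For comparison, a minimal C counterpart of that optional-style return (a sketch with invented names, not anyone's posted code):

#include <stdbool.h>

typedef struct { bool has_value; int value; } opt_int;

opt_int foo(int n) {
    if (n == 1) return (opt_int){ true, 10 };
    if (n == 2) return (opt_int){ true, 20 };
    return (opt_int){ false, 0 };  /* explicit 'no value' default path */
}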
X Y A B are arbitrary expressions. The need for 'else' is determined
during type analysis. Whether it will ever execute the default path
would be up to extra analysis, that I don't do, and would anyway be
done later.
But if it is not possible for neither of X or Y to be true, then how
would you test the "else" clause? Surely you are not proposing that
programmers be required to write lines of code that will never be
executed and cannot be tested?
Why not? They still have to write 'end', or do you propose that can be
left out if control never reaches the end of the function?!
(In earlier versions of my dynamic language, the compiler would insert
an 'else' branch if one was needed, returning 'void'.
I decided that requiring an explicit 'else' branch was better and more failsafe.)
You can't design a language like this where valid syntax depends on
compiler and what it might or might not discover when analysing the
code.
Why not? It is entirely reasonable to say that a compiler for a
language has to be able to do certain types of analysis.
This was the first part of your example:
const char * flag_to_text_A(bool b) {
    if (b == true) {
        return "It's true!";
    } else if (b == false) {
        return "It's false!";
/I/ would question why you'd want to make the second branch conditional
in the first place. Write an 'else' there, and the issue doesn't arise.
Because I can't see the point of deliberately writing code that usually
takes two paths, when either:
 (1) you know that one will never be taken, or
 (2) you're not sure, but don't make any provision in case it is
Fix that first rather than relying on compiler writers to take care of
your badly written code.
And also, you keep belittling my abilities and my language, when C allows:
    int F(void) {}
How about getting your house in order first.
Anyone who is convinced that their own personal preferences are more
"natural" or inherently superior to all other alternatives, and can't
justify their claims other than saying that everything else is "a
mess", is just navel-gazing.
I wrote more here but the post is already too long.
Let's just say that 'messy' is a fair assessment of C's conditional
features, since you can write this:
On 03/11/2024 21:00, Bart wrote:
To my mind, this is a type of "multi-way selection" :

    (const int []){ a, b, c }[n];

I can't see any good reason to exclude it as fitting the descriptive
phrase. And if "a", "b" and "c" are not constant, but require
evaluation of some sort, it does not change things. Of course if these
required significant effort to evaluate, or had side-effects, then you
would most likely want a "multi-way selection" construction that did the
selection first, then the evaluation - but that's a matter of programmer
choice, and does not change the terms.
I am very keen on keeping the concepts distinct in cases where it
matters.
    int x = if (randomfloat()<0.5) 42;
In C, no. But when we have spread to other languages, including
hypothetical languages, there's nothing to stop that. Not only could it
be supported by the run-time type system, but it would be possible to
have compile-time types that are more flexible and only need to be
"solidified" during code generation. That might allow the language to
track things like "uninitialised" or "no value" during compilation
without having them part of a real type (such as std::optional<> or a C
It doesn't return a value. That is why it is UB to try to use that
non-existent value.
My language will not allow it. Most people would say that that's a
good thing. You seem to want to take the perverse view that such code
should be allowed to return garbage values or have undefined behaviour.
Is your idea of "most people" based on a survey of more than one person?
Note that I have not suggested returning garbage values - I have
suggested that a language might support handling "no value" in a
convenient and safe manner.
Totally independent of and orthogonal to that, I strongly believe that
there is no point in trying to define behaviour for something that
cannot happen,
With justification. 0010 means 8 in C? Jesus.
I think the word "neighbour" is counter-intuitive to spell.
Once a thread here has wandered this far off-topic, it is perhaps not unreasonable to draw comparisons with your one-man language.
The real problem with your language is that you think it is perfect
    int F() {
        F(1, 2.3, "four", F,F,F,F(),F(F()));
        F(42);
It is undefined behaviour in C. Programmers are expected to write
sensible code.
If I were the designer of the C language and the maintainer of the C
standards, you might have a point. C is not /my/ language.
We can agree that C /lets/ people write messy code. It does not
/require/ it. And I have never found a programming language that stops
people writing messy code.
On 04/11/2024 16:35, David Brown wrote:
On 03/11/2024 21:00, Bart wrote:
To my mind, this is a type of "multi-way selection" :
    (const int []){ a, b, c }[n];
I can't see any good reason to exclude it as fitting the descriptive
phrase.
And if "a", "b" and "c" are not constant, but require evaluation of
some sort, it does not change things. Of course if these required
significant effort to evaluate,
Or you had a hundred of them.
or had side-effects, then you would most likely want a "multi-way
selection" construction that did the selection first, then the
evaluation - but that's a matter of programmer choice, and does not
change the terms.
You still don't get how different the concepts are.
On 04/11/2024 16:35, David Brown wrote:
On 03/11/2024 21:00, Bart wrote:
Here is a summary of C vs my language.
I am very keen on keeping the concepts distinct in cases where it
matters.
I know, you like to mix things up. I like clear lines:
  func F:int ...              Always returns a value
  proc P  ...                 Never returns a value
and only need to be "solidified" during code generation. That might
allow the language to track things like "uninitialised" or "no value"
during compilation without having them part of a real type (such as
std::optional<> or a C
But you are always returning an actual type in agreement with the
language. That is my point. You're not choosing to just fall off that
cliff and return garbage or just crash.
However, your example with std::optional did just that, despite having
that type available.
It doesn't return a value. That is why it is UB to try to use that
non-existent value.
And why it is so easy to avoid that UB.
Note that I have not suggested returning garbage values - I have
suggested that a language might support handling "no value" in a
convenient and safe manner.
But in C it is garbage.
Totally independent of and orthogonal to that, I strongly believe that
there is no point in trying to define behaviour for something that
cannot happen,
But it could for n==4.
EVERYBODY agrees that leading zero octals in C were a terrible idea. You
can't say it's just me who thinks that!
    int F() {
        F(1, 2.3, "four", F,F,F,F(),F(F()));
        F(42);
It is undefined behaviour in C. Programmers are expected to write
sensible code.
But it would be nice if the language stopped people writing such things, yes?
Can you tell me which other current languages, other than C++ and
assembly, allow such nonsense?
None? So it's not just me and my language then! Mine is lower level and still plenty unsafe, but it has somewhat higher standards.
If I were the designer of the C language and the maintainer of the C
standards, you might have a point. C is not /my/ language.
You do like to defend it though.
We can agree that C /lets/ people write messy code. It does not
/require/ it. And I have never found a programming language that
stops people writing messy code.
I had included half a dozen points that made C's 'if' error prone and confusing, that would not occur in my syntax because it is better designed.
You seem to be incapable of drawing a line between what a language can enforce, and what a programmer is free to express.
Or rather, because a programmer has so much freedom anyway, let's not
bother with any lines at all! Just have a language that simply doesn't
care.
On 04/11/2024 20:50, Bart wrote:
But it could for n==4.
Again, you /completely/ miss the point.
If you have a function (or construct) that returns a correct value for inputs 1, 2 and 3, and you never pass it the value 4 (or anything else), then there is no undefined behaviour no matter what the code looks like
for values other than 1, 2 and 3. If someone calls that function with
input 4, then /their/ code has the error - not the code that doesn't
handle an input 4.
On 04/11/2024 20:50, Bart wrote:
On 04/11/2024 16:35, David Brown wrote:
On 03/11/2024 21:00, Bart wrote:
To my mind, this is a type of "multi-way selection" :
ÿÿÿÿÿ(const int []){ a, b, c }[n];
I can't see any good reason to exclude it as fitting the descriptive
phrase.
And if "a", "b" and "c" are not constant, but require evaluation of
some sort, it does not change things.ÿ Of course if these required
significant effort to evaluate,
Or you had a hundred of them.
or had side-effects, then you would most likely want a "multi-way
selection" construction that did the selection first, then the
evaluation - but that's a matter of programmer choice, and does not
change the terms.
You still don't get how different the concepts are.
Yes, I do. I also understand how they are sometimes exactly the same
thing, depending on the language, and how they can often have the same
end result, depending on the details, and how they can often be
different, especially in the face of side-effects or efficiency concerns.
Look, it's really /very/ simple.
A) You can have a construct that says "choose one of these N things to execute and evaluate, and return that value (if any)".
B) You can have a construct that says "here are N things, select one of
them to return as a value".
Both of these can reasonably be called "multi-way selection" constructs.
Some languages can have one as a common construct, other languages may have the other, and many support both in some way. Pretty much any
language that allows the programmer to have control over execution order will let you do both in some way, even if there is not a clear language construct for it and you have to write it manually in code.
Mostly type A will be most efficient if there is a lot of effort
involved in putting together the things to select. Type B is likely to
be most efficient if you already have the collection of things to choose from (it can be as simple as an array lookup), if the creation of the collection can be done in parallel (such as in some SIMD uses), or if
the cpu can generate them all before it has established the selection
index.
Sometimes type A will be the simplest and clearest in the code,
sometimes type B will be the simplest and clearest in the code.
Both of these constructs are "multi-way selections".
Your mistake is in thinking that type A is all there is and all that matters, possibly because you feel you have a better implementation for
it than C has. (I think that you /do/ have a nicer switch than C, but
that does not justify limiting your thinking to it.)
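A hedged C illustration of the two shapes (invented example, not from the thread):

/* Type A: decide first, then evaluate only the chosen thing */
int select_a(int n, int x) {
    switch (n) {
    case 0:  return x + 1;
    case 1:  return x * 2;
    default: return 0;
    }
}

/* Type B: evaluate all N things up front, then select one */
int select_b(int n, int x) {
    int results[3] = { x + 1, x * 2, 0 };
    return results[(n == 0 || n == 1) ? n : 2];
}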
On 04/11/2024 22:25, David Brown wrote:
On 04/11/2024 20:50, Bart wrote:
But it could for n==4.
Again, you /completely/ miss the point.
If you have a function (or construct) that returns a correct value for
inputs 1, 2 and 3, and you never pass it the value 4 (or anything
else), then there is no undefined behaviour no matter what the code
looks like for values other than 1, 2 and 3. If someone calls that
function with input 4, then /their/ code has the error - not the code
that doesn't handle an input 4.
This is the wrong kind of thinking.
If this was a library function then, sure, you can stipulate a set of
input values, but that's at a different level, where you are writing
code on top of a working, well-specified language.
You don't make use of holes in the language, ones that can cause a crash. That is, by allowing a function to run into an internal RET op with no provision for a result. That's if there even is a RET; perhaps your compilers are so confident that that path is not taken, or you hint it
won't be, that they won't bother!
It will start executing whatever random bytes follow the function.
As I said in my last post, a missing return value caused an internal
error in one of my C implementations because a pushed return value was missing.
How should that be fixed, via a hack in the implementation which pushes
some random value to avoid an immediate crash? And then what?
Let the user - the author of the function - explicitly provide that
value then at least that can be documented: if N isn't in 1..3, then F returns so and so.
You know that makes perfect sense, but because you've got used to that dangerous feature in C you think it's acceptable.
Then we disagree on what 'multi-way' select might mean. I think it means branching, even if notionally, on one-of-N possible code paths.
The whole construct may or may not return a value. If it does, then one
of the N paths must be a default path.
Bart <bc@freeuk.com> wrote:
Then we disagree on what 'multi-way' select might mean. I think it means
branching, even if notionally, on one-of-N possible code paths.
OK.
The whole construct may or may not return a value. If it does, then one
of the N paths must be a default path.
You need to cover all input values. This is possible when there
is a reasonably small number of possibilities. For example, a switch on
a char variable which covers all possible values does not need a default
path. A default is needed only when the number of possibilities is too
large to explicitly give all of them. And some languages allow
ranges, so that you may be able to cover all values with a small
number of ranges.
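A small C sketch of the 'cover everything, no default' case (hedged; with gcc or clang, -Wswitch - included in -Wall - flags a switch on an enum that misses an enumerator and has no default):

enum colour { RED, GREEN, BLUE };

const char *colour_name(enum colour c) {
    switch (c) {           /* all enumerators handled, no default */
    case RED:   return "red";
    case GREEN: return "green";
    case BLUE:  return "blue";
    }
    return "?";            /* unreachable for valid enum values */
}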
On 02.11.2024 19:09, Tim Rentsch wrote:
[...] As long as
the code is logically correct you are free to choose either
style, and it's perfectly okay to choose the one that you find
more appealing.
This is certainly true for one-man-shows.
Hardly suited for most professional contexts I worked in.
On 04/11/2024 04:00, Tim Rentsch wrote:
fir <fir@grunge.pl> writes:
Tim Rentsch wrote:
With the understanding that I am offering [nothing] more than my
own opinion, I can say that I might use any of the patterns
mentioned, depending on circumstances. I don't think any one
approach is either always right or always wrong.
maybe, but some may have some strong arguments (for using this and
not that) that i may overlook
I acknowledge the point, but you haven't gotten any arguments,
only opinions.
Pretty much everything about PL design is somebody's opinion.
On 05/11/2024 12:42, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
Then we disagree on what 'multi-way' select might mean. I think it means
branching, even if notionally, on one-of-N possible code paths.
OK.
The whole construct may or may not return a value. If it does, then one
of the N paths must be a default path.
You need to cover all input values. This is possible when there
is a reasonably small number of possibilities. For example, a switch on
a char variable which covers all possible values does not need a default
path. A default is needed only when the number of possibilities is too
large to explicitly give all of them. And some languages allow
ranges, so that you may be able to cover all values with a small
number of ranges.
What's easier to implement in a language: to have a conditional need for
an 'else' branch, which is dependent on the compiler performing some arbitrarily complex levels of analysis on some arbitrarily complex set
of expressions...
...or to just always require 'else', with a dummy value if necessary?
On 05/11/2024 13:42, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
Then we disagree on what 'multi-way' select might mean. I think it means
branching, even if notionally, on one-of-N possible code paths.
OK.
I appreciate this is what Bart means by that phrase, but I don't agree
with it. I'm not sure if that is covered by "OK" or not!
The whole construct may or may not return a value. If it does, then one
of the N paths must be a default path.
You need to cover all input values. This is possible when there
is a reasonably small number of possibilities. For example, a switch on
a char variable which covers all possible values does not need a default
path. A default is needed only when the number of possibilities is too
large to explicitly give all of them. And some languages allow
ranges, so that you may be able to cover all values with a small
number of ranges.
I think this is all very dependent on what you mean by "all input values".
Supposing I declare this function:
// Return the integer square root of numbers between 0 and 10
int small_int_sqrt(int x);
To me, the range of "all input values" is integers from 0 to 10. I
could implement it as :
int small_int_sqrt(int x) {
    if (x == 0) return 0;
    if (x < 4) return 1;
    if (x < 9) return 2;
    if (x < 16) return 3;
    unreachable();
}
If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's /their/ fault and /their/ problem. I said nothing about what would
happen in those cases.
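For reference, unreachable() here is the C23 macro from <stddef.h>. On pre-C23 GCC or Clang a rough stand-in (a sketch, not part of the original post) is:

#define unreachable() __builtin_unreachable()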
But some people seem to feel that "all input values" means every
possible value of the input types, and thus that a function like this
should return a value even when there is no correct value in and no
correct value out.
This is, IMHO, just nonsense and misunderstands the contract between function writers and function users.
Further, I am confident that these people are quite happy to write code like :
// Take a pointer to an array of two ints, add them, and return the sum
int sum_two_ints(const int * p) {
    return p[0] + p[1];
}
Perhaps, in a mistaken belief that it makes the code "safe", they will add :
    if (!p) return 0;
at the start of the function. But they will not check that "p" actually points to an array of two ints (how could they?), nor will they check
for integer overflow (and what would they do if it happened?).
A function should accept all input values - once you have made clear
what the acceptable input values can be. A "default" case is just a short-cut for conveniently handling a wide range of valid input values -
it is never a tool for handling /invalid/ input values.
On 05/11/2024 12:42, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
Then we disagree on what 'multi-way' select might mean. I think it means
branching, even if notionally, on one-of-N possible code paths.
OK.
The whole construct may or may not return a value. If it does, then one
of the N paths must be a default path.
You need to cover all input values. This is possible when there
is a reasonably small number of possibilities. For example, a switch on
a char variable which covers all possible values does not need a default
path. A default is needed only when the number of possibilities is too
large to explicitly give all of them. And some languages allow
ranges, so that you may be able to cover all values with a small
number of ranges.
What's easier to implement in a language: to have a conditional need for
an 'else' branch, which is dependent on the compiler performing some arbitrarily complex levels of analysis on some arbitrarily complex set
of expressions...
...or to just always require 'else', with a dummy value if necessary?
Even if you went with the first, what happens if the compiler can't guarantee that all values of a selector are covered; should it report
that, or say nothing?
What happens if you do need 'else', but later change things so all bases
are covered; will the compiler report it as being unnecesary, so that
you remove it?
Now, C doesn't have such a feature to test out (ie. that is a construct
with an optional 'else' branch, the whole of which returns a value). The nearest is function return values:
int F(int n) {
    if (n==1) return 10;
    if (n==2) return 20;
}
Here, neither tcc nor gcc reports that you might run into the end of the function. It will return garbage if called with anything other than 1 or 2.
gcc will say something with enough warning levels (reaches end of
non-void function). But it will say the same here:
int F(unsigned char c) {
    if (c<128) return 10;
    if (c>=128) return 20;
}
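For example, with a recent gcc something along these lines appears (hedged - exact wording and locations vary by version, and the file name here is invented):

$ gcc -Wall -c f.c
f.c: In function 'F':
f.c:4:1: warning: control reaches end of non-void function [-Wreturn-type]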
David Brown <david.brown@hesbynett.no> wrote:
On 05/11/2024 13:42, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
Then we disagree on what 'multi-way' select might mean. I think it means
branching, even if notionally, on one-of-N possible code paths.
OK.
I appreciate this is what Bart means by that phrase, but I don't agree
with it. I'm not sure if that is covered by "OK" or not!
You may prefer your own definition, but Bart's is resonable one.
The whole construct may or may not return a value. If it does, then one
of the N paths must be a default path.
You need to cover all input values. This is possible when there
is a reasonably small number of possibilities. For example, a switch on
a char variable which covers all possible values does not need a default
path. A default is needed only when the number of possibilities is too
large to explicitly give all of them. And some languages allow
ranges, so that you may be able to cover all values with a small
number of ranges.
I think this is all very dependent on what you mean by "all input values".
Supposing I declare this function:
// Return the integer square root of numbers between 0 and 10
int small_int_sqrt(int x);
To me, the range of "all input values" is integers from 0 to 10. I
could implement it as :
int small_int_sqrt(int x) {
    if (x == 0) return 0;
    if (x < 4) return 1;
    if (x < 9) return 2;
    if (x < 16) return 3;
    unreachable();
}
If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's
/their/ fault and /their/ problem. I said nothing about what would
happen in those cases.
But some people seem to feel that "all input values" means every
possible value of the input types, and thus that a function like this
should return a value even when there is no correct value in and no
correct value out.
Well, some languages treat types more seriously than C. In Pascal the
type of your input would be 0..10 and all input values would be
handled. Sure, when the domain is too complicated to express in a type
then it could be a documented restriction. Still, it makes sense to
signal an error if a value goes outside the handled range, so in a sense
all values of the input type are handled: either you get a valid answer
or a clear error.
This is, IMHO, just nonsense and misunderstands the contract between
function writers and function users.
Further, I am confident that these people are quite happy to write code
like :
// Take a pointer to an array of two ints, add them, and return the sum
int sum_two_ints(const int * p) {
    return p[0] + p[1];
}
I do not think that people wanting strong type checking are happy
with C. Simply, either they use a different language or use C
without bitching, but are aware of its limitations. I certainly would
be quite unhappy with the code above. It is possible that I would still
use it as a compromise (say, if it was desirable to have a single
prototype but handle points in spaces of various dimensions),
but my first attempt would be something like:
typedef struct {int p[2];} two_int;
....
Perhaps, in a mistaken belief that it makes the code "safe", they will add :
if (!p) return 0;
at the start of the function. But they will not check that "p" actually
points to an array of two ints (how could they?), nor will they check
for integer overflow (and what would they do if it happened?).
I am certainly unhappy with overflow handling in current hardware
and by extension with overflow handling in C.
A function should accept all input values - once you have made clear
what the acceptable input values can be. A "default" case is just a
short-cut for conveniently handling a wide range of valid input values -
it is never a tool for handling /invalid/ input values.
Well, a default can signal an error, which frequently is the right
handling of invalid input values.
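A short C sketch of a default that signals the error (invented names; the recovery strategy is of course the caller's choice):

#include <stdio.h>
#include <stdlib.h>

int lookup(int n) {
    switch (n) {
    case 1: return 10;
    case 2: return 20;
    case 3: return 30;
    default:
        fprintf(stderr, "lookup: invalid input %d\n", n);
        abort();  /* fail loudly rather than return garbage */
    }
}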
On 05/11/2024 20:39, Waldek Hebisch wrote:
David Brown <david.brown@hesbynett.no> wrote:
On 05/11/2024 13:42, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
Then we disagree on what 'multi-way' select might mean. I think it
means
branching, even if notionally, on one-of-N possible code paths.
OK.
I appreciate this is what Bart means by that phrase, but I don't agree
with it. I'm not sure if that is covered by "OK" or not!
You may prefer your own definition, but Bart's is resonable one.
The only argument I can make here is that I have not seen "multi-way
select" as a defined phrase with a particular established meaning.
Bart <bc@freeuk.com> wrote:
On 05/11/2024 12:42, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
Then we disagree on what 'multi-way' select might mean. I think it means
branching, even if notionally, on one-of-N possible code paths.
OK.
The whole construct may or may not return a value. If it does, then one
of the N paths must be a default path.
You need to cover all input values. This is possible when there
is a reasonably small number of possibilities. For example, a switch on
a char variable which covers all possible values does not need a default
path. A default is needed only when the number of possibilities is too
large to explicitly give all of them. And some languages allow
ranges, so that you may be able to cover all values with a small
number of ranges.
What's easier to implement in a language: to have a conditional need for
an 'else' branch, which is dependent on the compiler performing some
arbitrarily complex levels of analysis on some arbitrarily complex set
of expressions...
...or to just always require 'else', with a dummy value if necessary?
Well, frequently it is easier to do a bad job than a good one.
Normally you do not need very complex analysis:
On 05/11/2024 20:33, David Brown wrote:
On 05/11/2024 20:39, Waldek Hebisch wrote:
David Brown <david.brown@hesbynett.no> wrote:
On 05/11/2024 13:42, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
Then we disagree on what 'multi-way' select might mean. I think it means
branching, even if notionally, on one-of-N possible code paths.
OK.
I appreciate this is what Bart means by that phrase, but I don't agree
with it. I'm not sure if that is covered by "OK" or not!
You may prefer your own definition, but Bart's is resonable one.
The only argument I can make here is that I have not seen "multi-way
select" as a defined phrase with a particular established meaning.
Well, it started off as 2-way select, meaning constructs like this:
x = c ? a : b;
x := (c | a | b)
Where one of two branches is evaluated. I extended the latter to N-way select:
x := (n | a, b, c, ... | z)
(defmacro nsel (expr . clauses)
  ^(caseql ,expr ,*[mapcar list 1 clauses]))

(nsel 1 (prinl "one") (prinl "two") (prinl "three"))
"one"
(nsel (+ 1 1) (prinl "one") (prinl "two") (prinl "three"))
"two"
(nsel (+ 1 3) (prinl "one") (prinl "two") (prinl "three"))
nil
(nsel (+ 1 2) (prinl "one") (prinl "two") (prinl "three"))
"three"
(macroexpand-1 '(nsel x a b c d))
(caseql x (1 a)
On 2024-11-05, Bart <bc@freeuk.com> wrote:
Well, it started off as 2-way select, meaning constructs like this:
x = c ? a : b;
x := (c | a | b)
Where one of two branches is evaluated. I extended the latter to N-way
select:
x := (n | a, b, c, ... | z)
This looks quite error-prone. You have to count carefully that
the cases match the intended values. If an entry is
inserted, all the remaining ones shift to a higher value.
You've basically taken a case construct and auto-generated
the labels starting from 1.
On 04/11/2024 20:50, Bart wrote:
On 04/11/2024 16:35, David Brown wrote:
On 03/11/2024 21:00, Bart wrote:
Here is a summary of C vs my language.
<snip the irrelevant stuff>
I am very keen on keeping the concepts distinct in cases where it
matters.
I know, you like to mix things up. I like clear lines:
   func F:int ...             Always returns a value
   proc P  ...                Never returns a value
Oh, you /know/ that, do you? And how do you "know" that? Is that
because you still think I am personally responsible for the C language,
and that I think C is the be-all and end-all of perfect languages?
I agree that it can make sense to divide different types of "function".
I disagree that whether or not a value is returned has any significant relevance. I see no difference, other than minor syntactic issues,
between "int foo(...)" and "void foo(int * result, ...)".
If you have a function (or construct) that returns a correct value for inputs 1, 2 and 3, and you never pass it the value 4 (or anything else), then there is no undefined behaviour no matter what the code looks like
for values other than 1, 2 and 3. If someone calls that function with
input 4, then /their/ code has the error - not the code that doesn't
handle an input 4.
I agree that this is a terrible idea. <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60523>
But picking one terrible idea in C does not mean /everything/ in C is a terrible idea! /That/ is what you got wrong, as you do so often.
Can you tell me which other current languages, other than C++ and
assembly, allow such nonsense?
Python.
Of course, it is equally meaningless in Python as it is in C.
I defend it if that is appropriate. Mostly, I /explain/ it to you. It
is bizarre that people need to do that for someone who claims to have written a C compiler, but there it is.
I'm glad you didn't - it would be a waste of effort.
You /do/ understand that I use top-quality tools with carefully chosen warnings, set to throw fatal errors, precisely because I want a language that has a lot more "lines" and restrictions than your little tools?
/Every/ C programmer uses a restricted subset of C - some more
restricted than others. I choose to use a very strict subset of C for
my work, because it is the best language for the tasks I need to do. (I
also use a very strict subset of C++ when it is a better choice.)
On 05/11/2024 13:29, David Brown wrote:
On 05/11/2024 13:42, Waldek Hebisch wrote:
Supposing I declare this function:
// Return the integer square root of numbers between 0 and 10
int small_int_sqrt(int x);
To me, the range of "all input values" is integers from 0 to 10. I
could implement it as :
int small_int_sqrt(int x) {
    if (x == 0) return 0;
    if (x < 4) return 1;
    if (x < 9) return 2;
    if (x < 16) return 3;
    unreachable();
}
If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's
/their/ fault and /their/ problem. I said nothing about what would
happen in those cases.
But some people seem to feel that "all input values" means every
possible value of the input types, and thus that a function like this
should return a value even when there is no correct value in and no
correct value out.
Your example is an improvement on your previous ones. At least it
attempts to deal with out-of-range conditions!
However there is still the question of providing that return type. If 'unreachable' is not a special language feature, then this can fail
either because the language requires the 'return' keyword, or because 'unreachable' doesn't yield a compatible type (even if it never returns, because it's
an error handler).
Getting that right will satisfy both the language (if it cared more
about such matters than C apparently does), and the casual reader
curious about how the function contract is met (that is, supplying that promised return int type if or when it returns).
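One way to satisfy both at once - a sketch using a plain assert, which is an interpretation rather than anything proposed in the posts - is to keep a loud diagnostic for the impossible case while still supplying the promised int:

   #include <assert.h>

   int small_int_sqrt(int x) {
       if (x == 0) return 0;
       if (x < 4)  return 1;
       if (x < 9)  return 2;
       if (x < 16) return 3;
       assert(!"small_int_sqrt: input out of range");
       return -1;   /* dummy value; control should never get here */
   }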
// Take a pointer to an array of two ints, add them, and return the sum
int sum_two_ints(const int * p) {
    return p[0] + p[1];
}
Perhaps, in a mistaken belief that it makes the code "safe", they will
add :
    if (!p) return 0;
at the start of the function. But they will not check that "p"
actually points to an array of two ints (how could they?), nor will
they check for integer overflow (and what would they do if it happened?).
This is a different category of error.
Here's a related example of what I'd class as a language error:
   int a;
   a = (exit(0), &a);
A type mismatch error is usually reported. However, the assignment is
never done because it never returns from that exit() call.
I expect you wouldn't think much of a compiler that didn't report such
an error because that code is never executed.
But to me that is little different from running into the end of a function without the proper provisions for a valid return value.
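A self-contained version of that example (a sketch, padded out just enough to compile) shows the point: the compiler diagnoses the type mismatch even though the assignment is provably never executed:

   #include <stdlib.h>

   int main(void) {
       int a;
       a = (exit(0), &a);   /* diagnosed: int* assigned to int, even
                               though exit(0) means this assignment
                               can never actually happen */
       return a;
   }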
On 04/11/2024 22:25, David Brown wrote:
On 04/11/2024 20:50, Bart wrote:
On 04/11/2024 16:35, David Brown wrote:
On 03/11/2024 21:00, Bart wrote:
Here is a summary of C vs my language.
<snip the irrelevant stuff>
I am very keen on keeping the concepts distinct in cases where it
matters.
I know, you like to mix things up. I like clear lines:
   func F:int ...              Always returns a value
   proc P  ...                 Never returns a value
Oh, you /know/ that, do you? And how do you "know" that? Is that
because you still think I am personally responsible for the C
language, and that I think C is the be-all and end-all of perfect
languages?
I agree that it can make sense to divide different types of
"function". I disagree that whether or not a value is returned has any
significant relevance. I see no difference, other than minor
syntactic issues, between "int foo(...)" and "void foo(int * result,
...)".
I don't use functional concepts; my functions may or may not be pure.
But the difference between value-returning and non-value returning
functions to me is significant:
                  Func  Proc
return x;         Y     N
return;           N     Y
hit final }       N     Y
Pure              ?     Unlikely
Side-effects      ?     Likely
Call within expr  Y     N
Call standalone   ?     Y
Having a clear distinction helps me focus more precisely on how a
routine has to work.
In C, the syntax is dreadful: not only can you barely distinguish a
function from a procedure (even before attributes, user types and
macros are added in), but you can hardly tell them apart from variable declarations.
In fact, function declarations can even be declared in the middle of a
set of variable declarations.
You can learn a lot about the underlying structure of a language by implementing it. When generating IL from C, for example, I found the
need to have separate instructions to call functions and procedures, and separate return instructions too.
If you have a function (or construct) that returns a correct value for
inputs 1, 2 and 3, and you never pass it the value 4 (or anything
else), then there is no undefined behaviour no matter what the code
looks like for values other than 1, 2 and 3. If someone calls that
function with input 4, then /their/ code has the error - not the code
that doesn't handle an input 4.
No. The function they are calling is badly formed. There should never be
any circumstance where a value-returning function terminates (hopefully
by running into RET) without an explicitly set return value.
I agree that this a terrible idea.
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60523>
But picking one terrible idea in C does not mean /everything/ in C is
a terrible idea! /That/ is what you got wrong, as you do so often.
What the language does is generally fine. /How/ it does it is generally terrible. (Type syntax; no 'fun' keyword; = vs ==; operator precedence; format codes; 'break' in switch; export by default; struct T vs typedef
T; dangling 'else'; optional braces; ... there's reams of this stuff!)
So actually, I'm not wrong. There have been discussions about all of
these and a lot more.
Can you tell me which other current languages, other than C++ and
assembly, allow such nonsense?
Python.
Of course, it is equally meaningless in Python as it is in C.
Python at least can trap the errors. Once you fix the unlimited
recursion, it will detect the wrong number of arguments. In C, before
C23 anyway, any number and type of arguments is legal in that example.
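A minimal illustration of that pre-C23 rule (function names invented; compile in C17 mode or earlier):

   int f();            /* empty parens: parameters left unspecified */

   int g(void) {
       f();            /* accepted */
       f(1);           /* accepted */
       f(1, 2.0, "x"); /* also accepted - a mismatch with f's actual
                          definition is undefined behaviour, not a
                          compile-time error */
       return 0;
   }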
I defend it if that is appropriate. Mostly, I /explain/ it to you.
It is bizarre that people need to do that for someone who claims to
have written a C compiler, but there it is.
It is bizarre that the ins and outs of C, a supposedly simple language,
are so hard to understand.
I'm glad you didn't - it would be a waste of effort.
I guessed that. You seemingly don't care that C is a messy language with many quirks; you just work around it by using a subset, with some help
from your compiler in enforcing that subset.
So you're using a strict dialect. The trouble is that everyone else
using C will either be using their own dialect incompatible with yours,
or are stuck using the messy language and laid-back compilers operating
in lax mode by default.
I'm interested in fixing things at source - within a language.
You /do/ understand that I use top-quality tools with carefully chosen
warnings, set to throw fatal errors, precisely because I want a
language that has a lot more "lines" and restrictions than your little
tools? /Every/ C programmer uses a restricted subset of C - some more
restricted than others. I choose to use a very strict subset of C for
my work, because it is the best language for the tasks I need to do.
(I also use a very strict subset of C++ when it is a better choice.)
I'd guess only 1% of your work with C involves the actual language, and
99% using additional tooling.
With me it's mostly about the language.
On 06/11/2024 15:40, Bart wrote:
There are irrelevant differences in syntax, which could easily disappear entirely if a language supported a default initialisation value when a return gives no explicit value. (i.e., "T foo() { return; }; T x =
foo();" could be treated in the same way as "T x;" in a static initialisation context.)
Then you list some things that may or may not happen, which are of
course totally irrelevant. If you list the differences between bikes
and cars, you don't include "some cars are red" and "bikes are unlikely
to be blue".
It's a pointless distinction. Any function or procedure can be morphed
into the other form without any difference in the semantic meaning of
the code, requiring just a bit of re-arrangement at the caller site:
    int foo(int x) { int y = ...; return y; }
    void foo(int * res, int x) { int y = ...; *res = y; }
    void foo(int x) { ... ; return; }
    int foo(int x) { ... ; return 0; }
There is no relevance in the division here, which is why most languages don't make a distinction unless they do so simply for syntactic reasons.
In C, the syntax is dreadful: not only can you barely distinguish a
function from a procedure (even before attributes, user types and
macros are added in), but you can hardly tell them apart from variable
declarations.
As always, you are trying to make your limited ideas of programming languages appear to be correct, universal, obvious or "natural" by
saying things that you think are flaws in C. That's not how a
discussion works, and it is not a way to convince anyone of anything.
The fact that C does not have a keyword used in the declaration or definition of a function does not in any way mean that there is the slightest point in your artificial split between "func" and "proc" functions.
(It doesn't matter that I too prefer a clear keyword for defining
functions in a language.)
That is solely from your choice of an IL.
Of course you are wrong!
If there was an alternative language that I thought would be better for
the tasks I have, I'd use that. (Actually, a subset of C++ is often
better, so I use that when I can.)
What do you think I should do instead? Whine in newsgroups to people
that don't write language standards (for C or anything else) and don't
make compilers?
Make my own personal language that is useless to
everyone else and holds my customers to ransom by being the only person
that can work with their code?
On 05/11/2024 23:48, Bart wrote:
On 05/11/2024 13:29, David Brown wrote:
int small_int_sqrt(int x) {
    if (x == 0) return 0;
    if (x < 4) return 1;
    if (x < 9) return 2;
    if (x < 16) return 3;
    unreachable();
}
"unreachable()" is a C23 standardisation of a feature found in most
high-end compilers. For gcc and clang, there is
__builtin_unreachable(), and MSVC has its version.
Getting that right will satisfy both the language (if it cared more
about such matters than C apparently does), and the casual reader
curious about how the function contract is met (that is, supplying
that promised return int type if or when it returns).
C gets it right here. There is no need for a return type when there is
no return; indeed, trying to force some sort of type or "default" value
would be counterproductive. It would be confusing to the reader and add untestable and unexecutable source code.
Let's now look at another alternative - have the function check for validity, and return some kind of error signal if the input is invalid. There are two ways to do this - we can have a value of the main return
type acting as an error signal, or we can have an additional return value.
All in all, we have significant costs in various aspects, with no real benefit, all in the name of a mistaken belief that we are avoiding
undefined behaviour.
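To make those two styles concrete, here is a sketch (the names and bodies are invented for illustration, not taken from the posts):

   #include <stdbool.h>

   /* Style 1: a value of the main return type doubles as the error
      signal. */
   int small_int_sqrt_inband(int x) {
       if (x < 0 || x > 10) return -1;   /* -1 is never a valid root */
       int r = 0;
       while ((r + 1) * (r + 1) <= x) r++;
       return r;
   }

   /* Style 2: a separate status value, with the result passed back
      through a pointer. */
   bool small_int_sqrt_status(int x, int *result) {
       if (x < 0 || x > 10) return false;
       int r = 0;
       while ((r + 1) * (r + 1) <= x) r++;
       *result = r;
       return true;
   }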
On 06/11/2024 14:50, David Brown wrote:
C gets it right here. There is no need for a return type when there
is no return
There is no return for only half the function! A function with a return
type is a function that CAN return. If it can't ever return, then make
it a procedure.
Take this function where N can never be zero; is this the right way to
write it in C:
   int F(int N) {
       if (N==0) unreachable();
       return abc/N;             // abc is a global with value 100
   }
It doesn't look right. If I compile it with gcc (using __builtin_unreachable), and call F(0), then it crashes. So it doesn't do much, does it?!
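If the aim is to catch the bad call during development rather than declare it impossible, an assert is the usual alternative - a sketch:

   #include <assert.h>

   extern int abc;       /* the global from the example */

   int F(int N) {
       assert(N != 0);   /* diagnoses the bad call in debug builds;
                            with NDEBUG defined it compiles away,
                            much like the unreachable() version */
       return abc/N;
   }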
On 06/11/2024 15:47, David Brown wrote:
On 06/11/2024 15:40, Bart wrote:
There are irrelevant differences in syntax, which could easily
disappear entirely if a language supported a default initialisation
value when a return gives no explicit value. (i.e., "T foo() {
return; }; T x = foo();" could be treated in the same way as "T x;" in
a static initialisation context.)
You wrote:
  T foo () {return;}        # definition?
  T x = foo();              # call?
I'm not quite sure what you're saying here. That a missing return value
in a non-void function would default to all-zeros?
Maybe. A rather pointless feature just to avoid writing '0', and which
now introduces a new opportunity for a silent error (accidentally
forgetting a return value).
It's not quite the same as a static initialisation, which is zeroed
when a program starts.
Then you list some things that may or may not happen, which are of
course totally irrelevant. If you list the differences between bikes
and cars, you don't include "some cars are red" and "bikes are
unlikely to be blue".
Yes; if you're using a vehicle, or planning a journey or any related
thing, it helps to remember if it's a bike or a car! At least here you acknowledge the difference.
But I guess you find those likely/unlikely macros of gcc pointless too.
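For reference, those macros are conventionally defined on top of a gcc builtin, as in the Linux kernel:

   #define likely(x)   __builtin_expect(!!(x), 1)
   #define unlikely(x) __builtin_expect(!!(x), 0)

They only steer the optimiser's block layout; they don't change the meaning of the condition.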
If I know something is a procedure, then I also know it is likely to
change global state, that I might need to deal with a return value, and
a bunch of other stuff.
Boldly separating the two with either FUNC or PROC denotations I find
helps tremendously. YM-obviously-V, but you can't have a go at me for my view.
If I really found it a waste of time, the distinction would have been dropped decades ago.
It's a pointless distinction. Any function or procedure can be
morphed into the other form without any difference in the semantic
meaning of the code, requiring just a bit of re-arrangement at the
caller site:
     int foo(int x) { int y = ...; return y; }
     void foo(int * res, int x) { int y = ...; *res = y; }
     void foo(int x) { ... ; return; }
     int foo(int x) { ... ; return 0; }
There is no relevance in the division here, which is why most
languages don't make a distinction unless they do so simply for
syntactic reasons.
As I said, you like to mix things up. You disagreed. I'm not surprised.
Here you've demonstrated how a function that returns results by value
can be turned into a procedure that returns a result by reference.
So now, by-value and by-reference are the same thing?
I listed seven practical points of difference between functions and procedures, and above is an eighth point, but you just dismiss them.
Is there any point in this?
I do like taking what some think as a single feature and having
dedicated versions, because I find it helpful.
That includes functions, loops, control flow and selections.
In C, the syntax is dreadful: not only can you barely distinguish a
function from a procedure (even before attributes, user types and
macros are added in), but you can hardly tell them apart from variable
declarations.
As always, you are trying to make your limited ideas of programming
languages appear to be correct, universal, obvious or "natural" by
saying things that you think are flaws in C. That's not how a
discussion works, and it is not a way to convince anyone of anything.
The fact that C does not have a keyword used in the declaration or
definition of a function does not in any way mean that there is the
slightest point in your artificial split between "func" and "proc"
functions.
  void F();
  void (*G);
  void *H();
  void (*I)();
OK, 4 things declared here. Are they procedures, functions, variables,
or pointers to functions? (I avoided using a typedef in place of 'void'
to make things easier.)
I /think/ they are as follows: procedure, pointer variable, function (returning void*), and pointer to a procedure. But I had to work at it,
even though the examples are very simple.
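Spelled out with comments, that reading is correct:

   void F();        /* function declaration: F returns nothing          */
   void (*G);       /* variable: G is a pointer to void (the parens
                       are redundant)                                   */
   void *H();       /* function declaration: H returns a void* pointer  */
   void (*I)();     /* variable: I is a pointer to a function that
                       returns nothing                                  */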
I don't know about you, but I prefer syntax like this:
   proc F
   ref void G
   func H -> ref void
   ref proc I
Now come on, scream at me again for preferring a nice syntax for
programming, one which just tells me at a glance what it means without having to work it out.
(It doesn't matter that I too prefer a clear keyword for defining
functions in a language.)
Why? Don't your smart tools tell you all that anyway?
That is solely from your choice of an IL.
The IL design also falls into place from the natural way these things
have to work.
Of course you are wrong!
You keep saying that. But then you also keep saying, from time to time,
that you agree that something in C was a bad idea. So I'm still wrong
when calling out the same thing?
If there was an alternative language that I thought would be better
for the tasks I have, I'd use that. (Actually, a subset of C++ is
often better, so I use that when I can.)
What do you think I should do instead? Whine in newsgroups to people
that don't write language standards (for C or anything else) and don't
make compilers?
What makes you think I'm whining? The thread opened up a discussion
about multi-way selections, and it got into how it could be done with features from other languages.
I gave some examples from mine, as I'm very familiar with that, and it
uses simple features that are easy to grasp and appreciate. You could
have done the same from ones you know.
But you just hate the idea that I have my own language to draw on, whose syntax is very sweet ('serious' languages hate such syntax for some
reason, and it is usually relegated to scripting languages).
I guess then you just have to belittle and insult me, my languages and
my views at every opportunity.
Make my own personal language that is useless to everyone else and
holds my customers to ransom by being the only person that can work
with their code?
Plenty of companies use DSLs. But isn't that sort of what you do? That
is, using 'C' with a particular interpretation or enforcement of the
rules, which needs to go hand in hand with a particular compiler, version,
sets of options and assorted makefiles.
I for one would never be able to build one of your programs. It might as well be written in your in-house language with proprietary tools.
On 06/11/2024 20:38, Bart wrote:
   void F();
   void (*G);
   void *H();
   void (*I)();
OK, 4 things declared here. Are they procedures, functions, variables,
or pointers to functions? (I avoided using a typedef in place of
'void' to make things easier.)
I /think/ they are as follows: procedure, pointer variable, function
(returning void*), and pointer to a procedure. But I had to work at
it, even though the examples are very simple.
I don't know about you, but I prefer syntax like this:
    proc F
    ref void G
    func H -> ref void
    ref proc I
It is not the use of a keyword for functions that I disagree with, nor
am I arguing for C's syntax or against your use of "ref" or ordering. I simply don't think there is much to be gained by using "proc F" instead
of "func F -> void" (assuming that's the right syntax) - or just "func F".
But I think there is quite a bit to be gained if the func/proc
distinction told us something useful and new, rather than just the
existence or lack of a return type.
On 06/11/2024 14:50, David Brown wrote:
On 05/11/2024 23:48, Bart wrote:
On 05/11/2024 13:29, David Brown wrote:
int small_int_sqrt(int x) {
    if (x == 0) return 0;
    if (x < 4) return 1;
    if (x < 9) return 2;
    if (x < 16) return 3;
    unreachable();
}
"unreachable()" is a C23 standardisation of a feature found in most
high-end compilers. For gcc and clang, there is
__builtin_unreachable(), and MSVC has its version.
So it's a kludge.
Cool, I can create one of those too:
 func smallsqrt(int x)int =
     if
     elsif x=0 then  0
     elsif x<4 then  1
     elsif x<9 then  2
     elsif x<16 then 3
     dummyelse       int.min
     fi
 end
'dummyelse' is a special version of 'else' that tells the compiler that control will (should) never arrive there. ATM it does nothing but inform
the reader of that and remind the author. But later stages of the compiler can choose not to generate code for it, or to generate error-reporting code.
(BTW your example lets through negative values; I haven't fixed that.)
This is all a large and complex subject. But it's not really the point
of the discussion.
On 07/11/2024 13:23, Bart wrote:
On 06/11/2024 14:50, David Brown wrote:
On 05/11/2024 23:48, Bart wrote:
On 05/11/2024 13:29, David Brown wrote:
int small_int_sqrt(int x) {
    if (x == 0) return 0;
    if (x < 4) return 1;
    if (x < 9) return 2;
    if (x < 16) return 3;
    unreachable();
}
"unreachable()" is a C23 standardisation of a feature found in most
high-end compilers. For gcc and clang, there is
__builtin_unreachable(), and MSVC has its version.
So it's a kludge.
You mean it is something you don't understand? Think of this as an opportunity to learn something new.
'dummyelse' is a special version of 'else' that tells the compiler
that control will (should) never arrive there. ATM it does nothing but
inform the reader of that and to remind the author. But later stages
of the compiler can choose not to generate code for it, or to generate
error-reporting code.
You are missing the point - that is shown clearly by the "int.min".
You have your way of doing things, and have no interest in learning
anything else or even bothering to listen or think.
Your bizarre hatred
of C is overpowering for you.
On 02/11/2024 21:44, Bart wrote:
(Note that the '|' in my example is not 'or'; it means 'then':
( c | a ) # these are exactly equivalent
if c then a fi
( c | a | b ) # so are these
if c then a else b fi
There is no restriction on what a and b are, statements or
expressions, unless the whole returns some value.)
Ah, so your language has a disastrous choice of syntax here so that
sometimes "a | b" means "or", and sometimes it means "then" or
"implies", and sometimes it means "else".
Why have a second syntax with
a confusing choice of operators when you have a perfectly good "if /
then / else" syntax?
Or if you feel an operator adds a lot to the
language here, why not choose one that would make sense to people, such
as "=>" - the common mathematical symbol for "implies".
Well, it started off as 2-way select, meaning constructs like this:
x = c ? a : b;
x := (c | a | b)
Where one of two branches is evaluated. I extended the latter to N-way select:
x := (n | a, b, c, ... | z)
Where again one of these elements is evaluated, selected by n (here
having the values of 1, 2, 3, ... compared with true, false above, but
there need to be at least 2 elements inside |...| to distinguish them).
I applied it also to other statements that can provide values, such
as if-elsif chains and switch, but there the selection might be
different (eg. a series of tests are done sequentially until a true one is found).
I don't know how it got turned into 'multi-way'.
[...]
x := (n | a, b, c, ... | z)
It's a version of Algol68's case construct:
x := CASE n IN a, b, c OUT z ESAC
which also has the same compact form I use. I only use the compact
version because n is usually small, and it is intended to be used within
an expression: print (n | "One", "Two", "Three" | "Other").
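A rough C analogue of that compact form, sketched with a C99 compound literal as the lookup table (nothing like this appears in the posts):

   #include <stdio.h>

   const char *pick(int n) {
       /* n in 1..3 selects an element; anything else gives the
          default, like the "| Other" part of the compact form */
       return (n >= 1 && n <= 3)
           ? (const char *[]){"One", "Two", "Three"}[n - 1]
           : "Other";
   }

   int main(void) {
       printf("%s\n", pick(2));   /* prints "Two" */
       printf("%s\n", pick(7));   /* prints "Other" */
       return 0;
   }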
On 02/11/2024 21:44, Bart wrote:
(Note that the '|' in my example is not 'or'; it means 'then':
    (  c |    a )          # these are exactly equivalent
    if c then a fi
    (  c |    a |    b )   # so are these [fixed]
    if c then a else b fi
There is no restriction on what a and b are, statements or
expressions, unless the whole returns some value.)
Ah, so your language has a disastrous choice of syntax here so that sometimes "a | b" means "or", and sometimes it means "then" or
"implies", and sometimes it means "else".
Why have a second syntax with
a confusing choice of operators when you have a perfectly good "if /
then / else" syntax?
Or if you feel an operator adds a lot to the
language here, why not choose one that would make sense to people, such
as "=>" - the common mathematical symbol for "implies".
On 03.11.2024 18:00, David Brown wrote:
On 02/11/2024 21:44, Bart wrote:
(Note that the '|' in my example is not 'or'; it means 'then':
( c | a ) # these are exactly equivalent
if c then a fi
( c | a | b ) # so are these
if c then a else b fi
There is no restriction on what a and b are, statements or
expressions, unless the whole returns some value.)
Ah, so your language has a disastrous choice of syntax here so that
sometimes "a | b" means "or", and sometimes it means "then" or
"implies", and sometimes it means "else".
(I can't comment on the "other use" of the same syntax in the
"poster's language" since it's not quoted here.)
But it's not uncommon in programming languages that operators
are context specific, and may mean different things depending
on context.
You are saying "disastrous choice of syntax". - Wow! Hard stuff.
I suggest to cool down before continuing reading further. :-)
Incidentally, the above syntax is what Algol 68 supports;
Or if you feel an operator adds a lot to the
language here, why not choose one that would make sense to people, such
as "=>" - the common mathematical symbol for "implies".
This is, as an opinion, of course arguable. It's certainly also
influenced by where one is coming from (i.e. personal expertise
from other languages).
The detail of what symbols are used is
not that important to me, if it fits to the overall language
design.
From the high-level languages I used in my life I was almost
always "missing" something with conditional expressions. I
don't want separate and restricted syntaxes (plural!) in "C"
(for statements and expressions respectively), for example.
Some are lacking conditional expressions completely. Others
support the syntax with a 'fi' end-terminator and simplify
structures (and add to maintainability) supporting 'else-if'.
And few allow 'if' expressions on the left-hand side of an
assignment. (Algol 68 happens to support everything I need.
Unfortunately it's a language I never used professionally.)
I'm positive that folks who use languages that support those
syntactic forms wouldn't like to miss them. (Me for sure.)
("disastrous syntax" - I'm still laughing... :-)
On 03.11.2024 18:00, David Brown wrote:
or using the respective alternative forms with ( a | b | c) ,
or ( a | b ) where no 'ELSE' is required. (And there's also
the 'ELIF' and the '|:' as alternative form available.)
BTW, the same symbols can also be used as an alternative form
of the 'case' statement; the semantic distinction is made by
context, e.g. the types involved in the construct.
Bart, out of interest; have you invented that syntax for your
language yourself or borrowed it from another language (like
Algol 68)?
On 08/11/2024 17:37, Janis Papanagnou wrote:
BTW, the same symbols can also be used as an alternative form
of the 'case' statement; the semantic distinction is made by
context, e.g. the types involved in the construct.
You mean whether the 'a' in '(a | b... | c)' has type Bool rather than Int?
I've always discriminated on the number of terms between the two |s:
either 1, or more than 1.
Bart, out of interest; have you invented that syntax for your
language yourself or borrowed it from another language (like
Algol 68)?
It was heavily inspired by the syntax (not the semantics) of Algol68,
even though I'd never used it at that point.
I like that it solved the annoying begin-end aspect of Algol60/Pascal
syntax where you have to write the clunky:
[snip examples]
I enhanced it by not needing stropping (and so not allowing embedded
spaces within names); allowing redundant semicolons while at the same
time turning newlines into semicolons when a line obviously didn't
continue; plus allowing ordinary 'end' or 'end if' to be used as well as 'fi'.
My version then can look like this, a bit less forbidding than Algol68:
if cond then
s1
s2
else
s3
s4
end
On 08/11/2024 18:37, Janis Papanagnou wrote:
The | operator means "or" in the OP's language (AFAIK - only he actually knows the language). So "(a | b | c)" in that language will sometimes
mean the same as "(a | b | c)" in C, and sometimes it will mean the same
as "(a ? b : c)" in C.
There may be some clear distinguishing feature that disambiguates these
uses. But this is a one-man language - there is no need for a clear
syntax or grammar, documentation, consistency in the language, or a consideration for corner cases or unusual uses.
Incidentally, the above syntax is what Algol 68 supports;
Yes, he said later that Algol 68 was the inspiration for it. Algol 68
was very successful in its day - but there are good reasons why many of
its design choices were left behind long ago in newer languages.
This is, as an opinion, of course arguable. It's certainly also
influenced by where one is coming from (i.e. personal expertise
from other languages).
The language here is "mathematics". I would not expect anyone who even considers designing a programming language to be unfamiliar with that
symbol.
The detail of what symbols are used is
not that important to me, if it fits to the overall language
design.
I am quite happy with the same symbol being used for very different
meanings in different contexts. C's use of "*" for indirection and for multiplication is rarely confusing. Using | for "bitwise or" and also
using it for a "pipe" operator would probably be fine - only one
operation makes sense for the types involved. But here the two
operations - "bitwise or" (or logical or) and "choice" can apply to
the same types of operands. That's what makes it a very poor choice of syntax.
(For comparison, Algol 68 uses "OR", "∨" or "\/" for the "or" operator, thus it does not have this confusion.)
[...]
I've nothing (much) against the operation - it's the choice of operator
that is wrong.
This was the first part of your example:
const char * flag_to_text_A(bool b) {
if (b == true) {
return "It's true!";
} else if (b == false) {
return "It's false!";
/I/ would question why you'd want to make the second branch conditional
in the first place.
Write an 'else' there, and the issue doesn't arise.
Because I can't see the point of deliberately writing code that usually
takes two paths, when either:
(1) you know that one will never be taken, or
(2) you're not sure, but don't make any provision in case it is
Fix that first rather than relying on compiler writers to take care of your
badly written code.
[...]
If you have a function (or construct) that returns a correct value for
inputs 1, 2 and 3, and you never pass it the value 4 (or anything else),
then there is no undefined behaviour no matter what the code looks like
for values other than 1, 2 and 3. If someone calls that function with
input 4, then /their/ code has the error - not the code that doesn't
handle an input 4.
On 06/11/2024 07:26, Kaz Kylheku wrote:
On 2024-11-05, Bart <bc@freeuk.com> wrote:
Well, it started off as 2-way select, meaning constructs like this:
x = c ? a : b;
x := (c | a | b)
Where one of two branches is evaluated. I extended the latter to N-way
select:
x := (n | a, b, c, ... | z)
This looks quite error-prone. You have to count carefully that
the cases match the intended values. If an entry is
inserted, all the remaining ones shift to a higher value.
You've basically taken a case construct and auto-generated
the labels starting from 1.
It's a version of Algol68's case construct:
x := CASE n IN a, b, c OUT z ESAC
which also has the same compact form I use. I only use the compact
version because n is usually small, and it is intended to be used within
an expression: print (n | "One", "Two", "Three" | "Other").
This an actual example (from my first scripting language; not written by
me):
Crd[i].z := (BendAssen |P.x, P.y, P.z)
An out-of-bounds index yields 'void' (via a '| void' part inserted by
the compiler). This is one of my examples from that era:
xt := (messa | 1,1,1, 2,2,2, 3,3,3)
yt := (messa | 3,2,1, 3,2,1, 3,2,1)
Algol68 didn't have 'switch', but I do, as well as a separate
case...esac statement that is more general. Those are better for
multi-line constructs.
As for being error prone because values can get out of step, so is a
function call like this:
f(a, b, c, d, e)
But I also have keyword arguments.
On 04.11.2024 23:25, David Brown wrote:
If you have a function (or construct) that returns a correct value for
inputs 1, 2 and 3, and you never pass it the value 4 (or anything else),
then there is no undefined behaviour no matter what the code looks like
for values other than 1, 2 and 3. If someone calls that function with
input 4, then /their/ code has the error - not the code that doesn't
handle an input 4.
Well, it's a software system design decision whether you want to
make the caller test the preconditions for every function call,
or let the callee take care of unexpected input, or both.
We had always followed the convention to avoid all undefined
situations and always define every 'else' case by some sensible
behavior, at least writing a notice into a log-file, but also
to "fix" the runtime situation to be able to continue operating.
(Note, I was mainly writing server-side software where this was
especially important.)
That's one reason why (as elsethread mentioned) I dislike 'else'
to handle a defined value; I prefer an explicit 'if' and use the
else for reporting unexpected situations (that practically never
appear, or, with the diagnostics QA-evaluated, asymptotically
disappearing).
(For pure binary predicates there's no errors branch, of course.)
Janis
PS: One of my favorite IT-gotchas is the plane crash where the
code specified landing procedure functions for height < 50.0 ft
and for height > 50.0 ft conditions, which mostly worked since
the height got polled only every couple of seconds, and the case
height = 50.0 ft happened only very rarely due to the typical
descent characteristics during landing.
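The gap in miniature, as a C sketch with invented condition and function names:

   if (height < 50.0)
       flare_for_landing();
   else if (height > 50.0)
       continue_descent();
   /* height == 50.0 satisfies neither test: neither routine runs */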
On 08.11.2024 23:24, Bart wrote:
On 08/11/2024 17:37, Janis Papanagnou wrote:
BTW, the same symbols can also be used as an alternative form
of the 'case' statement; the semantic distinction is made by
context, e.g. the types involved in the construct.
You mean whether the 'a' in '(a | b... | c)' has type Bool rather than Int?
I've always discriminated on the number of terms between the two |s:
either 1, or more than 1.
I suppose in a [historic] "C" like language it's impossible to
distinguish on type here (given that there was no 'bool' type
[in former times] in "C"). - But I'm not quite sure whether
you're speaking here about your "C"-like language or some other
language you implemented.
if cond then
s1
s2
else
s3
s4
end
(Looks a lot more like a scripting language without semicolons.)
On 08.11.2024 19:18, David Brown wrote:
On 08/11/2024 18:37, Janis Papanagnou wrote:
The language here is "mathematics". I would not expect anyone who even
considers designing a programming language to be unfamiliar with that
symbol.
Mathematics, unfortunately, [too] often has several symbols for
the same thing. (It's in that respect not very different from
programming languages, where you can [somewhat] rely on + - * /
but beyond that it's getting more tight.)
Programming languages have the additional problem that you don't
have all necessary symbols available, so language designers have
to map them onto existing symbols. (Also Unicode in modern times
does not solve that, since languages typically rely on ASCII,
or some 8-bit extension, at most; full Unicode support, I think,
is rare, especially on the lexical language level. Some allow
them in strings, some in identifiers; but in language keywords?)
BTW, in Algol 68 you can define operators, so you can define
"OP V" or "OP ^" (for 'or' and 'and', respectively, but we cannot
define (e.g.) "OP ú" (a middle dot, e.g. for multiplication).[*]
The detail of what symbols are used is
not that important to me, if it fits to the overall language
design.
I am quite happy with the same symbol being used for very different
meanings in different contexts. C's use of "*" for indirection and for
multiplication is rarely confusing. Using | for "bitwise or" and also
using it for a "pipe" operator would probably be fine - only one
operation makes sense for the types involved. But here the two
operations - "bitwise or" (or logical or) and "choice" can apply to
the same types of operands. That's what makes it a very poor choice of
syntax.
Well, I'm more used (from mathematics) to 'v' and '^' than to '|'
and '&', respectively. But that doesn't prevent me from accepting
other symbols like '|' to have some [mathematical] meaning, or
even different meanings depending on context. In mathematics it's
not different; same symbols are used in different contexts with
different semantics. (And there's also the mentioned problem of
non-coherent literature WRT used mathematics' symbols.)
(For comparison, Algol 68 uses "OR", "∨" or "\/" for the "or" operator,
thus it does not have this confusion.)
Actually, while I like Algol 68's flexibility, there are in some
cases (to my liking) too many variants. This had partly been
necessary, of course, due to the (even more) restricted character
sets (e.g. 6-bit characters) available in the 1960's.
The two options for conditionals I consider very useful, though,
and it also produces very legible and easily understandable code.
[...]
I've nothing (much) against the operation - it's the choice of operator
that is wrong.
Well, on opinions there's nothing more to discuss, I suppose.
Bart wrote:
On 06/11/2024 07:26, Kaz Kylheku wrote:
On 2024-11-05, Bart <bc@freeuk.com> wrote:
[...] I extended the latter to N-way select:
x := (n | a, b, c, ... | z)
This looks quite error-prone. You have to count carefully that
the cases match the intended values. If an entry is
inserted, all the remaining ones shift to a higher value.
You've basically taken a case construct and auto-generated
the labels starting from 1.
It's a version of Algol68's case construct:
x := CASE n IN a, b, c OUT z ESAC
which also has the same compact form I use. I only use the compact
version because n is usually small, and it is intended to be used within
an expression: print (n | "One", "Two", "Three" | "Other").
[...]
An out-of-bounds index yields 'void' (via a '| void' part inserted by
the compiler). This is one of my examples from that era:
xt := (messa | 1,1,1, 2,2,2, 3,3,3)
yt := (messa | 3,2,1, 3,2,1, 3,2,1)
Still, the more C-compatible version would look better IMO:
xt = {1,1,1, 2,2,2, 3,3,3}[messa];
yt = {3,2,1, 3,2,1, 3,2,1}[messa];
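As written that isn't valid C, though: the braces need to be a C99 compound literal, and C indexes from 0 where the original counts from 1 - a sketch:

   xt = (int[]){1,1,1, 2,2,2, 3,3,3}[messa - 1];
   yt = (int[]){3,2,1, 3,2,1, 3,2,1}[messa - 1];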
[...]
On 09/11/2024 07:54, Janis Papanagnou wrote:
Well, it's a software system design decision whether you want to
make the caller test the preconditions for every function call,
or let the callee take care of unexpected input, or both.
Well, I suppose it is their decision - they can do the right thing, or
the wrong thing, or both.
I believe I explained in previous posts why it is the /caller's/ responsibility to ensure pre-conditions are fulfilled, and why anything
else is simply guaranteeing extra overheads while giving you less
information for checking code correctness. But I realise that could
have been lost in the mass of posts, so I can go through it again if you want.
[...]
(On security boundaries, system call interfaces, etc., where the caller
could be malicious or incompetent in a way that damages something other
than their own program, you have to treat all inputs as dangerous and sanitize them, just like data from external sources. That's a different matter, and not the real focus here.)
We had always followed the convention to avoid all undefined
situations and always define every 'else' case by some sensible
behavior, at least writing a notice into a log-file, but also
to "fix" the runtime situation to be able to continue operating.
(Note, I was mainly writing server-side software where this was
especially important.)
You can't "fix" bugs in the caller code by writing to a log file.
Sometimes you can limit the damage, however.
If you can't trust the people writing the calling code, then that should
be the focus of your development process - find a way to be sure that
the caller code is right. That's where you want your conventions, or to focus code reviews, training, automatic test systems - whatever is appropriate for your team and project. Make sure callers pass correct
data to the function, and the function can do its job properly.
Sometimes it makes sense to specify functions differently, and accept a
wider input. Maybe instead of saying "this function will return the
integer square root of numbers between 0 and 10", you say "this function
will return the integer square root if given a number between 0 and 10,
and will log a message and return -1 for other int values". Fair enough
- now you've got a new function where it is very easy for the caller to ensure the preconditions are satisfied. But be very aware of the costs
- you have now destroyed the "purity" of the function, and lost the key mathematical relation between the input and output. (You have also made everything much less efficient.)
[...]
On 09/11/2024 05:51, Janis Papanagnou wrote:
[...]
Sure, I appreciate all this. We must do the best we can - I am simply
saying that using | for this operation is far from the best choice.
Well, I'm more used (from mathematics) to 'v' and '^' than to '|'
and '&', respectively. But that doesn't prevent me from accepting
other symbols like '|' to have some [mathematical] meaning, or
even different meanings depending on context. In mathematics it's
not different; same symbols are used in different contexts with
different semantics. (And there's also the mentioned problem of
non-coherent literature WRT used mathematics' symbols.)
We are - unfortunately, perhaps - constrained by common keyboards and
ASCII (for the most part). "v" and "^" are poor choices for "or" and
"and" - "∨" and "∧" would be much nicer, but are hard to type.
For
better or worse, the programming world has settled on "|" and "&" as practical alternatives.
("+" and "." are often used in boolean logic,
and can be typed on normal keyboards, but would quickly be confused with other uses of those symbols.)
[...]
Well, on opinions there's nothing more to discuss, I suppose.
Opinions can be justified, and that discussion can be interesting.
Purely subjective opinion is less interesting.
On 05/11/2024 19:53, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
On 05/11/2024 12:42, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
Then we disagree on what 'multi-way' select might mean. I think it means branching, even if notionally, on one-of-N possible code paths.
OK.
The whole construct may or may not return a value. If it does, then one of the N paths must be a default path.
You need to cover all input values. This is possible when there
is a reasonably small number of possibilities. For example, a switch on a
char variable which covers all possible values does not need a default
path. A default is needed only when the number of possibilities is too
large to explicitly give all of them. And some languages allow
ranges, so that you may be able to cover all values with a small
number of ranges.
What's easier to implement in a language: to have a conditional need for an 'else' branch, which is dependent on the compiler performing some
arbitrarily complex levels of analysis on some arbitrarily complex set
of expressions...
...or to just always require 'else', with a dummy value if necessary?
Well, frequently it is easier to do bad job, than a good one.
I assume that you consider the simple solution the 'bad' one?
I'd consider the much more elaborate one, putting the onus on external
tools and still having an unpredictable result, to be the poorer of the two.
You want to create a language that is easily compilable, no matter how complex the input.
With the simple solution, the worst that can happen is that you have to write a dummy 'else' branch, perhaps with a dummy zero value.
If control never reaches that point, it will never be executed (at
worse, it may need to skip an instruction).
But if the compiler is clever enough (optionally clever, it is not a requirement!), then it could eliminate that code.
A bonus is that when debugging, you can comment out all or part of the previous lines, but the 'else' now catches those untested cases.
normally you do not need very complex analysis:
I don't want to do any analysis at all! I just want a mechanical
translation as effortlessly as possible.
I don't like unbalanced code within a function because it's wrong and
can cause problems.
On 05/11/2024 20:39, Waldek Hebisch wrote:
David Brown <david.brown@hesbynett.no> wrote:
On 05/11/2024 13:42, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
Then we disagree on what 'multi-way' select might mean. I think it means branching, even if notionally, on one-of-N possible code paths.
OK.
I appreciate this is what Bart means by that phrase, but I don't agree
with it. I'm not sure if that is covered by "OK" or not!
You may prefer your own definition, but Bart's is a reasonable one.
The only argument I can make here is that I have not seen "multi-way
select" as a defined phrase with a particular established meaning.
The whole construct may or may not return a value. If it does, then one of the N paths must be a default path.
You need to cover all input values. This is possible when there
is a reasonably small number of possibilities. For example, a switch on a
char variable which covers all possible values does not need a default
path. A default is needed only when the number of possibilities is too
large to explicitly give all of them. And some languages allow
ranges, so that you may be able to cover all values with a small
number of ranges.
I think this is all very dependent on what you mean by "all input values".
Supposing I declare this function:
// Return the integer square root of numbers between 0 and 10
int small_int_sqrt(int x);
To me, the range of "all input values" is integers from 0 to 10. I
could implement it as :
int small_int_sqrt(int x) {
if (x == 0) return 0;
if (x < 4) return 1;
if (x < 9) return 2;
if (x < 16) return 3;
unreachable();
}
If the user asks for small_int_sqrt(-10) or small_int_sqrt(20), that's
/their/ fault and /their/ problem. I said nothing about what would
happen in those cases.
But some people seem to feel that "all input values" means every
possible value of the input types, and thus that a function like this
should return a value even when there is no correct value in and no
correct value out.
Well, some languages treat types more seriously than C. In Pascal
the type of your input would be 0..10 and all input values would be
handled. Sure, when the domain is too complicated to express in a type
then it could be a documented restriction. Still, it makes sense to
signal an error if a value goes outside the handled range, so in a sense all
values of the input type are handled: either you get a valid answer or a
clear error.
No, it does not make sense to do that. Just because the C language does
not currently (maybe once C++ gets contracts, C will copy them) have a
way to specify input sets other than by types, does not mean that
functions in C always have a domain matching all possible combinations
of bits in the underlying representation of the parameter's types.
It might be a useful fault-finding aid temporarily to add error messages
for inputs that are invalid but can physically be squeezed into the parameters. That won't stop people making incorrect declarations of the function and passing completely different parameter types to it, or
finding other ways to break the requirements of the function.
And in general there is no way to check the validity of the inputs - you usually have no choice but to trust the caller. It's only in simple
cases, like the example above, that it would be feasible at all.
There are, of course, situations where the person calling the function
is likely to be incompetent, malicious, or both, and where there can be serious consequences for what you might prefer to consider as invalid
input values.
You have that for things like OS system calls - it's no
different than dealing with user inputs or data from external sources.
But you handle that by extending the function - increase the range of
valid inputs and appropriate outputs. You no longer have a function
that takes a number between 0 and 10 and returns the integer square root
- you now have a function that takes a number between -(2 ^ 31) and
(2 ^ 31 - 1) and returns the integer square root if the input is in the
range 0 to 10 or halts the program with an error message for other
inputs in the wider range. It's a different function, with a wider set
of inputs - and again, it is specified to give particular results for particular inputs.
I certainly would
be quite unhappy with the code above. It is possible that I would still
use it as a compromise (say if it was desirable to have a single
prototype but handle points in spaces of various dimensions),
but my first attempt would be something like:
typedef struct {int p[2];} two_int;
....
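Presumably the elided code would continue along these lines - a guess, since the original stops at "...." - with the struct making the "exactly two ints" requirement part of the function's signature:

   typedef struct {int p[2];} two_int;

   /* the type system now guarantees exactly two ints are supplied */
   int sum_two_ints(two_int v) {
       return v.p[0] + v.p[1];
   }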
I think you'd quickly find that limiting and awkward in C (but it might
be appropriate in other languages).
But don't misunderstand me - I am
all in favour of finding ways in code that make input requirements
clearer or enforceable within the language - never put anything in
comments if you can do it in code. You could reasonably do this in C
for the first example :
#include <assert.h>

// Do not use this directly
extern int small_int_sqrt_implementation(int x);

// Return the integer square root of numbers between 0 and 10
static inline int small_int_sqrt(int x) {
    assert(x >= 0 && x <= 10);
    return small_int_sqrt_implementation(x);
}
A function should accept all input values - once you have made clear
what the acceptable input values can be. A "default" case is just a
short-cut for conveniently handling a wide range of valid input values - it is never a tool for handling /invalid/ input values.
Well, a default can signal an error, which frequently is the right handling
of invalid input values.
Will that somehow fix the bug in the code that calls the function?
It can be a useful debugging and testing aid, certainly, but it does not make the code "correct" or "safe" in any sense.
On 09/11/2024 03:57, Janis Papanagnou wrote:
[...] - But I'm not quite sure whether
you're speaking here about your "C"-like language or some other
language you implemented.
I currently have three HLL implementations:
* For my C subset language (originally I had some enhancements, now
dropped)
* For my 'M' systems language inspired by A68 syntax
* For my 'Q' scripting language, with the same syntax, more or less
The remark was about those last two.
if cond then
s1
s2
else
s3
s4
end
(Looks a lot more like a scripting language without semicolons.)
This is what I've long suspected: that people associate clear, pseudo-code-like syntax with scripting languages.
[...]
On 09.11.2024 12:06, David Brown wrote:
On 09/11/2024 07:54, Janis Papanagnou wrote:
Well, it's a software system design decision whether you want to
make the caller test the preconditions for every function call,
or let the callee take care of unexpected input, or both.
Well, I suppose it is their decision - they can do the right thing, or
the wrong thing, or both.
I believe I explained in previous posts why it is the /caller's/
responsibility to ensure pre-conditions are fulfilled, and why anything
else is simply guaranteeing extra overheads while giving you less
information for checking code correctness. But I realise that could
have been lost in the mass of posts, so I can go through it again if you
want.
I haven't read all the posts, or rather, I just skipped most posts;
it's too time consuming.
Since you explicitly elaborated - thanks! - I will read this one...
[...]
(On security boundaries, system call interfaces, etc., where the caller
could be malicious or incompetent in a way that damages something other
than their own program, you have to treat all inputs as dangerous and
sanitize them, just like data from external sources. That's a different
matter, and not the real focus here.)
We had always followed the convention to avoid all undefined
situations and always define every 'else' case by some sensible
behavior, at least writing a notice into a log-file, but also
to "fix" the runtime situation to be able to continue operating.
(Note, I was mainly writing server-side software where this was
especially important.)
You can't "fix" bugs in the caller code by writing to a log file.
Sometimes you can limit the damage, however.
I spoke more generally of fixing situations (not only bugs).
If you can't trust the people writing the calling code, then that should
be the focus of your development process - find a way to be sure that
the caller code is right. That's where you want your conventions, or to
focus code reviews, training, automatic test systems - whatever is
appropriate for your team and project. Make sure callers pass correct
data to the function, and the function can do its job properly.
Yes.
Sometimes it makes sense to specify functions differently, and accept a
wider input. Maybe instead of saying "this function will return the
integer square root of numbers between 0 and 10", you say "this function
will return the integer square root if given a number between 0 and 10,
and will log a message and return -1 for other int values". Fair enough
- now you've got a new function where it is very easy for the caller to
ensure the preconditions are satisfied. But be very aware of the costs
- you have now destroyed the "purity" of the function, and lost the key
mathematical relation between the input and output. (You have also made
everything much less efficient.)
I disagree with the "much less" generalization. I also think that when
weighing performance versus safety my preferences might be different;
I'm only speaking about a "rule of thumb", not about the actual (IMO) necessity(!) to make these decisions depending on the project context.
David Brown <david.brown@hesbynett.no> wrote:
On 05/11/2024 20:39, Waldek Hebisch wrote:
David Brown <david.brown@hesbynett.no> wrote:
On 05/11/2024 13:42, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
It might be a useful fault-finding aid temporarily to add error messages
for inputs that are invalid but can physically be squeezed into the
parameters. That won't stop people making incorrect declarations of the
function and passing completely different parameter types to it, or
finding other ways to break the requirements of the function.
And in general there is no way to check the validity of the inputs - you
usually have no choice but to trust the caller. It's only in simple
cases, like the example above, that it would be feasible at all.
There are, of course, situations where the person calling the function
is likely to be incompetent, malicious, or both, and where there can be
serious consequences for what you might prefer to consider as invalid
input values.
You apparently exclude the possibility of competent persons making a
mistake. AFAIK industry statistics show that code developed by
good developers using a rigorous process still contains a substantial
number of bugs. So, it makes sense to have as much as possible
verified mechanically. Which in common practice means depending on
type checks. In less common practice you may have some theorem
proving framework checking assertions about input arguments;
then the assertions take the role of types.
But don't misunderstand me - I am
all in favour of finding ways in code that make input requirements
clearer or enforceable within the language - never put anything in
comments if you can do it in code. You could reasonably do this in C
for the first example :
#include <assert.h>

// Do not use this directly
extern int small_int_sqrt_implementation(int x);

// Return the integer square root of numbers between 0 and 10
static inline int small_int_sqrt(int x) {
    assert(x >= 0 && x <= 10);
    return small_int_sqrt_implementation(x);
}
Hmm, why extern implementation and static wrapper? I would do
the opposite.
A function should accept all input values - once you have made clear
what the acceptable input values can be. A "default" case is just a
short-cut for conveniently handling a wide range of valid input values - it is never a tool for handling /invalid/ input values.
Well, a default can signal an error, which frequently is the right handling
of invalid input values.
Will that somehow fix the bug in the code that calls the function?
It can be a useful debugging and testing aid, certainly, but it does not
make the code "correct" or "safe" in any sense.
There is a concept of "partial correctness": if the code finishes, it returns a correct value. A variation of this is: if the code finishes without
signaling an error, it returns correct values. Such a condition may be
much easier to verify than "full correctness" and in many cases
is almost as useful. In particular, mathematicians are _very_
unhappy when a program returns incorrect results. But they are used
to programs which cannot deliver results, either because of
lack of resources or because the needed case was not implemented.
When dealing with math formulas there are frequently various
restrictions on parameters, like: we can only divide by a nonzero
quantity. By signaling an error when the restrictions are not
satisfied, we ensure that successful completion means that
the restrictions were satisfied. Of course that alone does not
mean that the result is correct, but correctness of the "general"
case is usually _much_ easier to ensure. In other words,
failing restrictions are a major source of errors, and signaling
errors effectively eliminates that source.
In a world of perfect programmers, they would check restrictions
before calling any function depending on them, or prove that
the restrictions on arguments to a function imply the correctness of
the calls made by the function. But the world is imperfect, and in
the real world extra runtime checks are quite useful.
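A minimal sketch of that "signal instead of returning nonsense" policy (the function is invented for illustration):

   #include <stdio.h>
   #include <stdlib.h>

   /* Either returns a correct quotient or refuses to complete:
      partial correctness in miniature. */
   double checked_div(double a, double b) {
       if (b == 0.0) {
           fprintf(stderr, "checked_div: division by zero\n");
           abort();
       }
       return a / b;
   }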
On 10/11/2024 07:57, Waldek Hebisch wrote:
David Brown <david.brown@hesbynett.no> wrote:
On 05/11/2024 20:39, Waldek Hebisch wrote:
David Brown <david.brown@hesbynett.no> wrote:
On 05/11/2024 13:42, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
Type checks can be extremely helpful, and strong typing greatly reduces
the errors in released code by catching them early (at compile time).
And temporary run-time checks are also helpful during development or debugging.
But extra run-time checks are costly (and I don't mean just in run-time performance, which is only an issue in a minority of situations). They
mean more code - which means more scope for errors, and more code that
must be checked and maintained. Usually this code can't be tested well
in final products - precisely because it is there to handle a situation
that never occurs.
A function should accept all input values - once you have made clear
what the acceptable input values can be. A "default" case is just a
short-cut for conveniently handling a wide range of valid input values -
it is never a tool for handling /invalid/ input values.
Well, a default can signal an error, which frequently is the right
handling of invalid input values.
Will that somehow fix the bug in the code that calls the function?
It can be a useful debugging and testing aid, certainly, but it does not
make the code "correct" or "safe" in any sense.
There is a concept of "partial correctness": code, if it finishes,
returns a correct value. A variation of this is: code, if it finishes
without signaling an error, returns correct values. Such a condition
may be much easier to verify than "full correctness" and in many cases
is almost as useful. In particular, mathematicians are _very_
unhappy when a program returns incorrect results. But they are used
to programs which cannot deliver results, either because of
lack of resources or because the needed case was not implemented.
When dealing with math formulas there are frequently various
restrictions on parameters, like we can only divide by a nonzero
quantity. By signaling an error when the restrictions are not
satisfied we ensure that successful completion means that the
restrictions were satisfied. Of course that alone does not
mean that the result is correct, but correctness of the "general"
case is usually _much_ easier to ensure. In other words,
failing restrictions are a major source of errors, and signaling
errors effectively eliminates that source.
Yes, out-of-band signalling in some way is a useful way to indicate a problem, and can allow parameter checking without losing the useful
results of a function. This is the principle behind exceptions in many languages - then functions either return normally with correct results,
or you have a clearly abnormal situation.
In a world of perfect programmers, they would check restrictions
before calling any function depending on them, or prove that
restrictions on arguments to a function imply correctness of the
calls made by the function. But the world is imperfect, and in the
real world extra runtime checks are quite useful.
Runtime checks in a function can be useful if you know the calling code might not be perfect and the function is going to take responsibility
for identifying that situation. Programmers will often be writing both
the caller and callee code, and put temporary debugging and test checks wherever it is most convenient.
But I think being too enthusiastic about putting checks in the wrong
place - the callee function - can hide the real problems, or make the
callee code writer less careful about getting their part of the code correct.
Bart <bc@freeuk.com> wrote:
I assume that you consider the simple solution the 'bad' one?
You wrote about _always_ requiring 'else' regardless of whether it is
needed or not. Yes, I consider this bad.
I would consider a much more elaborate one, putting the onus on external
tools and still having an unpredictable result, to be the poorer of the two.
You want to create a language that is easily compilable, no matter how
complex the input.
Normally time spent _using_ a compiler should be bigger than the time
spent writing the compiler. If a compiler gets enough use, it
justifies some complexity.
I am mainly concerned with clarity and correctness of source code.
A dummy 'else' doing something may hide errors.
A dummy 'else' signaling an
error means that something which could be a compile-time error is
only detected at runtime.
A compiler that detects most errors of this sort is IMO better than a
compiler which makes no effort to detect them. And clearly, once the
problem is formulated in a sufficiently general way, it becomes
unsolvable. So I do not expect a general solution, but I expect
reasonable effort.
normally you do not need very complex analysis:
I don't want to do any analysis at all! I just want a mechanical
translation as effortlessly as possible.
I don't like unbalanced code within a function because it's wrong and
can cause problems.
Well, I demand more from the compiler than you do...
David Brown <david.brown@hesbynett.no> wrote:
Runtime checks in a function can be useful if you know the calling code
might not be perfect and the function is going to take responsibility
for identifying that situation. Programmers will often be writing both
the caller and callee code, and put temporary debugging and test checks
wherever it is most convenient.
But I think being too enthusiastic about putting checks in the wrong
place - the callee function - can hide the real problems, or make the
callee code writer less careful about getting their part of the code
correct.
IME it is the opposite: not having checks in the called function simply
delays the moment when an error is detected. Getting errors early helps
focus on tricky problems or misconceptions. And it motivates programmers
to be more careful.
Concerning the correct place for checks: one could argue that a check
should be close to the place where the result of the check matters, which
frequently is in the called function.
And frequently a check requires
computation that is done by the called function as part of normal
processing, but would be extra code in the caller.
On 11/11/2024 20:09, Waldek Hebisch wrote:
David Brown <david.brown@hesbynett.no> wrote:
Concerning the correct place for checks: one could argue that a check
should be close to the place where the result of the check matters, which
frequently is in the called function.
No, there I disagree. The correct place for the checks should be close
to where the error is, and that is in the /calling/ code. If the called function is correctly written, reviewed, tested, documented and
considered "finished", why would it be appropriate to add extra code to
that in order to test and debug some completely different part of the code?
The place where the result of the check /really/ matters, is the calling code. And that is also the place where you can most easily find the
error, since the error is in the calling code, not the called function.
And it is most likely to be the code that you are working on at the time
- the called function is already written and tested.
And frequently a check requires
computation that is done by the called function as part of normal
processing, but would be extra code in the caller.
It is more likely to be the opposite in practice.
And for much of the time, the called function has no real practical way
to check the parameters anyway. A function that takes a pointer
parameter - not an uncommon situation - generally has no way to check
the validity of the pointer. It can't check that the pointer actually points to useful source data or an appropriate place to store data.
All it can do is check for a null pointer, which is usually a fairly
useless thing to do (unless the specifications for the function make the pointer optional). After all, on most (but not all) systems you already have a "free" null pointer check - if the caller code has screwed up and passed a null pointer when it should not have done, the program will
quickly crash when the pointer is used for access. Many compilers
provide a way to annotate function declarations to say that a pointer
must not be null, and can then spot at least some such errors at compile time. And of course the calling code will very often be passing the
address of an object in the call - since that can't be null, a check in
the function is pointless.
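One such annotation, sketched with the GCC/Clang extension (the function
itself is invented for illustration):

#include <stddef.h>

/* With this attribute, gcc and clang can warn at compile time when a
   caller passes a literal NULL as the first argument. */
__attribute__((nonnull(1)))
size_t count_bytes(const unsigned char *data, size_t len);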
Once you get to more complex data structures, the possibility for the
caller to check the parameters gets steadily less realistic.
So now your practice of having functions "always" check their parameters
leaves the people writing calling code with a false sense of security -
usually you /don't/ check the parameters, you only ever do the simple
checks that the caller could (and should!) do if they were realistic.
You've got the maintenance and cognitive overload of extra source code
for your various "asserts" and other checks, regardless of any run-time
costs (which are often irrelevant, but occasionally very important).
You will note that much of this - for both sides of the argument - uses words like "often", "generally" or "frequently". It is important to appreciate that programming spans a very wide range of situations, and I don't want to be too categorical about things. I have already said
there are situations when parameter checking in called functions can
make sense. I've no doubt that for some people and some types of
coding, such cases are a lot more common than what I see in my coding.
Note also that when you can use tools to automate checks, such as
"sanitize" options in compilers or different languages that have more in-built checks, the balance differs. You will generally pay a run-time cost for those checks, but you don't have the same kind of source-level costs - your code is still clean, clear, and amenable to correctness checking, without hiding the functionality of the code in a mass of unnecessary explicit checks. This is particularly good for debugging,
and the run-time costs might not be important. (But if run-time costs
are not important, there's a good chance that C is not the best language
to be using in the first place.)
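For example, assuming GCC or Clang, the checks can be injected by the
compiler for a debug build with no extra source code (prog.c is a
placeholder name):

gcc -g -fsanitize=address,undefined prog.c -o prog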
if (n==0) { printf ("n: %u\n",n); n++;}
if (n==1) { printf ("n: %u\n",n); n++;}
if (n==2) { printf ("n: %u\n",n); n++;}
if (n==3) { printf ("n: %u\n",n); n++;}
if (n==4) { printf ("n: %u\n",n); n++;}
printf ("all if completed, n=%u\n",n);
Dan Purgert <dan@djph.net> wrote or quoted:
if (n==0) { printf ("n: %u\n",n); n++;}
if (n==1) { printf ("n: %u\n",n); n++;}
if (n==2) { printf ("n: %u\n",n); n++;}
if (n==3) { printf ("n: %u\n",n); n++;}
if (n==4) { printf ("n: %u\n",n); n++;}
printf ("all if completed, n=%u\n",n);
My bad if the following instruction structure's already been hashed
out in this thread, but I haven't been following the whole convo!
In my C 101 classes, after we've covered "if" and "else",
I always throw this program up on the screen and hit the newbies
with this curveball: "What's this bad boy going to spit out?".
Well, it's a blue moon when someone nails it. Most of them fall
for my little gotcha hook, line, and sinker.
#include <stdio.h>
const char * english( int const n )
{ const char * result;
if( n == 0 )result = "zero";
if( n == 1 )result = "one";
if( n == 2 )result = "two";
if( n == 3 )result = "three";
else result = "four";
return result; }
void print_english( int const n )
{ printf( "%s\n", english( n )); }
int main( void )
{ print_english( 0 );
print_english( 1 );
print_english( 2 );
print_english( 3 );
print_english( 4 ); }
David Brown <david.brown@hesbynett.no> wrote:
On 11/11/2024 20:09, Waldek Hebisch wrote:
David Brown <david.brown@hesbynett.no> wrote:
Concerning the correct place for checks: one could argue that a check
should be close to the place where the result of the check matters, which
frequently is in the called function.
No, there I disagree. The correct place for the checks should be close
to where the error is, and that is in the /calling/ code. If the called
function is correctly written, reviewed, tested, documented and
considered "finished", why would it be appropriate to add extra code to
that in order to test and debug some completely different part of the code?
The place where the result of the check /really/ matters, is the calling
code. And that is also the place where you can most easily find the
error, since the error is in the calling code, not the called function.
And it is most likely to be the code that you are working on at the time
- the called function is already written and tested.
And frequently a check requires
computation that is done by the called function as part of normal
processing, but would be extra code in the caller.
It is more likely to be the opposite in practice.
And for much of the time, the called function has no real practical way
to check the parameters anyway. A function that takes a pointer
parameter - not an uncommon situation - generally has no way to check
the validity of the pointer. It can't check that the pointer actually
points to useful source data or an appropriate place to store data.
All it can do is check for a null pointer, which is usually a fairly
useless thing to do (unless the specifications for the function make the
pointer optional). After all, on most (but not all) systems you already
have a "free" null pointer check - if the caller code has screwed up and
passed a null pointer when it should not have done, the program will
quickly crash when the pointer is used for access. Many compilers
provide a way to annotate function declarations to say that a pointer
must not be null, and can then spot at least some such errors at compile
time. And of course the calling code will very often be passing the
address of an object in the call - since that can't be null, a check in
the function is pointless.
Well, in a sense pointers are easy: if you do not play nasty tricks
with casts then type checks do a significant part of the checking. Of
course, a pointer may be uninitialized (but compiler warnings help a lot
here), memory may be overwritten, etc. But overwritten memory is
rather special: if you checked that the content of memory is correct,
but it is overwritten after the check, then the earlier check does not
help. Anyway, the main point is ensuring that the pointed-to data
satisfies the expected conditions.
Once you get to more complex data structures, the possibility for the
caller to check the parameters gets steadily less realistic.
So now your practice of having functions "always" check their parameters
leaves the people writing calling code with a false sense of security -
usually you /don't/ check the parameters, you only ever do the simple
checks that the caller could (and should!) do if they were realistic. You've
got the maintenance and cognitive overload of extra source code for your
various "asserts" and other checks, regardless of any run-time costs
(which are often irrelevant, but occasionally very important).
You will note that much of this - for both sides of the argument - uses
words like "often", "generally" or "frequently". It is important to
appreciate that programming spans a very wide range of situations, and I
don't want to be too categorical about things. I have already said
there are situations when parameter checking in called functions can
make sense. I've no doubt that for some people and some types of
coding, such cases are a lot more common than what I see in my coding.
Note also that when you can use tools to automate checks, such as
"sanitize" options in compilers or different languages that have more
in-built checks, the balance differs. You will generally pay a run-time
cost for those checks, but you don't have the same kind of source-level
costs - your code is still clean, clear, and amenable to correctness
checking, without hiding the functionality of the code in a mass of
unnecessary explicit checks. This is particularly good for debugging,
and the run-time costs might not be important. (But if run-time costs
are not important, there's a good chance that C is not the best language
to be using in the first place.)
Our experience differs. As a silly example, consider a parser
which produces a parse tree. The caller is supposed to pass a syntactically
correct string as an argument. However, checking syntactic correctness
requires almost the same effort as producing the parse tree, so it is
usual that the parser both checks correctness and produces the result.
I have computations that are quite different from parsing but
in some cases share the same characteristic: checking correctness of the
arguments requires complex computation similar to producing the
actual result. More frequently, the called routine can check various
invariants which with high probability can detect errors. Doing
the same checks in the caller is impractical.
Most of my coding is in languages other than C. One of the languages
that I use essentially forces the programmer to insert checks in
some places. For example, unions are tagged and one can use a
specific variant only after checking that it is the current
variant. Similarly, fall-through control structures may lead
to a type error at compile time. But signalling an error is considered
type safe. So code which checks for an unhandled case and signals an
error is accepted as type correct. Unhandled cases frequently
lead to type errors. There is some overhead, but IMO it is acceptable.
The language in question is garbage collected, so many memory-related
problems go away.
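C has no such checked variants, but the discipline can be approximated
with a tag and a default branch that signals the unhandled case; all
names in this sketch are invented:

#include <stdio.h>
#include <stdlib.h>

enum shape_tag { SHAPE_CIRCLE, SHAPE_RECT };

struct shape {
    enum shape_tag tag;
    union {
        struct { double radius; } circle;
        struct { double w, h; } rect;
    } u;
};

double area(const struct shape *s)
{
    switch (s->tag) {
    case SHAPE_CIRCLE:
        return 3.141592653589793 * s->u.circle.radius * s->u.circle.radius;
    case SHAPE_RECT:
        return s->u.rect.w * s->u.rect.h;
    default:
        /* the unhandled-case check the language described above forces */
        fprintf(stderr, "area: bad shape tag %d\n", (int)s->tag);
        abort();
    }
}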
Frequently checks come as a natural byproduct of computations. When
handling tree-like structures in C, IME the simplest code is usually
recursive with the base case being the null pointer. When the base
case should not occur, we get a check instead of a computation.
Skipping such checks also puts cognitive load on the reader:
the normal pattern has a corresponding case, so the reader does not know
if the case was omitted by accident or cannot occur. A comment
may clarify this, but an error check is equally clear.
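A sketch of that pattern (the node type is illustrative):

#include <stdio.h>
#include <stdlib.h>

struct node { int value; struct node *left, *right; };

/* Natural shape: null is an ordinary base case. */
int tree_sum(const struct node *t)
{
    if (t == NULL)
        return 0;
    return t->value + tree_sum(t->left) + tree_sum(t->right);
}

/* When null "cannot happen", the base-case slot holds a check instead,
   so the reader knows the case was considered rather than forgotten. */
int tree_sum_nonempty(const struct node *t)
{
    if (t == NULL) {
        fprintf(stderr, "tree_sum_nonempty: unexpected null node\n");
        abort();
    }
    return t->value
         + (t->left  ? tree_sum_nonempty(t->left)  : 0)
         + (t->right ? tree_sum_nonempty(t->right) : 0);
}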
On 16/11/2024 09:42, Stefan Ram wrote:
Dan Purgert <dan@djph.net> wrote or quoted:
if (n==0) { printf ("n: %u\n",n); n++;}
if (n==1) { printf ("n: %u\n",n); n++;}
if (n==2) { printf ("n: %u\n",n); n++;}
if (n==3) { printf ("n: %u\n",n); n++;}
if (n==4) { printf ("n: %u\n",n); n++;}
printf ("all if completed, n=%u\n",n);
My bad if the following instruction structure's already been hashed
out in this thread, but I haven't been following the whole convo!
In my C 101 classes, after we've covered "if" and "else",
I always throw this program up on the screen and hit the newbies
with this curveball: "What's this bad boy going to spit out?".
FGS please turn the 'hip lingo' generator down a few notches!
On Sat, 16 Nov 2024 09:42:49 +0000, Stefan Ram wrote:
Dan Purgert <dan@djph.net> wrote or quoted:
if (n==0) { printf ("n: %u\n",n); n++;}
if (n==1) { printf ("n: %u\n",n); n++;}
if (n==2) { printf ("n: %u\n",n); n++;}
if (n==3) { printf ("n: %u\n",n); n++;}
if (n==4) { printf ("n: %u\n",n); n++;}
printf ("all if completed, n=%u\n",n);
My bad if the following instruction structure's already been hashed
out in this thread, but I haven't been following the whole convo!
In my C 101 classes, after we've covered "if" and "else",
I always throw this program up on the screen and hit the newbies
with this curveball: "What's this bad boy going to spit out?".
Well, it's a blue moon when someone nails it. Most of them fall
for my little gotcha hook, line, and sinker.
#include <stdio.h>
const char * english( int const n )
{ const char * result;
if( n == 0 )result = "zero";
if( n == 1 )result = "one";
if( n == 2 )result = "two";
if( n == 3 )result = "three";
else result = "four";
return result; }
void print_english( int const n )
{ printf( "%s\n", english( n )); }
int main( void )
{ print_english( 0 );
print_english( 1 );
print_english( 2 );
print_english( 3 );
print_english( 4 ); }
If I read your code correctly, you have actually included not one,
but TWO curveballs. Well done!
On 10/11/2024 06:00, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
I would consider a much more elaborate one, putting the onus on external
tools and still having an unpredictable result, to be the poorer of the two.
You want to create a language that is easily compilable, no matter how
complex the input.
Normally time spent _using_ a compiler should be bigger than the time
spent writing the compiler. If a compiler gets enough use, it
justifies some complexity.
That doesn't add up: the more the compiler gets used, the slower it
should get?!
The sort of analysis you're implying I don't think belongs in the kind
of compiler I prefer. Even if it did, it would be later on in the
process than the point where the above restriction is checked, so
wouldn't exist in one of my compilers anyway.
I don't like open-ended tasks like this where compilation time could end
up being anything. If you need to keep recompiling the same module, then
you don't want to repeat that work each time.
I am mainly concerned with clarity and correctness of source code.
So am I. I try to keep my syntax clean and uncluttered.
A dummy 'else' doing something may hide errors.
So can 'unreachable'.
A dummy 'else' signaling an
error means that something which could be a compile-time error is
only detected at runtime.
A compiler that detects most errors of this sort is IMO better than a
compiler which makes no effort to detect them. And clearly, once the
problem is formulated in a sufficiently general way, it becomes
unsolvable. So I do not expect a general solution, but I expect
reasonable effort.
So how would David Brown's example work:
int F(int n) {
if (n==1) return 10;
if (n==2) return 20;
}
/You/ know that values -2**31 to 0 and 3 to 2**31-1 are impossible; the compiler doesn't. It's likely to tell you that you may run into the end
of the function.
So what do you want the compiler to do here? If I try it:
func F(int n)int =
if n=1 then return 10 fi
if n=2 then return 20 fi
end
It says 'else needed' (in that last statement). I can also shut it up
like this:
func F(int n)int = # int is i64 here
if n=1 then return 10 fi
if n=2 then return 20 fi
0
end
Since now that last statement is the '0' value (any int value will do).
What should my compiler report instead? What analysis should it be
doing? What would that save me from typing?
normally you do not need very complex analysis:
I don't want to do any analysis at all! I just want a mechanical
translation as effortlessly as possible.
I don't like unbalanced code within a function because it's wrong and
can cause problems.
Well, I demand more from the compiler than you do...
Perhaps you're happy for it to be bigger and slower too. Most of my
projects build more or less instantly. Here 'ms' is a version that runs programs directly from source (the first 'ms' is 'ms.exe' and subsequent ones are 'ms.m' the lead module):
c:\bx>ms ms ms ms ms ms ms ms ms ms ms ms ms ms ms ms hello
Hello World! 21:00:45
This builds and runs 15 successive generations of itself in memory
before building and running hello.m; it took 1 second in all. (Now try
that with gcc!)
Here:
c:\cx>tm \bx\mm -runp cc sql
Compiling cc.m to <pcl>
Compiling sql.c to sql.exe
This compiles my C compiler from source but then it /interprets/ the IR produced. This interpreted compiler took 6 seconds to build the 250Kloc
test file, and it's a very slow interpreter (it's used for testing and debugging).
(gcc -O0 took a bit longer to build sql.c! About 7 seconds but it is
using a heftier windows.h.)
If I run the C compiler from source as native code (\bx\ms cc sql) then building the compiler *and* sql.c takes 1/3 of a second.
You can't do this stuff with the compilers David Brown uses; I'm
guessing you can't do it with your prefered ones either.
[...]
My preferences are very much weighted towards correctness, not
efficiency. That includes /knowing/ that things are correct, not just passing some tests. [...]
I wonder what happened to Stefan. He used to make perfectly good posts.
Then he disappeared for a bit, and came back with this new "style".
Given that this "new" Stefan can write posts with interesting C content,
such as this one, and has retained his ugly coding layout and
non-standard Usenet format, I have to assume it's still the same person behind the posts.
On 10.11.2024 16:13, David Brown wrote:
[...]
My preferences are very much weighted towards correctness, not
efficiency. That includes /knowing/ that things are correct, not just
passing some tests. [...]
I agree with you. But given what you write I'm also sure you know
what's achievable in theory, what's an avid wish, and what's really
possible.
Yet there are also projects that don't seem to care, where
speedy delivery is the primary goal. Guaranteeing formal correctness
had never been an issue in the industry contexts I worked in, and I
was always glad when I had a good test environment, with good test
coverage, and continuous refinement of tests. Informal documentation,
factual checks of the arguments, and actual tests were what kept the
quality of our project deliveries at a high level.
On 16.11.2024 17:38, David Brown wrote:
I wonder what happened to Stefan. He used to make perfectly good posts.
Then he disappeared for a bit, and came back with this new "style".
Given that this "new" Stefan can write posts with interesting C content,
such as this one, and has retained his ugly coding layout and
non-standard Usenet format, I have to assume it's still the same person
behind the posts.
Sorry that I cannot resist asking what you consider "non-standard
Usenet format", given that your posts don't consider line length.
(Did the "standards" change during the past three decades maybe?
Do we use only those parts of the "standards" that we like and
ignore others? Or does it boil down to Netiquette is no standard?)
Janis, just curious and no offense intended :-)
On 16.11.2024 17:38, David Brown wrote:
I wonder what happened to Stefan. He used to make perfectly good
posts. Then he disappeared for a bit, and came back with this new
"style".
Given that this "new" Stefan can write posts with interesting C
content, such as this one, and has retained his ugly coding layout
and non-standard Usenet format, I have to assume it's still the
same person behind the posts.
Sorry that I cannot resist asking what you consider "non-standard
Usenet format", given that your posts don't consider line length.
(Did the "standards" change during the past three decades maybe?
Do we use only those parts of the "standards" that we like and
ignore others? Or does it boil down to Netiquette is no standard?)
There are a great variety of projects, [...]
Of course testing is important, at many levels. But the time to test
your code is when you are confident that it is correct - testing is not
an alternative to writing code that is as clearly correct as you are
able to make it.
On 11/16/24 04:42, Stefan Ram wrote:
...
[...]
#include <stdio.h>
const char * english( int const n )
{ const char * result;
if( n == 0 )result = "zero";
if( n == 1 )result = "one";
if( n == 2 )result = "two";
if( n == 3 )result = "three";
else result = "four";
return result; }
void print_english( int const n )
{ printf( "%s\n", english( n )); }
int main( void )
{ print_english( 0 );
print_english( 1 );
print_english( 2 );
print_english( 3 );
print_english( 4 ); }
Nice. It did take a little while for me to figure out what was wrong,
but since I knew that something was wrong, I did eventually find it -
without first running the program.
Bart <bc@freeuk.com> wrote:
On 10/11/2024 06:00, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
I would consider a much more elaborate one, putting the onus on external
tools and still having an unpredictable result, to be the poorer of the two.
You want to create a language that is easily compilable, no matter how
complex the input.
Normally time spent _using_ a compiler should be bigger than the time
spent writing the compiler. If a compiler gets enough use, it
justifies some complexity.
That doesn't add up: the more the compiler gets used, the slower it
should get?!
More complicated does not mean slower. Binary search or hash tables
are more complicated than linear search, but for larger data may
be much faster.
More generally, I want to minimize the time spent by the programmer,
that is, the _sum over all iterations leading to a correct program_ of
compile time and "think time". A compiler that compiles more slowly,
but allows fewer iterations due to better diagnostics, may win.
Also, humans perceive a 0.1s delay almost like no delay at all.
So it does not matter if a single compilation step takes 0.1s or
0.1ms. Modern computers can do a lot of work in 0.1s.
Yes. This may lead to some complexity. A simple approach is to
avoid obviously useless recompilation ('make' is doing this).
A more complicated approach may keep some intermediate data and
try to "validate" it first. If the previous analysis is valid,
then it can be reused. If something significant changes, then
it needs to be re-done. But many changes have only a very local
effect, so at least theoretically re-using analyses could
save substantial time.
Since now that last statement is the '0' value (any int value will do).
What should my compiler report instead? What analysis should it be
doing? What would that save me from typing?
Currently, in the typed language that I use, a literal translation of
the example hits a hole in the checks, that is, the code is accepted.
Concerning the needed analyses: one thing needed is a representation of
the type, either a Pascal range type or an enumeration type (the example
is _very_ unnatural because in modern programming magic numbers
are avoided and there would be some symbolic representation
adding meaning to the numbers). Second, the compiler must recognize
that this is a "multiway switch" and collect the conditions.
Once
you have such a representation (which may be desirable for other
reasons) it is easy to determine the set of handled values. More
precisely, in this example we just have a small number of discrete
values. A more ambitious compiler may have a list of ranges.
If the type also specifies a list of values or a list of ranges, then
it is easy to check whether all values of the type are handled.
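C compilers already do a limited form of this: for a switch on an enum
type with no 'default', gcc and clang (-Wswitch, included in -Wall)
warn about unhandled enumerators. An illustrative sketch:

enum color { RED, GREEN, BLUE };

int code(enum color c)
{
    switch (c) {   /* warning: enumeration value 'BLUE' not handled */
    case RED:   return 10;
    case GREEN: return 20;
    }
    return -1;
}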
You can't do this stuff with the compilers David Brown uses; I'm
guessing you can't do it with your prefered ones either.
To recompile the typed system I use (about 0.4M lines) on a new fast
machine I need about 53s. But that is kind of cheating:
- this time is for a parallel build using 20 logical cores
- the compiler is not in the language it compiles (but in an untyped
version of it)
- actual compilation of the compiler is a small part of the total
compile time
On a slow machine the compile time can be as large as 40 minutes.
An untyped system that I use has about 0.5M lines and recompiles
itself in 16s on the same machine. This one uses a single core.
On a slow machine the compile time may be closer to 2 minutes.
Again, compiler compile time is only a part of build time.
Actually, one time-intensive part is creating the index for the included
documentation.
Another is C compilation for a library file
(the system has image-processing functions and the low-level part of
image processing is done in C). Recompilation starts from a
minimal version of the system; rebuilding this minimal
version takes 3.3s.
Anyway, I do not need the cascaded recompilation that you present.
Both systems above have incremental compilation, the second one
at statement/function level: it offers an interactive prompt
which takes a statement from the user, compiles it and immediately
executes it. Such a statement may define a function or perform a computation.
Even on a _very_ slow machine there is no noticeable delay due to
compilation, unless you feed the system some oversized statement
or function (presumably from a file).
On 19/11/2024 01:53, Waldek Hebisch wrote:
More complicated does not mean slower. Binary search or hash tables
are more complicated than linear search, but for larger data may
be much faster.
That's not the complexity I had in mind. The 100-200MB sizes of
LLVM-based compilers are not because they use hash tables over linear
search.
My tools can generally build my apps from scratch in 0.1 seconds; big
compilers tend to take a lot longer. Only Tiny C is in that ballpark.
Bart <bc@freeuk.com> writes:
On 19/11/2024 01:53, Waldek Hebisch wrote:
More complicated does not mean slower. Binary search or hash tables
are more complicated than linear search, but for larger data may
be much faster.
That's not the complexity I had in mind. The 100-200MB sizes of
LLVM-based compilers are not because they use hash-tables over linear
search.
You still have this irrational obsession with the amount of disk
space consumed by a compiler suite - one that is useful to a massive
number of developers (esp. compared with the user-base of your
compiler).
The amount of disk space consumed by a compilation suite is
a meaningless statistic. 10MByte disks are a relic of the
distant past.
My tools can generally build my apps from scratch in 0.1 seconds; big
compilers tend to take a lot longer. Only Tiny C is in that ballpark.
And Tiny C is useless for the majority of real-world applications.
How many people are using your compiler to build production applications?
On 19.11.2024 09:19, David Brown wrote:
[...]
There are a great variety of projects, [...]
I don't want the theme to get out of hand, so just one amendment to...
Of course testing is important, at many levels. But the time to test
your code is when you are confident that it is correct - testing is not
an alternative to writing code that is as clearly correct as you are
able to make it.
Sounds like early-days practice, where code is written, "defined" at
some point as "correct", and then tests are written (sometimes
by the same folks who implemented the code) to prove that the code
does the expected, or the tests are spared because it was
"clear" that the code is "correct" (sort of).
Since the 1990s we've had other principles, yes, "on many levels"
(as you started your paragraph). At all levels there's some sort of
specification (or description) that defines the expected outcome
and behavior; tests [of levels higher than unit-tests] are written
if not in parallel then usually by separate groups. The decoupling
is important; the "first implement, then test" serializing certainly
is not.
Of course every responsible programmer tries to create correct code,
supported by their own experience and by the project's regulatory means.
But that doesn't guarantee correct code. Neither do tests guarantee that.
But tests have been, IME, more effective in supporting correctness
than being "confident that it is correct" (as you say).
On 19/11/2024 01:53, Waldek Hebisch wrote:
Another example, building a 40Kloc interpreter from source then running it
in memory:
  c:\qx>tm \bx\mm -run qq hello
  Compiling qq.m to memory
  Hello, World! 19-Nov-2024 15:38:47
  TM: 0.11
  c:\qx>tm qq hello
  Hello, World! 19-Nov-2024 15:38:49
  TM: 0.05
The second version runs a precompiled EXE. So building from source added
only 90ms.
On 19/11/2024 01:53, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
On 10/11/2024 06:00, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
I would consider a much more elaborate one, putting the onus on external
tools and still having an unpredictable result, to be the poorer of the two.
You want to create a language that is easily compilable, no matter how
complex the input.
Normally time spent _using_ a compiler should be bigger than the time
spent writing the compiler. If a compiler gets enough use, it
justifies some complexity.
That doesn't add up: the more the compiler gets used, the slower it
should get?!
More complicated does not mean slower. Binary search or hash tables
are more complicated than linear search, but for larger data may
be much faster.
That's not the complexity I had in mind. The 100-200MB sizes of
LLVM-based compilers are not because they use hash-tables over linear search.
More generally, I want to minimize the time spent by the programmer,
that is, the _sum over all iterations leading to a correct program_ of
compile time and "think time". A compiler that compiles more slowly,
but allows fewer iterations due to better diagnostics, may win.
Also, humans perceive a 0.1s delay almost like no delay at all.
So it does not matter if a single compilation step takes 0.1s or
0.1ms. Modern computers can do a lot of work in 0.1s.
What's the context of this 0.1 seconds? Do you consider it long or short?
My tools can generally build my apps from scratch in 0.1 seconds; big compilers tend to take a lot longer. Only Tiny C is in that ballpark.
So I'm failing to see your point here. Maybe you picked up that 0.1
seconds from an earlier post of mine and are suggesting I ought to be
able to do a lot more analysis within that time?
Yes. This may lead to some complexity. A simple approach is to
avoid obviously useless recompilation ('make' is doing this).
A more complicated approach may keep some intermediate data and
try to "validate" it first. If the previous analysis is valid,
then it can be reused. If something significant changes, then
it needs to be re-done. But many changes have only a very local
effect, so at least theoretically re-using analyses could
save substantial time.
I consider compilation - turning textual source code into a form that can
be run, typically binary native code - to be a completely routine task
that should be as simple and as quick as flicking a light switch.
Whereas anything that amounts to a deep analysis of that program I
consider to be a quite different task. I'm not saying there is no place
for it, but I don't agree it should be integrated into every compiler
and always invoked.
Since now that last statement is the '0' value (any int value will do).
What should my compiler report instead? What analysis should it be
doing? What would that save me from typing?
Currently, in the typed language that I use, a literal translation of
the example hits a hole in the checks, that is, the code is accepted.
Concerning the needed analyses: one thing needed is a representation of
the type, either a Pascal range type or an enumeration type (the example
is _very_ unnatural because in modern programming magic numbers
are avoided and there would be some symbolic representation
adding meaning to the numbers). Second, the compiler must recognize
that this is a "multiway switch" and collect the conditions.
The example came from C. Even if written as a switch, C switches do not return values (and also are hard to even analyse as to which branch is which).
In my languages, switches can return values, and a switch written as the last statement of a function is considered to do so, even if each branch uses an explicit 'return'. Then, it will consider a missing ELSE a 'hole'.
It will not do any analysis of the range other than what is necessary to implement switch (duplicate values, span of values, range-checking when using jump tables).
So the language may require you to supply a dummy 'else x' or 'return
x'; so what?
The alternative appears to be one of:
* Instead of 'else' or 'return', to write 'unreachable' (sketched below),
which puts some trust, not in the programmer, but in some person calling
your function who does not have sight of the source code, to avoid
calling it with invalid arguments
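In C terms that option looks roughly like this - C23 provides
unreachable() in <stddef.h>, and GCC/Clang have long had
__builtin_unreachable(); if the promise is ever broken, the behaviour
is undefined:

#include <stddef.h>

int F(int n)              /* caller promises n is 1 or 2 */
{
    if (n == 1) return 10;
    if (n == 2) return 20;
    unreachable();        /* C23; the trust is placed entirely in callers */
}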
Once
you have such a representation (which may be desirable for other
reasons) it is easy to determine the set of handled values. More
precisely, in this example we just have a small number of discrete
values. A more ambitious compiler may have a list of ranges.
If the type also specifies a list of values or a list of ranges, then
it is easy to check whether all values of the type are handled.
The types are typically plain integers, with ranges from 2**8 to 2**64.
The ranges associated with application needs will be more arbitrary.
If talking about a language with ranged integer types, then there might
be more point to it, but that is itself a can of worms. (It's hard to do without getting halfway to implementing Ada.)
You can't do this stuff with the compilers David Brown uses; I'm
guessing you can't do it with your prefered ones either.
To recompile the typed system I use (about 0.4M lines) on a new fast
machine I need about 53s. But that is kind of cheating:
- this time is for a parallel build using 20 logical cores
- the compiler is not in the language it compiles (but in an untyped
version of it)
- actual compilation of the compiler is a small part of the total
compile time
On a slow machine the compile time can be as large as 40 minutes.
40 minutes for 400K lines? That's 160 lines per second; how old is this machine? Is the compiler written in Python?
An untyped system that I use has about 0.5M lines and recompiles
itself in 16s on the same machine. This one uses a single core.
On a slow machine the compile time may be closer to 2 minutes.
So 4K to 30Klps.
Again, compiler compile time is only a part of build time.
Actually, one time-intensive part is creating the index for the included
documentation.
Which is not going to be part of a routine build.
Another is C compilation for a library file
(system has image-processing functions and low-level part of
image processing is done in C). Recompilation starts from a
minimal version of the system, rebuilding this minimal
version takes 3.3s.
My language tools work on a whole program, where a 'program' is a single
EXE or DLL file (or a single OBJ file in some cases).
A 'build' then turns N source files into 1 binary file. This is the task
I am talking about.
A complete application may have several such binaries and a bunch of
other stuff. Maybe some source code is generated by a script. This part
is open-ended.
However each of my current projects is a single, self-contained binary
by design.
Anyway, I do not need the cascaded recompilation that you present.
Both systems above have incremental compilation, the second one
at statement/function level: it offers an interactive prompt
which takes a statement from the user, compiles it and immediately
executes it. Such a statement may define a function or perform a computation.
Even on a _very_ slow machine there is no noticeable delay due to
compilation, unless you feed the system some oversized statement
or function (presumably from a file).
This sounds like a REPL system. There, each line is a new part of the program which is processed, executed and discarded.
In that regard, it
is not really what I am talking about, which is AOT compilation of a
program represented by a bunch of source files.
Or can a new line redefine something, perhaps a function definition, previously entered amongst the last 100,000 lines? Can a new line
require compilation of something typed 50,000 lines ago?
What happens if you change the type of a global; are you saying that
none of the program codes needs revising?
An untyped system
What do you mean by an untyped system? To me it usually means
dynamically typed.
On 19/11/2024 15:51, Bart wrote:
On 19/11/2024 01:53, Waldek Hebisch wrote:
Another example, building a 40Kloc interpreter from source then running it
in memory:
  c:\qx>tm \bx\mm -run qq hello
  Compiling qq.m to memory
  Hello, World! 19-Nov-2024 15:38:47
  TM: 0.11
  c:\qx>tm qq hello
  Hello, World! 19-Nov-2024 15:38:49
  TM: 0.05
The second version runs a precompiled EXE. So building from source added
only 90ms.
Sorry, that should be 60ms. Running that interpreter from source only
takes 1/16th of a second longer not 1/11th of a second.
BTW I didn't remark on the range of your (WH's) figures. They spanned
from 40 minutes for a build down to instant, but it's not clear for which
languages they are, which tools are used and which machines. Or how much
work they have to do to get those faster times, or what work they don't
do: I'm guessing it's not processing 0.5M lines for that fastest time.
So it was hard to formulate a response.
All my timings are either for C or my systems language, running on one
core on the same PC.
Bart <bc@freeuk.com> wrote:
It is related: both gcc and LLVM are doing analyses that in the
past were deemed impractically expensive (both in time and in space).
Those analyses work now thanks to smart algorithms that
significantly reduced resource usage. I know that you consider
this too expensive.
What's the context of this 0.1 seconds? Do you consider it long or short?
Context is interactive response. It means "pretty fast for interactive
use".
My tools can generally build my apps from scratch in 0.1 seconds; big
compilers tend to take a lot longer. Only Tiny C is in that ballpark.
So I'm failing to see your point here. Maybe you picked up that 0.1
seconds from an earlier post of mine and are suggesting I ought to be
able to do a lot more analysis within that time?
This 0.1s is an old thing. My point is that if you are compiling a simple
change, then you should be able to do more in this time. In normal
development source files bigger than 10000 lines are relatively
rare, so once you get into the range of 50000-100000 lines per second,
making the compiler faster is of marginal utility.
We clearly differ on the question of what is routine. Creating a usable
executable is a rare task; once an executable is created it can be used
for a long time. OTOH development is routine, and for this one wants
to know if a change is correct.
Already a simple thing would be an improvement: make the compiler aware of
an error routine (if you do not have one, add one) so that when you
signal an error the compiler will know that there is no need for a normal
return value.
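A sketch of that suggestion in C terms (fatal() is an invented name;
_Noreturn is standard C11):

#include <stdio.h>
#include <stdlib.h>

_Noreturn void fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

int F(int n)
{
    if (n == 1) return 10;
    if (n == 2) return 20;
    /* the compiler knows fatal() cannot return, so no dummy value needed */
    fatal("F: argument out of range");
}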
Which is not going to be part of a routine build.
In a sense a build is not routine. A build is done for two purposes:
- to install a working system from sources, which includes
documentation
- to check that the build works properly after changes; this should
also check the documentation build.
Normal development goes without rebuilding the system.
I know. But this is not what I do. A build produces multiple
artifacts, some of them executable, some loadable code (but _not_
in a form recognized by the operating system), some essentially
non-executable (like documentation).
This sounds like a REPL system. There, each line is a new part of the
program which is processed, executed and discarded.
First, I am writing about two different systems. Both have a REPL.
Lines typed at the REPL are "discarded", but their effect may last
a long time.
What happens if you change the type of a global; are you saying that
none of the program codes needs revising?
In the typed system there are no global "library" variables; all data
is encapsulated in modules and normally accessed in an abstract way,
by calling appropriate functions. So, in "clean" code you
can recompile a single module and the whole system works.
Bart <bc@freeuk.com> wrote:
BTW I didn't remark on the range of your (WH's) figures. They spanned 40
minutes for a build to instant, but it's not clear for which languages
they are, which tools are used and which machines. Or how much work they
have to do to get those faster times, or what work they don't do: I'm
guessing it's not processing 0.5M lines for that fastest time.
As I wrote, there are 2 different systems; if interested you can fetch
them from GitHub.
I do not think I will use your system language. And for a C compiler,
at least currently, it does not make a big difference to me if your
compiler can do 1Mloc or 5Mloc on my machine; both are "pretty fast".
What matters more is support for debugging output, supporting the
targets that I need (like ARM or Risc-V), good diagnostics
and optimization.
I recently installed TinyC on a small Risc-V
machine; I think the available memory (64MB in all, about 20MB available
to user programs) is too small to run gcc or clang.
On 19/11/2024 23:41, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
BTW I didn't remark on the range of your (WH's) figures. They spanned 40
minutes for a build to instant, but it's not clear for which languages
they are, which tools are used and which machines. Or how much work they
have to do to get those faster times, or what work they don't do: I'm
guessing it's not processing 0.5M lines for that fastest time.
As I wrote, there are 2 different systems; if interested you can fetch
them from GitHub.
Do you have a link? Probably I won't attempt to build but I can see what
it looks like.
I do not think I will use your system language. And for a C compiler,
at least currently, it does not make a big difference to me if your
compiler can do 1Mloc or 5Mloc on my machine; both are "pretty fast".
What matters more is support of debugging output, supporting
targets that I need (like ARM or Risc-V), good diagnostics
and optimization.
It's funny how nobody seems to care about the speed of compilers (which
can vary by 100:1), but for the generated programs, the 2:1 speedup you might get by optimising it is vital!
Here I might borrow one of your arguments and suggest such a speed-up is only necessary on a rare production build.
I recently installed TinyC on a small Risc-V
machine; I think the available memory (64MB in all, about 20MB available
to user programs) is too small to run gcc or clang.
Only 20,000KB? My first compilers worked on 64KB systems, not all of
which was available either.
None of my recent products will do so now, but they will still fit on a floppy disk.
BTW why don't you use a cross-compiler? That's what David Brown would say.
Bart <bc@freeuk.com> writes:
On 19/11/2024 23:41, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
It's funny how nobody seems to care about the speed of compilers (which
can vary by 100:1), but for the generated programs, the 2:1 speedup you
might get by optimising it is vital!
I don't consider it funny at all, rather it is simply the way things
should be. One compiles once.
One's customer runs the resulting
executable perhaps millions of times.
Here I might borrow one of your arguments and suggest such a speed-up is
only necessary on a rare production build.
And again, you've clearly never worked with any significantly
large project. Like for instance an operating system.
machine, I think that available memory (64MB all, about 20MB available
to user programs) is too small to run gcc or clang.
Only 20,000KB? My first compilers worked on 64KB systems, not all of
which was available either.
My first compilers worked on a 4KW PDP-8. Not that I have any
interest in _ever_ working in such a constrained environment
ever again.
None of my recent products will do so now, but they will still fit on a
floppy disk.
And, nobody cares.
On 19/11/2024 22:40, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
It is related: both gcc and LLVM are doing analyses that in the
past were deemed impractically expensive (both in time and in space).
Those analyses work now thanks to smart algorithms that
significantly reduced resource usage. I know that you consider
this too expensive.
How long would LLVM take to compile itself on one core? (Here I'm not
even sure what LLVM is; if you download the binary, it's about 2.5GB,
but a typical LLVM compiler might be 100+ MB. But I guess it will be a
while in either case.)
I have a product now that is like a mini-LLVM backend. It can build into
a standalone library of under 0.2MB, which can directly produce EXEs, or
it can interpret. Building that product from scratch takes 60ms.
That is my kind of product.
What's the context of this 0.1 seconds? Do you consider it long or short?
Context is interactive response. It means "pretty fast for interactive
use".
It's less than the time to press and release the Enter key.
My tools can generally build my apps from scratch in 0.1 seconds; big
compilers tend to take a lot longer. Only Tiny C is in that ballpark.
So I'm failing to see your point here. Maybe you picked up that 0.1
seconds from an earlier post of mine and are suggesting I ought to be
able to do a lot more analysis within that time?
This 0.1s is an old thing. My point is that if you are compiling a simple
change, then you should be able to do more in this time. In normal
development source files bigger than 10000 lines are relatively
rare, so once you get into the range of 50000-100000 lines per second,
making the compiler faster is of marginal utility.
I *AM* doing more in that time! It just happens to be stuff you appear
to have no interest in:
* I write whole-program compilers: you always process all source files
of an application. The faster the compiler, the bigger the scale of app
it becomes practical on.
* That means no headaches with dependencies (it goes in hand with a
decent module scheme)
* I can change one tiny corner of the program, say add an /optional/
argument to a function, which requires compiling all call-sites across
the program, and the next compilation will take care of everything
* If I were to do more with optimisation (there is lots that can be done without getting into the heavy stuff), it automatically applies to the
whole program
* I can choose to run applications from source code, without generating discrete binary files, just like a script language
* I can choose (with my new backend) to interpret programs in this
static language. (Interpretation gives better debugging opportunities)
* I don't need to faff around with object files or linkers
Module-based independent compilation and having to link 'object files'
is stone-age stuff.
We clearly differ on the question of what is routine. Creating a usable
executable is a rare task; once an executable is created it can be used
for a long time. OTOH development is routine, and for this one wants
to know if a change is correct.
I take it then that you have some other way of doing test runs of a
program without creating an executable?
It's difficult to tell from your comments.
Already a simple thing would be an improvement: make the compiler aware of
the error routine (if you do not have one, add it) so that when you
signal an error the compiler will know that there is no need for a normal
return value.
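(A minimal sketch of that suggestion in C11 terms - the routine name 'fatal' is just an illustration, not anything from the systems discussed:

#include <stdio.h>
#include <stdlib.h>

/* Marking the error routine _Noreturn tells the compiler that a
   call to it never returns, so no normal return value is needed
   on paths that end in it. */
_Noreturn void fatal(const char *msg) {
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

int lookup(int n) {
    if (n == 1) return 10;
    if (n == 2) return 20;
    fatal("bad n");   /* compiler knows control stops here */
}

With that annotation a compiler can drop the dead return path and stop warning about falling off the end of 'lookup'.)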
OK, but what does that buy me? Saving a few bytes for a return
instruction in a function? My largest program, which is 0.4MB, already
only occupies 0.005% of the machine's 8GB.
Which is not going to be part of a routine build.
In a sense build is not routine. Build is done for two purposes:
- to install a working system from sources, that includes
documentation
- to check that the build works properly after changes; this also
should check the documentation build.
Normal development goes without rebuilding the system.
We must be talking at cross-purposes then.
Either you're developing using interpreted code, or you must have some
means of converting source code to native code, but for some reason you don't use 'compile' or 'build' to describe that process.
Or maybe your REPL/incremental process can run for days doing
incremental changes without doing a full compile.
It seems quite mysterious.
I might run my compiler hundreds of times a day (at 0.1 seconds a time,
600 builds would occupy one whole minute in the day!). I often do it for frivolous purposes, such as trying to get some output lined up just
right. Or just to make sure something has been recompiled since it's so quick it's hard to tell.
I know. But this is not what I do. Build produces multiple
artifacts, some of them executable, some are loadable code (but _not_
in a form recognized by the operating system), some essentially non-executable
(like documentation).
So, 'build' means something different to you. I use 'build' just as a
change from writing 'compile'.
This sounds like a REPL system. There, each line is a new part of the
program which is processed, executed and discarded.
First, I am writing about two different systems. Both have REPL.
Lines typed at REPL are "discarded", but their effect may last
long time.
My last big app used a compiled core but most user-facing functionality
was done using an add-on script language. This meant I could develop
such modules from within a working application, which provided a rich, persistent environment.
Changes to the core program required a rebuild and a restart.
However the whole thing was an application, not a language.
What happens if you change the type of a global; are you saying that
none of the program codes needs revising?
In the typed system there are no global "library" variables; all data
is encapsulated in modules and normally accessed in an abstract way,
by calling appropriate functions. So, in "clean" code you
can recompile a single module and the whole system works.
I used module-at-a-time compilation until 10-12 years ago. The module
scheme had to be upgraded at the same time, but it took several goes to
get it right.
Now I wouldn't go back. Who cares about compiling a single module that
may or may not affect a bunch of others? Just compile the lot!
If a project's scale becomes too big, then it should be split into independent program units, for example a core EXE file and a bunch of
DLLs; that's the new granularity. Or a lot of functionality can be off-loaded to scripts, as I used to do.
(My scripting language code still needs bytecode compilation, and I also
use whole-program units there, but the bytecode compiler goes up to 2Mlps.)
On 19/11/2024 23:41, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
BTW I didn't remark on the range of your (WH's) figures. They spanned 40
minutes for a build to instant, but it's not clear for which languages
they are, which tools are used and which machines. Or how much work they
have to do to get those faster times, or what work they don't do: I'm
guessing it's not processing 0.5M lines for that fastest time.
As I wrote, there are 2 different systems; if interested you can fetch
them from github.
Do you have a link? Probably I won't attempt to build but I can see what
it looks like.
[...]
All I have been arguing against is the idea of blindly putting in
validity tests for parameters in functions, as though it were a habit
that by itself leads to fewer bugs in code.
On 19/11/2024 23:41, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
I do not think I will use your system language. And for a C compiler
at least currently it does not make a big difference to me if your
compiler can do 1Mloc or 5Mloc on my machine; both are "pretty fast".
What matters more is support of debugging output, supporting
targets that I need (like ARM or Risc-V), good diagnostics
and optimization.
It's funny how nobody seems to care about the speed of compilers (which
can vary by 100:1), but for the generated programs, the 2:1 speedup you might get by optimising it is vital!
BTW why don't you use a cross-compiler? That's what David Brown would say.
On 15/11/2024 19:50, Waldek Hebisch wrote:
David Brown <david.brown@hesbynett.no> wrote:
On 11/11/2024 20:09, Waldek Hebisch wrote:
David Brown <david.brown@hesbynett.no> wrote:
Concerning the correct place for checks: one could argue that a check
should be close to the place where the result of the check matters, which
frequently is in the called function.
No, there I disagree. The correct place for the checks should be close
to where the error is, and that is in the /calling/ code. If the called
function is correctly written, reviewed, tested, documented and
considered "finished", why would it be appropriate to add extra code to
that in order to test and debug some completely different part of the code?
The place where the result of the check /really/ matters, is the calling
code. And that is also the place where you can most easily find the
error, since the error is in the calling code, not the called function.
And it is most likely to be the code that you are working on at the time
- the called function is already written and tested.
And frequently the check requires
computation that is done by the called function as part of normal
processing, but would be extra code in the caller.
It is more likely to be the opposite in practice.
And for much of the time, the called function has no real practical way
to check the parameters anyway. A function that takes a pointer
parameter - not an uncommon situation - generally has no way to check
the validity of the pointer. It can't check that the pointer actually
points to useful source data or an appropriate place to store data.
All it can do is check for a null pointer, which is usually a fairly
useless thing to do (unless the specifications for the function make the
pointer optional). After all, on most (but not all) systems you already
have a "free" null pointer check - if the caller code has screwed up and
passed a null pointer when it should not have done, the program will
quickly crash when the pointer is used for access. Many compilers
provide a way to annotate function declarations to say that a pointer
must not be null, and can then spot at least some such errors at compile
time. And of course the calling code will very often be passing the
address of an object in the call - since that can't be null, a check in
the function is pointless.
Well, in a sense pointers are easy: if you do not play nasty tricks
with casts then type checks do a significant part of the checking. Of
course, a pointer may be uninitialized (but compiler warnings help a lot
here), memory may be overwritten, etc. But overwritten memory is
rather special: if you checked that the content of memory is correct,
but it is overwritten after the check, then the earlier check does not
help. Anyway, the main point is ensuring that the pointed-to data
satisfies the expected conditions.
That does not match reality. Pointers are far and away the biggest
source of errors in C code. Use after free, buffer overflows, mixups of
who "owns" the pointer - the scope for errors is boundless. You are
correct that type systems can catch many potential types of errors - unfortunately, people /do/ play nasty tricks with type checks.
Conversions of pointer types are found all over the place in C
programming, especially conversions back and forth with void* pointers.
All this means that invalid pointer parameters are very much a real
issue - but are typically impossible to check in the called function.
The way you avoid getting errors in your pointers is being careful about having the right data in the first place, so you only call functions
with valid parameters. You do this by having careful control about the ownership and lifetime of pointers, and what they point to, keeping conventions in the names of your pointers and functions to indicate who
owns what, and so on. And you use sanitizers and similar tools during testing and debugging to distinguish between tests that worked by luck,
and ones that worked reliably. (And of course you may consider other languages than C that help you express your requirements in a clearer
manner or with better automatic checking.)
Put the same effort and due diligence into the rest of your code, and suddenly you find your checks for other kinds of parameters in functions
are irrelevant as you are now making sure you call functions with appropriate valid inputs.
Once you get to more complex data structures, the possibility for the
caller to check the parameters gets steadily less realistic.
So now your practice of having functions "always" check their parameters
leaves the people writing calling code with a false sense of security -
usually you /don't/ check the parameters, you only ever do simple checks
that the caller could (and should!) do if they were realistic. You've
got the maintenance and cognitive overload of extra source code for your
various "asserts" and other checks, regardless of any run-time costs
(which are often irrelevant, but occasionally very important).
You will note that much of this - for both sides of the argument - uses
words like "often", "generally" or "frequently". It is important to
appreciate that programming spans a very wide range of situations, and I
don't want to be too categorical about things. I have already said
there are situations when parameter checking in called functions can
make sense. I've no doubt that for some people and some types of
coding, such cases are a lot more common than what I see in my coding.
Note also that when you can use tools to automate checks, such as
"sanitize" options in compilers or different languages that have more
in-built checks, the balance differs. You will generally pay a run-time
cost for those checks, but you don't have the same kind of source-level
costs - your code is still clean, clear, and amenable to correctness
checking, without hiding the functionality of the code in a mass of
unnecessary explicit checks. This is particularly good for debugging,
and the run-time costs might not be important. (But if run-time costs
are not important, there's a good chance that C is not the best language
to be using in the first place.)
Our experience differs. As a silly example consider a parser
which produces a parse tree. The caller is supposed to pass a syntactically
correct string as an argument. However, checking syntactic correctness
requires almost the same effort as producing the parse tree, so it is
usual that the parser both checks correctness and produces the result.
The trick here is to avoid producing a syntactically invalid string in
the first place. Solve the issue at the point where there is a mistake
in the code!
(If you are talking about a string that comes from outside the code in
some way, then of course you need to check it - and if that is most conveniently done during the rest of parsing, then that is fair enough.)
I have computations that are quite different from parsing but
in some cases share the same characteristic: checking correctness of
arguments requires complex computation similar to producing the
actual result. More frequently, the called routine can check various
invariants which with high probability can detect errors. Doing
the same check in the caller is impractical.
I think you are misunderstanding me - maybe I have been unclear. I am saying that it is the /caller's/ responsibility to make sure that the parameters it passes are correct, not the /callee's/ responsibility.
That does not mean that the caller has to add checks to get the
parameters right - it means the caller has to use correct parameters.
Think of this like walking near a cliff-edge. Checking parameters
before the call is like having a barrier at the edge of the cliff. My recommendation is that you know where the cliff edge is, and don't walk there.
On 20/11/2024 02:33, Bart wrote:
It's funny how nobody seems to care about the speed of compilers
(which can vary by 100:1), but for the generated programs, the 2:1
speedup you might get by optimising it is vital!
To understand this, you need to understand the benefits of a program
running quickly.
Let's look at the main ones:
There is usually a point where a program is "fast enough" - going faster makes no difference. No one is ever going to care if a compilation
takes 1 second or 0.1 seconds, for example.
It doesn't take much thought to realise that for most developers, the
speed of their compiler is not actually a major concern in comparison to
the speed of other programs.
While writing code, and testing and debugging it, a given build might
only be run a few times, and compile speed is a bit more relevant. Generally, however, most programs are run far more often, and for far longer, than their compilation time.
And as usual, you miss out the fact that toy compilers - like yours, or TinyC - miss all the other features developers want from their tools. I want debugging information, static error checking, good diagnostics,
support for modern language versions (that's primarily C++ rather than
C), useful extensions, compact code, correct code generation, and most importantly of all, support for the target devices I want.
I wouldn't
care if your compiler can run at a billion lines per second and gcc took
an hour to compile - I still wouldn't be interested in your compiler
because it does not generate code for the devices I use. Even if it
did, it would be useless to me, because I can trust the code gcc
generates and I cannot trust the code your tool generates.
And even if
your tool did everything else I need, and you could convince me that it
is something a professional could rely on, I'd still use gcc for the
better quality generated code, because that translates to money saved
for my customers.
BTW why don't you use a cross-compiler? That's what David Brown would
say.
That is almost certainly what he normally does. It can still be fun to
play around with things like TinyC, even if it is of no practical use
for the real development.
Bart <bc@freeuk.com> wrote:
Either you're developing using interpreted code, or you must have some
means of converting source code to native code, but for some reason you
don't use 'compile' or 'build' to describe that process.
Or maybe your REPL/incremental process can run for days doing
incremental changes without doing a full compile.
Yes.
It seems quite mysterious.
There is nothing mysterious here. In the typed system each module has
a vector (one dimensional array) called the domain vector containing,
among other things, references to called functions. All inter-module
calls are indirect ones: they take the thing to call from the domain
vector. When a module starts execution the references point to a runtime
routine doing similar work to a dynamic linker. The first call goes to
the runtime support routine, which finds the needed code and replaces
the reference in the domain vector.
When a module is recompiled, references in domain vectors are
reinitialized to point to the runtime. So searches are run again
and if needed pick up the new routine.
Note that there is a global table keeping info (including types)
about all exported routines from all modules. This table is used
when compiling a module and also by the search process at runtime.
The effect is that after recompilation of a single module I have a
runnable executable in memory, including the code of the new module.
If you wonder about compiling the same module many times: the system
has a garbage collector and unused code is garbage collected.
So, when an old version is replaced by a new one the old becomes
garbage and will be collected in due time.
The other system is similar in principle, but there is no need
for runtime search and domain vectors.
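(A rough sketch of that mechanism in C, with invented names - slots start out pointing at a resolver, which patches in the real routine on first call:

typedef int (*fn_t)(int);

static int resolver(int arg);             /* forward declaration */

/* one slot per called routine; recompiling a module would reset
   its callers' slots back to 'resolver' */
static fn_t domain_vector[1] = { resolver };

static int real_routine(int arg) { return 2 * arg; }

/* stand-in for the search over the global table of exports */
static fn_t runtime_lookup(void) { return real_routine; }

static int resolver(int arg) {
    domain_vector[0] = runtime_lookup();  /* patch the slot */
    return domain_vector[0](arg);         /* and complete the call */
}

/* all inter-module calls go indirectly through the vector */
int call_foreign(int arg) { return domain_vector[0](arg); }

This is essentially the same lazy-binding trick a dynamic linker's PLT uses.)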
I might run my compiler hundreds of times a day (at 0.1 seconds a time,
600 builds would occupy one whole minute in the day!). I often do it for
frivolous purposes, such as trying to get some output lined up just
right. Or just to make sure something has been recompiled since it's so
quick it's hard to tell.
I know. But this is not what I do. Build produces multiple
artifacts, some of them executable, some are loadable code (but _not_
in a form recognized by the operating system), some essentially non-executable
(like documentation).
So, 'build' means something different to you. I use 'build' just as a
change from writing 'compile'.
Build means creating a new fully-functional system. That involves
possibly multiple compilations and whatever else is needed.
On 20/11/2024 16:15, David Brown wrote:
On 20/11/2024 02:33, Bart wrote:
It's funny how nobody seems to care about the speed of compilers
(which can vary by 100:1), but for the generated programs, the 2:1
speedup you might get by optimising it is vital!
To understand this, you need to understand the benefits of a program
running quickly.
As I said, people are preoccupied with that for programs in general. But when it comes to compilers, it doesn't apply! Clearly, you are implying
that those benefits don't matter when the program is a compiler.
Let's look at the main ones:
<snip>
OK. I guess you missed the bits here and in another post, where I
suggested that enabling optimisation is fine for production builds.
For the routine ones that I do 100s of times a day, where test runs are generally very short, then I don't want to hang about waiting for a
compiler that is taking 30 times longer than necessary for no good reason.
There is usually a point where a program is "fast enough" - going
faster makes no difference. No one is ever going to care if a
compilation takes 1 second or 0.1 seconds, for example.
If you look at all the interactions people have with technology, with
GUI apps, even with mechanical things, a 1 second latency is generally disastrous.
A one-second delay between pressing a key and seeing a character appear
on a display or any other feedback, would drive most people up to wall.
But 0.1 is perfectly fine.
It doesn't take much thought to realise that for most developers, the
speed of their compiler is not actually a major concern in comparison
to the speed of other programs.
Most developers are stuck with what there is. Naturally they will make
the best of it. Usually by finding 100 ways or 100 reasons to avoid
running the compiler.
While writing code, and testing and debugging it, a given build might
only be run a few times, and compile speed is a bit more relevant.
Generally, however, most programs are run far more often, and for far
longer, than their compilation time.
Developing code is the critical bit.
Even when a test run takes a bit longer as you need to set things up,
when you do need to change something and run it again, you don't want
any pointless delay.
Neither do you want to waste /your/ time pandering to a compiler's
slowness by writing makefiles and defining dependencies.
Or even
splitting things up into tiny modules.
I don't want to care about that
at all. Here's my bunch of source files, just build the damn thing, and
do it now!
On 20/11/2024 21:17, Bart wrote:
For the routine ones that I do 100s of times a day, where test runs
are generally very short, then I don't want to hang about waiting for
a compiler that is taking 30 times longer than necessary for no good
reason.
Your development process sounds bad in so many ways it is hard to know
where to start. I think perhaps the foundation is that you taught
yourself a bit of programming in the 1970's,
As I said, no one is ever going to care if a compilation takes 1 second
or 0.1 seconds.
So your advice is that developers should be stuck
Which do you think an employer (or amateur programmer) would prefer?
a) A compiler that runs in 0.1 seconds with little static checking
b) A compiler that runs in 10 seconds but spots errors saving 6 hours debugging time
I might spend an hour or two writing code (including planing,
organising, reading references, etc.) and then 5 seconds building it.
Then there might be anything from a few minutes to a few hours testing
or debugging.
But using a good compiler saves a substantial amount of developer time
<snip the rest to save time>
On 21/11/2024 13:00, David Brown wrote:
On 20/11/2024 21:17, Bart wrote:
For the routine ones that I do 100s of times a day, where test runs
are generally very short, then I don't want to hang about waiting for
a compiler that is taking 30 times longer than necessary for no good
reason.
Your development process sounds bad in so many ways it is hard to know
where to start. I think perhaps the foundation is that you taught
yourself a bit of programming in the 1970's,
1970s builds, especially on mainframes, were dominated by link times.
Bart <bc@freeuk.com> writes:
On 21/11/2024 13:00, David Brown wrote:
On 20/11/2024 21:17, Bart wrote:
For the routine ones that I do 100s of times a day, where test runs
are generally very short, then I don't want to hang about waiting for
a compiler that is taking 30 times longer than necessary for no good
reason.
Your development process sounds bad in so many ways it is hard to know
where to start. I think perhaps the foundation is that you taught
yourself a bit of programming in the 1970's,
1970s builds, especially on mainframes, were dominated by link times.
Which mainframe do you have experience on?
I spent a decade writing a mainframe operating system (the largest application we had to compile regularly) and the link time was a
minor fraction of the overall build time.
It was so minor that our build system stored the object files
so that the OS engineers only needed to recompile the object
associated with the source file being modified rather than
the entire OS, they'd share the rest of the object files
with the entire OS team.
On 21/11/2024 15:50, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
On 21/11/2024 13:00, David Brown wrote:
On 20/11/2024 21:17, Bart wrote:
For the routine ones that I do 100s of times a day, where test runs
are generally very short, then I don't want to hang about waiting for
a compiler that is taking 30 times longer than necessary for no good
reason.
Your development process sounds bad in so many ways it is hard to know
where to start. I think perhaps the foundation is that you taught
yourself a bit of programming in the 1970's,
1970s builds, especially on mainframes, were dominated by link times.
Which mainframe do you have experience on?
I spent a decade writing a mainframe operating system (the largest
application we had to compile regularly) and the link time was a
minor fraction of the overall build time.
It was so minor that our build system stored the object files
so that the OS engineers only needed to recompile the object
associated with the source file being modified rather than
the entire OS, they'd share the rest of the object files
with the entire OS team.
The one I remember most was 'TKB' I think it was, running on ICL 4/72
(360 clone). It took up most of the memory. It was used to link my small
Fortran programs.
Bart <bc@freeuk.com> writes:
On 21/11/2024 15:50, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
On 21/11/2024 13:00, David Brown wrote:
On 20/11/2024 21:17, Bart wrote:
For the routine ones that I do 100s of times a day, where test runs
are generally very short, then I don't want to hang about waiting for
a compiler that is taking 30 times longer than necessary for no good
reason.
Your development process sounds bad in so many ways it is hard to know
where to start. I think perhaps the foundation is that you taught
yourself a bit of programming in the 1970's,
1970s builds, especially on mainframes, were dominated by link times.
Which mainframe do you have experience on?
I spent a decade writing a mainframe operating system (the largest
application we had to compile regularly) and the link time was a
minor fraction of the overall build time.
It was so minor that our build system stored the object files
so that the OS engineers only needed to recompile the object
associated with the source file being modified rather than
the entire OS, they'd share the rest of the object files
with the entire OS team.
The one I remember most was 'TKB' I think it was, running on ICL 4/72
(360 clone). It took up most of the memory. It was used to link my small
Fortran programs.
So you generalize from your one non-standard experience to the entire ecosystem.
Typical Bart.
On 21/11/2024 16:10, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
On 21/11/2024 15:50, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
On 21/11/2024 13:00, David Brown wrote:
On 20/11/2024 21:17, Bart wrote:
For the routine ones that I do 100s of times a day, where test runs
are generally very short, then I don't want to hang about waiting for
a compiler that is taking 30 times longer than necessary for no good
reason.
Your development process sounds bad in so many ways it is hard to know
where to start. I think perhaps the foundation is that you taught
yourself a bit of programming in the 1970's,
1970s builds, especially on mainframes, were dominated by link times.
Which mainframe do you have experience on?
I spent a decade writing a mainframe operating system (the largest
application we had to compile regularly) and the link time was a
minor fraction of the overall build time.
It was so minor that our build system stored the object files
so that the OS engineers only needed to recompile the object
associated with the source file being modified rather than
the entire OS, they'd share the rest of the object files
with the entire OS team.
The one I remember most was 'TKB' I think it was, running on ICL 4/72
(360 clone). It took up most of the memory. It was used to link my small
Fortran programs.
So you generalize from your one non-standard experience to the entire ecosystem.
Typical Bart.
Typical Scott. Did you post just to do a bit of bart-bashing?
Have you also considered that your experience of building operating
systems might itself be non-standard?
Bart <bc@freeuk.com> wrote:
...or to just always require 'else', with a dummy value if necessary?
Well, frequently it is easier to do a bad job than a good one.
I assume that you consider the simple solution the 'bad' one?
You wrote about _always_ requiring 'else' regardless of whether it is
needed or not. Yes, I consider this bad.
On 20/11/2024 21:17, Bart wrote:
Your development process sounds bad in so many ways it is hard to know
where to start. I think perhaps the foundation is that you taught
yourself a bit of programming in the 1970's,
And presumably you also advise doing so on a bargain basement
single-core computer from at least 15 years ago?
Sure. That's when you run a production build. I can even do that myself
on some programs (the ones where my C transpiler still works) and pass
it through gcc-O3. Then it might run 30% faster.
int main(void) {
    int a;
    int* p = 0;
    a = *p;
}
Here's what happens with my C compiler when told to interpret it:
c:\cx>cc -i c
Compiling c.c to c.(int)
Error: Null ptr access
Here's what happens with gcc:
c:\cx>gcc c.c
c:\cx>a
<crashes>
Is there some option to insert such a check with gcc? I've no idea; most people don't.
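(For what it's worth, there is such an option: the undefined-behaviour sanitizer in gcc and clang inserts run-time null-pointer checks, among others. A minimal invocation, assuming a reasonably recent gcc on a Unix-like system:

gcc -g -fsanitize=undefined c.c
./a.out     # reports a null-pointer load at the faulting line

The exact message format varies by version, and the checks cost some run time.)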
Bart <bc@freeuk.com> wrote:
int main(void) {
    int a;
    int* p = 0;
    a = *p;
}
Here's what happens with my C compiler when told to interpret it:
c:\cx>cc -i c
Compiling c.c to c.(int)
Error: Null ptr access
Here's what happens with gcc:
c:\cx>gcc c.c
c:\cx>a
<crashes>
Is there some option to insert such a check with gcc? I've no idea; most
people don't.
I would do
gcc -g c.c
gdb a.out
run
and gdb would show me the place with the bad access. Things like bounds
checking of array accesses or overflow checking make a big difference.
Null pointer access is reliably detected by hardware so no big
deal. Say what your 'cc' will do with the following function:
int
foo(int n) {
    int a[10];
    int i;
    int res = 0;
    for(i = 0; i <= 10; i++) {  /* note: i == 10 writes past the end of a */
        a[i] = n + i;
    }
    for(i = 0; i <= 10; i++) {
        res += a[i];
    }
    return res;
}
Here gcc at compile time says:
foo.c: In function ‘foo’:
foo.c:15:17: warning: iteration 10 invokes undefined behavior [-Waggressive-loop-optimizations]
15 | res += a[i];
| ~^~~
foo.c:14:18: note: within this loop
14 | for(i = 0; i <= 10; i++) {
| ~~^~~~~
Bart <bc@freeuk.com> wrote:
Sure. That's when you run a production build. I can even do that
myself on some programs (the ones where my C transpiler still
works) and pass it through gcc-O3. Then it might run 30% faster.
On a fast machine running Dhrystone 2.2a I get:
tcc-0.9.28rc 20000000
gcc-12.2 -O 64184852
gcc-12.2 -O2 83194672
clang-14 -O 83194672
clang-14 -O2 85763288
so with -O2 this is more than 4 times faster. Dhrystone correlates
reasonably with the runtime of tight compute-intensive programs.
Compilers started to cheat on the original Dhrystone, so there are
bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
to make cheating harder, so I think it is still a reasonable
benchmark. Actually, the difference may be much bigger; for example
in image processing both clang and gcc can use vector instructions,
which may give an additional speedup of order 8-16.
The 30% above means that you are much better than tcc or your program
is badly behaving (I have programs that make intensive use of
memory; here the effect of optimization would be smaller, but still
of order 2).
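(For illustration, a loop of this shape is the kind gcc and clang can auto-vectorize at -O3, especially with -march=native - the function itself is only a sketch:

/* simple image brighten: the loop body maps directly onto
   SIMD byte operations */
void brighten(unsigned char *p, int n, int d) {
    for (int i = 0; i < n; i++)
        p[i] = (unsigned char)(p[i] + d);
}

Per-byte loops like this are where the 8-16x vector speedups come from.)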
On Fri, 22 Nov 2024 12:33:29 -0000 (UTC)
antispam@fricas.org (Waldek Hebisch) wrote:
Bart <bc@freeuk.com> wrote:
Sure. That's when you run a production build. I can even do that
myself on some programs (the ones where my C transpiler still
works) and pass it through gcc-O3. Then it might run 30% faster.
On a fast machine running Dhrystone 2.2a I get:
tcc-0.9.28rc 20000000
gcc-12.2 -O 64184852
gcc-12.2 -O2 83194672
clang-14 -O 83194672
clang-14 -O2 85763288
so with -O2 this is more than 4 times faster. Dhrystone correlates
reasonably with the runtime of tight compute-intensive programs.
Compilers started to cheat on the original Dhrystone, so there are
bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
to make cheating harder, so I think it is still a reasonable
benchmark. Actually, the difference may be much bigger; for example
in image processing both clang and gcc can use vector instructions,
which may give an additional speedup of order 8-16.
The 30% above means that you are much better than tcc or your program
is badly behaving (I have programs that make intensive use of
memory; here the effect of optimization would be smaller, but still
of order 2).
gcc -O is not what Bart was talking about. It is quite similar to -O1.
Try gcc -O0.
With regard to speedup, I had run only one or two benchmarks with tcc
and my results were close to those of Bart. gcc -O0 is very similar to tcc
in speed of the exe, but compiles several times slower. The gcc -O2 exe is
about 2.5 times faster.
I'd guess I could construct a case where gcc successfully vectorizes
some floating-point loop calculation and shows a 10x speedup vs tcc on
modern Zen5 hardware. But that would not be typical.
On 21/11/2024 13:00, David Brown wrote:
On 20/11/2024 21:17, Bart wrote:
Your development process sounds bad in so many ways it is hard to know
where to start. I think perhaps the foundation is that you taught
yourself a bit of programming in the 1970's,
I did a CS degree actually. I also spent a year programming, working for
the ARC and SRC (UK research councils).
But since you are being so condescending, I think /your/ problem is in having to use C. I briefly mentioned that a 'better language' can help.
While I don't claim that my language is particularly safe, mine is
somewhat safer than C in its type system, and far less error prone in
its syntax and its overall design (for example, a function's details are always defined in exactly one place, so less maintenance and fewer
things to get wrong).
So, half the options in your C compilers are to help get around those shortcomings.
You also seem proud that in this example:
int F(int n) {
    if (n==1) return 10;
    if (n==2) return 20;
}
You can use 'unreachable()', a new C feature, to silence compiler
messages about running into the end of the function, something I
considered a complete hack.
My language requires a valid return value from the last statement. In
that it's similar to the Rust example I posted 9 hours ago.
Yet the gaslighting here suggested what I chose to do was completely wrong.
And presumably you also advise doing so on a bargain basement
single-core computer from at least 15 years ago?
Another example of you acknowledging that compilation speed can be a problem. So a brute force approach to speed is what counts for you.
If you found that it took several hours to drive 20 miles from A to B,
your answer would be to buy a car that goes at 300mph, rather than doing endless detours along the way.
Or another option is to think about each journey extremely carefully,
and then only do the trip once a week!
You also seem proud that in this example:
int F(int n) {
    if (n==1) return 10;
    if (n==2) return 20;
}
You can use 'unreachable()', a new C feature, to silence compiler
messages about running into the end of the function, something I
considered a complete hack.
On 22/11/2024 12:33, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
Sure. That's when you run a production build. I can even do that myself
on some programs (the ones where my C transpiler still works) and pass
it through gcc-O3. Then it might run 30% faster.
On a fast machine running Dhrystone 2.2a I get:
tcc-0.9.28rc 20000000
gcc-12.2 -O 64184852
gcc-12.2 -O2 83194672
clang-14 -O 83194672
clang-14 -O2 85763288
so with -O2 this is more than 4 times faster. Dhrystone correlates
reasonably with the runtime of tight compute-intensive programs.
Compilers started to cheat on the original Dhrystone, so there are
bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
to make cheating harder, so I think it is still a reasonable
benchmark. Actually, the difference may be much bigger; for example
in image processing both clang and gcc can use vector instructions,
which may give an additional speedup of order 8-16.
The 30% above means that you are much better than tcc or your program
is badly behaving (I have programs that make intensive use of
memory; here the effect of optimization would be smaller, but still
of order 2).
The 30% applies to my typical programs, not benchmarks. Sure, gcc -O3
can do a lot of aggressive optimisations when everything is contained
within one short module and most runtime is spent in clear bottlenecks.
Real apps, like say my compilers, are different. They tend to use
globals more, program flow is more disseminated. The bottlenecks are
harder to pin down.
But, OK, here's the first sizeable benchmark that I thought of (I can't
find a reliable Dhrystone one; perhaps you can post a link).
Bart <bc@freeuk.com> wrote:
On 22/11/2024 12:33, Waldek Hebisch wrote:
But, OK, here's the first sizeable benchmark that I thought of (I can't
find a reliable Dhrystone one; perhaps you can post a link).
First Google hit for Dhrystone 2.2a
https://homepages.cwi.nl/~steven/dry.c
(I used this one).
- most of the code is portable, but for timing we need a timer with
sufficient resolution, so I use Unix 'gettimeofday'.
On 2024-11-22, Bart <bc@freeuk.com> wrote:
You also seem proud that in this example:
int F(int n) {
if (n==1) return 10;
if (n==2) return 20;
}
You can use 'unreachable()', a new C feature, to silence compiler
messages about running into the end of the function, something I
considered a complete hack.
Unreachable assertions are actually a bad trade if all you are looking
for is to suppress a diagnostic. Because the behavior is undefined
if the unreachable is actually reached.
That's literally the semantic definition! "unreachable()" means,
roughly, "remove all definition of behavior from this spot in the
program".
Whereas falling off the end of an int-returning function only
becomes undefined if the caller obtains the return value,
and of course in the case of a void function, it's well-defined.
You are better off with:
assert(0 && "should not be reached");
return 0;
if asserts are turned off with NDEBUG, the function does something that
is locally safe, and offers the possibility of avoiding a disaster.
The only valid reason for using unreachable is optimization: you're introducing something unsafe in order to get better machine code. When
the compiler is informed that the behavior is always undefined when some
code is reached, it can just delete that code and everything dominated
by it (reachable only through it).
The above function does not need a function return sequence to be
emitted for the fall-through case that is not expected to occur,
if the situation truly does not occur. Then if it does occur, hell
will break loose since control will fall through to whatever bytes
follow the abrupt end of the function.
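(Putting the two alternatives side by side as a sketch - unreachable() here is the C23 macro from <stddef.h>:

#include <assert.h>
#include <stddef.h>   /* C23: unreachable() */

int F(int n) {
    if (n == 1) return 10;
    if (n == 2) return 20;
    /* unsafe variant: no return sequence emitted, undefined
       behavior if this point is ever reached */
    /* unreachable(); */
    /* safer variant: traps in debug builds, returns a harmless
       value when NDEBUG disables the assert */
    assert(0 && "should not be reached");
    return 0;
}

Here the unreachable() line is commented out in favour of the safer ending.)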
Bart <bc@freeuk.com> wrote:
clang -O3 -march=native 126112us
clang -O3 222136us
clang -O 225855us
gcc -O3 -march=native 82809us
gcc -O3 114365us
gcc -O 287786us
tcc 757347us
There is some irregularity in timings, but this shows that
a factor of order 9 is possible.
On 22/11/2024 19:29, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
On 22/11/2024 12:33, Waldek Hebisch wrote:
But, OK, here's the first sizeable benchmark that I thought of (I can't
find a reliable Dhrystone one; perhaps you can post a link).
First Google hit for Dhrystone 2.2a
https://homepages.cwi.nl/~steven/dry.c
(I used this one).
There was no shortage of them, there were just too many. All seemed to
need some Linux script to compile them, and all needed Linux anyway
because only that has sys/times.h.
I eventually find one for Windows, and that goes to the other extreme
and needs CL (MSVC) with these options:
cl /O2 /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /MD /W4 /Wp64 /Zi
/TP /EHsc /Fa /c dhry264.c dhry_264.c
Plus it uses various ASM routines written in MASM syntax. I was partway
through getting it to work with my compiler, when I saw your post.
Your version is much simpler to get going, but still not straightforward because of 'gettimeofday', which is available via gcc, but is not
exported by msvcrt, which is what tcc and my product use.
I changed it to use clock().
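(The substitution is small; a sketch of the sort of helper involved, using only standard C:

#include <time.h>

/* elapsed time in milliseconds, via standard clock() */
static double ms_now(void) {
    return (double)clock() * 1000.0 / CLOCKS_PER_SEC;
}

clock() nominally measures processor time, though some C libraries return wall time; either serves for a benchmark like this.)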
The results then are like this (I tried two sizes of matrix element):
            uint32_t   uint64_t
gcc -O0       2165       2180     msec
gcc -O3        282        470
tcc           2572       2509
cc            2165       2243
mcc -opt       720        720
The mcc product keeps some local variables in registers, a minor optimisation I will apply to cc in due course. It's not a priority,
since usually it makes little difference on real applications. Only on benchmarks like this.
gcc -O3 seems to enable some SIMD instructions, but only for u32. With
u64 elements, then gcc -O3 is only about 50% faster than my compiler.
If I try -march=native, then the 282 sometimes gets down to 235, and the
470 to 420.
(When functions like this were needed in my programs during 80s and 90s,
I used inline assembly. Most code wasn't that critical.)
- most of the code is portable, but for timing we need a timer with
sufficient resolution, so I use Unix 'gettimeofday'.
Why? Just make the task take long enough.
BTW I also ported your program to my 'M' language. The timing however
was about the same as mcc-opt.
The source is below if interested.
Bart <bc@freeuk.com> wrote:
FYI, ATM I have a version compiling via Lisp; with bounds checking
on it takes 0.58s, with bounds checking off it takes 0.43s
on my machine. The reason to look at C version is to do better.
Taken together, your and my timings indicate that your 'cc' will
give me less speed than going via Lisp. 'mcc -opt' probably would
give an improvement, but not compared to 'gcc'. BTW, below are times
on a slower machine (a 5-year-old cheap laptop):
gcc -O3 -march=native 1722910us
gcc -O3 1720884us
gcc -O 1642328us
tcc 7661992us
via Lisp, checking 5.29s
via Lisp, no checking 4.27s
With -O3 gcc vectorizes inner loops, but apparently on this machine
it backfires and execution time is longer than without vectorization.
In both cases 'tcc' gives slower code than going via Lisp with
array bounds checking on, so ATM using 'tcc' for this application
is rather unattractive.
I may end up using inline assembly, but this is a mess: code for
a fast machine will not run on older ones, and on some machines
non-vectorized code is faster. So I would need multiple versions
of assembler just to cover x86_64. And I have other targets.
And this is just one of the critical routines. I have probably about
10 such critical routines now and it may grow to about 50.
To get good speed I am experimenting with various variants.
So going the assembler way I could be forced to write several
thousand lines of optimized assembler (most of that to
throw out, but before writing them I would not know which
ones are the best). That would be much more work than just
passing various options to 'gcc' and 'clang' and measuring
execution time.
- most of the code is portable, but for timing we need a timer with
sufficient resolution, so I use Unix 'gettimeofday'.
Why? Just make the task take long enough.
Well, Windows 'clock' looks OK, but some old style timing routines
have really low resolution and would lead to excessive run
time (I need to run a rather large number of tests).
BTW I also ported your program to my 'M' language. The timing however
was about the same as mcc-opt.
The source is below if interested.
AFAICS you have assign-op combinations like 'min:='. ATM I am
undecided about similar operations. I mean, in a language which,
like C, applies operators only to base types they give some gain.
But I want operators working on a large variety of types, and then
it is not clear how to define them.
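(In C the nearest equivalent of such an assign-op is a macro - a sketch, with the usual caveat that the macro arguments are evaluated more than once:

/* a :=min b, roughly */
#define MIN_ASSIGN(a, b) ((a) = (a) < (b) ? (a) : (b))

so MIN_ASSIGN(best, cost); stands in for if (cost < best) best = cost;.)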
On 22/11/2024 19:29, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
clang -O3 -march=native 126112us
clang -O3 222136us
clang -O 225855us
gcc -O3 -march=native 82809us
gcc -O3 114365us
gcc -O 287786us
tcc 757347us
You've omitted -O0 for gcc and clang. That timing probably won't be too
far from tcc, but compilation time for larger programs will be
significantly longer (eg. 10 times or more).
The trade-off then is not worth it unless you are running gcc for other reasons (eg. for deeper analysis, or to compile less portable code that
has only been tested on or written for gcc/clang; or just an irrational hatred of simple tools).
There is some irregularity in timings, but this shows that
a factor of order 9 is possible.
That's an extreme case, for one small program with one obvious
bottleneck where it spends 99% of its time, and with little use of
memory either.
For simply written programs, the difference is more like 2:1. For more complicated C code that makes much use of macros that can expand to lots
of nested function calls, it might be 4:1, since it might rely on optimisation to inline some of those calls.
Again, that would be code written to take advantage of specific compilers.
But that is still computationally intensive code working on small
amounts of memory.
I have a text editor written in my scripting language. I can translate
its interpreter to C and compile with both gcc-O3 and tcc.
Then, yes, you will notice twice as much latency with the tcc
interpreter compared with gcc-O3, when doing things like
deleting/inserting lines at the beginning of a 1000000-line text file.
But typically, the text files will be 1000 times smaller; you will
notice no difference at all.
I'm not saying no optimisation is needed, ever, I'm saying that the NEED
for optimisation is far smaller than most people seem to think.
Here are some timings for that interpreter, when used to run a script to compute fib(38) the long way:
Interp   Built with   Timing
qc       tcc          9.0 secs   (qc is C transpiled version)
qq       mm           5.0        (-fn; qq is original M version)
qc       gcc-O3       4.0
qq       mm           1.2        (-asm)
(My interpreter doesn't bother with faster switch-based or computed-goto based dispatchers. The choice is between a slower function-table-based
one, and an accelerated threaded-code version using inline ASM.
These are selected with -fn/-asm options. The -asm version is not JIT;
it is still interpreting a bytecode at a time).
So the fastest version here doesn't use compiler optimisation, and it's
3 times the speed of gcc-O3. My unoptimised HLL code is also only 25%
slower than gcc-O3.
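(For reference, a minimal sketch of the slower, function-table style of dispatch described above, with an invented two-opcode bytecode:

enum { OP_INC, OP_HALT };

typedef struct { int acc; const unsigned char *pc; } VM;

static void do_inc(VM *vm) { vm->acc++; }

/* function-table dispatch: one table load plus one indirect
   call per opcode executed */
static void (*ops[])(VM *) = { do_inc };

int run(VM *vm) {
    for (;;) {
        unsigned char op = *vm->pc++;
        if (op == OP_HALT) return vm->acc;
        ops[op](vm);
    }
}

A switch or computed-goto dispatcher removes the indirect call; threaded code goes further and jumps straight from one handler to the next.)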
Bart <bc@freeuk.com> wrote:
I'm not saying no optimisation is needed, ever, I'm saying that the NEED
for optimisation is far smaller than most people seem to think.
There is also the question of disc space. 'tcc' compiled by itself is
404733 bytes (code + data) (0.024s compile time), by gcc (default) is
340950 (0.601s compile time), by gcc -O is 271229 (1.662s compile
time), by gcc -Os is 228855 (2.470s compile time), by gcc -O2
is 323392 (3.364s compile time), gcc -O3 is 407952 (4.627s compile
time). As you can see gcc -Os can save quite a bit of disc space
for still moderate compile time.
And of course, there is the question of why a program whose runtime
does not matter is written in a low level language?
So the fastest version here doesn't use compiler optimisation, and it's
3 times the speed of gcc-O3. My unoptimised HLL code is also only 25%
slower than gcc-O3.
Well, most folks would "not bother" with inline ASM and instead use
the fastest version that C can give, which likely would involve
gcc -O2 or gcc -O3.
On 24/11/2024 00:24, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
And of course, there is the question of why a program whose runtime
does not matter is written in a low level language?
I mean it doesn't matter if it's half the speed. It might matter if it
was 40 times slower.
There's quite a gulf between even unoptimised native code and even a
fast dynamic language interpreter.
People seem to think that the only choices are the fastest possible C
code at one end, and slow CPython at the other:
gcc/O3-tcc-----------------------------------------------------CPython
On this scale, gcc/O3 code and tcc code are practically the same!
On 24/11/2024 00:24, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
I'm not saying no optimisation is needed, ever, I'm saying that the NEED
for optimisation is far smaller than most people seem to think.
There is also the question of disc space. 'tcc' compiled by itself is
404733 bytes (code + data) (0.024s compile time), by gcc (default) is
340950 (0.601s compile time), by gcc -O is 271229 (1.662s compile
time), by gcc -Os is 228855 (2.470s compile time), by gcc -O2
is 323392 (3.364s compile time), gcc -O3 is 407952 (4.627s compile
time). As you can see gcc -Os can save quite a bit of disc space
for still moderate compile time.
I thought David Brown said that disk space is irrelevant?
Anyway this is
the exact copy of what I tried just now, compiling a 5-line hello.c
program. I hadn't used these compilers since earlier today:
c:\c>tm gcc hello.c
TM: 5.80
c:\c>tm tcc hello.c
TM: 0.19
c:\c>tm gcc hello.c
TM: 0.19
c:\c>tm tcc hello.c
TM: 0.03
From cold, gcc took nearly 6 seconds (if you've been used to instant feedback all day, it can feel like an age). tcc took 0.2 seconds.
Doing it a second time, now gcc takes 0.2 seconds, and tcc takes 0.03 seconds! (It can't get much faster on Windows.)
gcc is just a lumbering giant, a 870MB installation, while tcc is 2.5MB.
As for sizes:
c:\c>dir hello.exe
24/11/2024 00:44 2,048 hello.exe
c:\c>dir a.exe
24/11/2024 00:44 91,635 a.exe (48K with -s)
(At least that's one good thing of gcc writing out that weird a.exe each time; I can compare both exes!)
As for mine (however it's possible I used it more recently):
c:\c>tm cc hello
Compiling hello.c to hello.exe
TM: 0.04
c:\c>dir hello.exe
24/11/2024 00:52 2,560 hello.exe
My installation is 0.3MB (excluding windows.h which is 0.6MB). Being self-contained, I can trivially apply UPX compression to get a 0.1MB compiler, which can be easily copied to a memory stick or bundled in one
of my apps. However compiling hello.c now takes 0.05 seconds.
(I don't use UPX because my apps are already tiny; it's just to marvel
at how much redundancy they still contain, and how much tinier they
could be.)
I know none of this will cut any ice; for various reasons you don't want
to use tcc.
One of them being that your build process involves N slow stages so
speeding up just one makes little difference.
This however is very similar to my argument about optimisation; a
running app consists of lots of parts which take up execution time, not
all of which can be speeded up by a factor of 9. The net benefit will be
a lot less, just like your reduced build time.
On 22/11/2024 12:51, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
int main(void) {
    int a;
    int* p = 0;
    a = *p;
}
Here's what happens with my C compiler when told to interpret it:
c:\cx>cc -i c
Compiling c.c to c.(int)
Error: Null ptr access
Here's what happens with gcc:
c:\cx>gcc c.c
c:\cx>a
<crashes>
Is there some option to insert such a check with gcc? I've no idea; most
people don't.
I would do
gcc -g c.c
gdb a.out
run
and gdb would show me the place with the bad access. Things like bounds
checking of array accesses or overflow checking make a big difference.
Null pointer access is reliably detected by hardware so no big
deal. Say what your 'cc' will do with the following function:
int
foo(int n) {
    int a[10];
    int i;
    int res = 0;
    for(i = 0; i <= 10; i++) {  /* note: i == 10 writes past the end of a */
        a[i] = n + i;
    }
    for(i = 0; i <= 10; i++) {
        res += a[i];
    }
    return res;
}
Here gcc at compile time says:
foo.c: In function ‘foo’:
foo.c:15:17: warning: iteration 10 invokes undefined behavior [-Waggressive-loop-optimizations]
15 | res += a[i];
| ~^~~
foo.c:14:18: note: within this loop
14 | for(i = 0; i <= 10; i++) {
| ~~^~~~~
My 'cc -i' wouldn't detect it. The -i tells it to run an interpreter on
the intermediate code. Within the interpreter, some things are easily checked, but bounds info on arrays doesn't exist. (The IL supports only pointer operations, not high level array ops.)
That would need intervention at an earlier stage, but even then, the
design of C makes that difficult. First, because array types like
int[10] decay to simple pointers, and ones represented by types like
int* don't have bounds info at all. (I don't support int[n] params and
few people use them anyway.)
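(The decay is easy to demonstrate - inside the function the declared bound is already gone:

#include <stdio.h>

/* 'int a[10]' in a parameter list really declares 'int *a' */
void f(int a[10]) {
    printf("%zu\n", sizeof a);   /* prints sizeof(int *), not 40 */
}

so there is nothing left for a checker to hang bounds information on.)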
On 24/11/2024 01:36, Bart wrote:
On 24/11/2024 00:24, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
And of course, there is a question why program with runtime that
does not matter is written in a low level language?
I mean it doesn't matter if it's half the speed. It might matter if it
was 40 times slower.
There's quite a gulf between even unoptimised native code and even a
fast dynamic language interpreter.
People seem to think that the only choices are the fastest possible C
code at one end, and slow CPython at the other:
gcc/O3-tcc-----------------------------------------------------CPython
On this scale, gcc/O3 code and tcc code are practically the same!
(I wasn't able to post results earlier because CPython hadn't finished.
But for a JPEG decoder test on an 85Mpixel image, all using the same algorithm:
gcc-O3 2.2 seconds
mm6-opt 3.3 seconds (My older compiler with the register optim.)
mm7 5.7 seconds (My unoptimising new one)
cc 6.0 seconds (Unoptimising)
tcc 8.1 seconds
PyPy 43 seconds (Uses JIT to optimise hot loops to native code)
CPython 386 seconds)
On 23/11/2024 16:45, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
FYI, ATM I have a version compiling via Lisp; with bounds checking
on it takes 0.58s, with bounds checking off it takes 0.43s
on my machine. The reason to look at C version is to do better.
Taken together, your and my timings indicate that your 'cc' will
give me less speed than going via Lisp. 'mcc -opt' probably would
give an improvement, but not compared to 'gcc'. BTW, below are times
on a slower machine (a 5-year-old cheap laptop):
gcc -O3 -march=native 1722910us
gcc -O3 1720884us
gcc -O 1642328us
tcc 7661992us
via Lisp, checking 5.29s
via Lisp, no checking 4.27s
With -O3 gcc vectorizes inner loops, but apparently on this machine
it backfires and execution time is longer than without vectorization.
In both cases 'tcc' gives slower code than going via Lisp with
array bounds checking on, so ATM using 'tcc' for this application
is rather unattractive.
Lisp is a rather mysterious language which can apparently be and do anything: it can be interpreted or compiled.
Statically typed or
dynamic.
Imperative or functional.
It can also apparently be implemented in a few dozen lines of code.
- most of the code is portable, but for timing we need a timer with
sufficient resolution, so I use Unix 'gettimeofday'.
Why? Just make the task take long enough.
Well, Windows 'clock' looks OK, but some old style timing routines
have really low resolution and would lead to excessive run
time (I need to run a rather large number of tests).
I've tried all sorts, from Windows' high performance routines, down to
x64's RDTSC instruction. They all gave unreliable, variable results. Now
I just use 'clock', but might turn off all other apps for extra consistency.
Bart <bc@freeuk.com> wrote:
gcc is just a lumbering giant, a 870MB installation, while tcc is 2.5MB.
Yes, but the exact size depends on which version you install and how you
install it. I installed version 6.5 and removed debugging info from
executables. The result is 177MB, large but significantly smaller
than what you have.
As for sizes:
c:\c>dir hello.exe
24/11/2024 00:44 2,048 hello.exe
c:\c>dir a.exe
24/11/2024 00:44 91,635 a.exe (48K with -s)
(At least that's one good thing of gcc writing out that weird a.exe each
time; I can compare both exes!)
AFAICS this is one-time Windows overhead + default layout rules for
the linker. On Linux I get 15952 bytes by default, 14472 after
stripping. However, the actual code + data size is 1904 and even
of this most is crap needed to support extra features of the C library.
In other words, this is mostly irrelevant, as people who want to
get the size down can link with different options to get smaller
executables. Actual hello world code size is 99 bytes when compiled
by gcc (default options) and 64 bytes by tcc.
I know none of this will cut any ice; for various reasons you don't want
to use tcc.
Well, I tried to use tcc when it first appeared.
There is a question of trust: when what I reported remained unfixed
I lost faith in the quality of tcc. I still need to check if it is
fixed now, but at least now tcc seems to have some development.
One of them being that your build process involves N slow stages so
speeding up just one makes little difference.
Yes.
This however is very similar to my argument about optimisation; a
running app consists of lots of parts which take up execution time, not
all of which can be speeded up by a factor of 9. The net benefit will be
a lot less, just like your reduced build time.
If I do not have good reasons to write a program in C, then likely I
will write it in some higher-level language. One good reason
to use C is to code performance-critical routines.
On 24/11/2024 05:03, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
As for sizes:
c:\c>dir hello.exe
24/11/2024 00:44 2,048 hello.exe
c:\c>dir a.exe
24/11/2024 00:44 91,635 a.exe (48K with -s)
(At least that's one good thing of gcc writing out that weird a.exe each
time; I can compare both exes!)
AFAICS this is one-time Windows overhead + default layout rules for
the linker. On Linux I get 15952 bytes by default, 14472 after
stripping. However, the actual code + data size is 1904, and even
of this, most is crap needed to support extra features of the C library.
In other words, this is mostly irrelevant, as people who want to
get the size down can link with different options to get a smaller
executable. Actual hello world code size is 99 bytes when compiled
by gcc (default options) and 64 bytes by tcc.
I get a size of 3KB for tcc compiling hello.c under WSL.
On Windows, my cc compiler has the option of generating my private
binary format called 'MX':
c:\c>cc -mx hello
Compiling hello.c to hello.mx
c:\c>dir hello.mx
24/11/2024 11:58 194 hello.mx
Then the size is 194 bytes (most of that is a big header and list of
default DLL files to import). However that requires a one-off launcher
(12KB compiled as C) to run it:
c:\c>runmx hello
Hello, World!
(In practice, MX files are bigger than equivalent EXEs since they
contain more reloc info. I developed the format before I had options for PIC/relocatable code, which is necessary for OBJ/DLL formats.)
If I do not have good reasons to write a program in C, then likely I
will write it in some higher-level language. One good reason
to use C is to code performance-critical routines.
It can also do manipulations that are harder in a 'softer', safer HLL.
(My scripting language however can still do most of those underhand things.)
Bart <bc@freeuk.com> wrote:
On 24/11/2024 05:03, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
As for sizes:
c:\c>dir hello.exe
24/11/2024 00:44 2,048 hello.exe
c:\c>dir a.exe
24/11/2024 00:44 91,635 a.exe (48K with -s)
(At least that's one good thing of gcc writing out that weird a.exe each
time; I can compare both exes!)
AFAICS this is one-time Windows overhead + default layout rules for
the linker. On Linux I get 15952 bytes by default, 14472 after
stripping. However, the actual code + data size is 1904, and even
of this, most is crap needed to support extra features of the C library.
In other words, this is mostly irrelevant, as people who want to
get the size down can link with different options to get a smaller
executable. Actual hello world code size is 99 bytes when compiled
by gcc (default options) and 64 bytes by tcc.
I get a size of 3KB for tcc compiling hello.c under WSL.
That more or less agrees with the file size that I reported. I
prefer to look at what 'size' reports, and at .o files.
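(For readers unfamiliar with it: 'size' is the binutils tool that
breaks an object file or executable into its text/data/bss sections.
Illustrative output only - the 99 here echoes the figure quoted above,
and actual numbers will vary:)

    $ gcc -c hello.c
    $ size hello.o
       text    data     bss     dec     hex filename
         99       0       0      99      63 hello.o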
It can also do manipulations that are harder in a 'softer', safer HLL.
(My scripting language however can still do most of those underhand things.)
Anything computational can be done in a HLL. You may wish to
play tricks to save time. Or possibly some packing tricks to
save memory. But packing tricks can be done in a HLL (say by
treating the whole memory as a big array of u64), so this really
boils down to speed.
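(A hypothetical sketch of the kind of packing trick meant here - four
16-bit values stored per 64-bit word; the function names are made up
for illustration, not anyone's actual code:)

    #include <stddef.h>
    #include <stdint.h>

    /* Read the i-th 16-bit value from a flat array of u64. */
    static uint16_t get16(const uint64_t *mem, size_t i)
    {
        return (uint16_t)(mem[i / 4] >> (16 * (i % 4)));
    }

    /* Overwrite the i-th 16-bit value in place. */
    static void put16(uint64_t *mem, size_t i, uint16_t v)
    {
        unsigned shift = (unsigned)(16 * (i % 4));
        mem[i / 4] = (mem[i / 4] & ~((uint64_t)0xFFFF << shift))
                   | ((uint64_t)v << shift);
    }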
On 24/11/2024 15:00, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
On 24/11/2024 05:03, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
As for sizes:
c:\c>dir hello.exe
24/11/2024 00:44 2,048 hello.exe
c:\c>dir a.exe
24/11/2024 00:44 91,635 a.exe (48K with -s)
(At least that's one good thing of gcc writing out that weird a.exe each
time; I can compare both exes!)
AFAICS this is one-time Windows overhead + default layout rules for
the linker. On Linux I get 15952 bytes by default, 14472 after
stripping. However, the actual code + data size is 1904, and even
of this, most is crap needed to support extra features of the C library.
In other words, this is mostly irrelevant, as people who want to
get the size down can link with different options to get a smaller
executable. Actual hello world code size is 99 bytes when compiled
by gcc (default options) and 64 bytes by tcc.
I get a size of 3KB for tcc compiling hello.c under WSL.
That more or less agrees with the file size that I reported. I
prefer to look at what 'size' reports, and at .o files.
Oh, I thought you were reporting sizes of 99 and 64 bytes, in response
to tcc's 2048 bytes.
So I'm not sure what you mean by 'actual' size, unless it is the same
as that reported by my product here (comments added):
c:\cx>cc -v hello
Compiling hello.c to hello.exe
Code size: 34 bytes # .text
Idata size: 15 # .data
Code+Idata: 49
Zdata size: 0 # .bss
EXE size: 2,560
So at 49 bytes, I guess I win!
It can also do manipulations that are harder in a 'softer', safer HLL.
(My scripting language however can still do most of those underhand things.)
Anything computational can be done in a HLL. You may wish to
play tricks to save time. Or possibly some packing tricks to
save memory. But packing tricks can be done in a HLL (say by
treating the whole memory as a big array of u64), so this really
boils down to speed.
I'm sure that with Python, say, pretty much anything can be done given
enough effort. Even if it means cheating by using external add-on
modules to get around language limitations, like the ctypes module,
which you will likely find uses C code.
This is different from having things be part of the core language so
they become effortless and natural.
But, everything you've said seems to have backed up my remark that
people only seem to consider two possibilities:
* Either a scripting language where it doesn't matter that it's 1-2 magnitudes slower than native code
* Or a compiled language where it absolutely MUST be at least as fast as gcc/clang-O3. Only 20 times faster than CPython is not enough!
(In my JPEG timings I posted earlier today, CPython was 175 times slower
than gcc-O3, and 48-64 times slower than unoptimised C. Applying the
simplest optimisation (which I can tell you adds only 10% to compilation
time) made native code over 100 times faster than CPython, and only 50%
slower than gcc-O3. This was on a deliberately large input.
Basically, if you are generating even the worst native code, then it
will already wipe the floor with any scripting language, when comparing
them both executing the same algorithm.)
Most of a gcc installation is hundreds of header and archive (.a)
files for various libraries. There might be 32-bit and 64-bit
versions. I understand that. But it also makes it hard to isolate the
core compiler.
Bart <bc@freeuk.com> writes:
[...]
Most of a gcc installation is hundreds of header and archive (.a)
files for various libraries. There might be 32-bit and 64-bit
versions. I understand that. But it also makes it hard to isolate the
core compiler.
That doesn't agree with my observations.
Of course most of the headers and libraries are not part of gcc itself.
As usual, you refer to the entire implementation as "gcc".
I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5, installing each into a new directory.
The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
the executables.
The glibc installation (libraries and headers) is about 199 MB, a small
fraction of the size of the gcc installation.
Of course there are other libraries that can be used with gcc, and they
could take a lot of space -- but they're not part of gcc.
Bart <bc@freeuk.com> writes:
[...]
Most of a gcc installation is hundreds of header and archive (.a)
files for various libraries. There might be 32-bit and 64-bit
versions. I understand that. But it also makes it hard to isolate the
core compiler.
That doesn't agree with my observations.
Of course most of the headers and libraries are not part of gcc itself.
As usual, you refer to the entire implementation as "gcc".
I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5, installing each into a new directory.
The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
the executables.
On 24/11/2024 20:01, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Most of a gcc installation is hundreds of header and archive (.a)
files for various libraries. There might be 32-bit and 64-bit
versions. I understand that. But it also makes it hard to isolate the
core compiler.
That doesn't agree with my observations.
Of course most of the headers and libraries are not part of gcc
itself.
As usual, you refer to the entire implementation as "gcc".
I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
installing each into a new directory.
The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I
strip
the executables.
That's even huger than mine! So, what are those 3.7GB full of? What
does the 1.9GB of executables do?
The glibc installation (libraries and headers) is about 199 MB, a small
fraction of the size of the gcc installation.
Is that included in one of those two divisions above?
Of course there are other libraries that can be used with gcc, and they
could take a lot of space -- but they're not part of gcc.
So, what /is/ gcc? What's the minimum installation that can compile
hello.c to hello.s for example?
I've done that experiment on my TDM version, and the answer appears to
be about 40MB in this directory structure:
Directory of c:\tdm\bin
24/07/2024 10:21 1,926,670 gcc.exe
24/07/2024 10:21 2,279,503 libisl-23.dll
24/07/2024 10:22 164,512 libmpc-3.dll
24/07/2024 10:22 702,852 libmpfr-6.dll
Directory of c:\tdm\libexec\gcc\x86_64-w64-mingw32\14.1.0
24/07/2024 10:24 34,224,654 cc1.exe
Directory of c:\tdm\x86_64-w64-mingw32\include
17/01/2021 17:33 368 stddef.h
27/03/2021 20:07 2,924 stdio.h
7 File(s) 39,301,483 bytes
Here I cheated a little and used the minimum std headers from my
compiler, otherwise I could have spent an hour chasing down dozens of
obscure nested headers that gcc's stdio.h likes to make use of.
Is /this/ gcc then? Will you agree that it is by no means clear what
'gcc' includes, or what to call the part of a gcc installed bundle
that is not technically gcc?
A more useful installation would of course need more standard headers,
an assembler, linker, and whatever .a files are needed to provide the standard library.
With clang, it is easier: apparently everything needed to do the
above, other than header files, is contained within a 120MB executable
clang.exe.
However the full 2.8GB llvm/clang installation doesn't provide any
headers, nor a linker. At least it doesn't use the provided 88MB (!)
lld.exe; it expects to work on top of MSVC, which it has never managed
to do.
On 24/11/2024 20:01, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Most of a gcc installation is hundreds of header and archive (.a)
files for various libraries. There might be 32-bit and 64-bit
versions. I understand that. But it also makes it hard to isolate the
core compiler.
That doesn't agree with my observations.
Of course most of the headers and libraries are not part of gcc itself.
As usual, you refer to the entire implementation as "gcc".
I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
installing each into a new directory.
The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
the executables.
That's even huger than mine! So, what are those 3.7GB full of? What does
the 1.9GB of executables do?
Of course there are other libraries that can be used with gcc, and they
could take a lot of space -- but they're not part of gcc.
So, what /is/ gcc? What's the minimum installation that can compile
hello.c to hello.s for example?
I've done that experiment on my TDM version, and the answer appears to
be about 40MB in this directory structure:
Directory of c:\tdm\bin
24/07/2024 10:21 1,926,670 gcc.exe
24/07/2024 10:21 2,279,503 libisl-23.dll
24/07/2024 10:22 164,512 libmpc-3.dll
24/07/2024 10:22 702,852 libmpfr-6.dll
Directory of c:\tdm\libexec\gcc\x86_64-w64-mingw32\14.1.0
24/07/2024 10:24 34,224,654 cc1.exe
Directory of c:\tdm\x86_64-w64-mingw32\include
17/01/2021 17:33 368 stddef.h
27/03/2021 20:07 2,924 stdio.h
7 File(s) 39,301,483 bytes
Here I cheated a little and used the minimum std headers from my
compiler, otherwise I could have spent an hour chasing down dozens of obscure nested headers that gcc's stdio.h likes to make use of.
Is /this/ gcc then? Will you agree that it is by no means clear what
'gcc' includes, or what to call the part of a gcc installed bundle that
is not technically gcc?
A more useful installation would of course need more standard headers,
an assembler, linker, and whatever .a files are needed to provide the standard library.
With clang, it is easier: apparently everything needed to do the above,
other than header files, is contained within a 120MB executable clang.exe.
Bart <bc@freeuk.com> writes:
On 24/11/2024 20:01, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Most of a gcc installation is hundreds of header and archive (.a)
files for various libraries. There might be 32-bit and 64-bit
versions. I understand that. But it also makes it hard to isolate the
core compiler.
That doesn't agree with my observations.
Of course most of the headers and libraries are not part of gcc
itself.
As usual, you refer to the entire implementation as "gcc".
I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
installing each into a new directory.
The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I
strip
the executables.
That's even huger than mine! So, what are those 3.7GB full of? What
does the 1.9GB of executables do?
I installed compilers for multiple languages. A more typical
installation likely won't include compilers for Ada, Go, Fortran,
Modula-2, and Rust. There are a number of hard links to other files;
for example c++, g++, x86_64-pc-linux-gnu-c++, and
x86_64-pc-linux-gnu-g++ are all the same file. Apparently `du` is
clever enough to count them only once.
Here's the output of `ls -s` on the bin directory (sizes are in units of
1024 bytes):
total 611908
8828 c++ 8960 gm2 8828 x86_64-pc-linux-gnu-c++
8820 cpp 8264 gnat 8828 x86_64-pc-linux-gnu-g++
8828 g++ 13092 gnatbind 8820 x86_64-pc-linux-gnu-gcc
8820 gcc 9556 gnatchop 8820 x86_64-pc-linux-gnu-gcc-14.2.0
156 gcc-ar 12564 gnatclean 156 x86_64-pc-linux-gnu-gcc-ar
156 gcc-nm 7864 gnatkr 156 x86_64-pc-linux-gnu-gcc-nm
152 gcc-ranlib 8564 gnatlink 152 x86_64-pc-linux-gnu-gcc-ranlib
8828 gccgo 12764 gnatls 8828 x86_64-pc-linux-gnu-gccgo
8820 gccrs 13584 gnatmake 8820 x86_64-pc-linux-gnu-gccrs
7784 gcov 12236 gnatname 8828 x86_64-pc-linux-gnu-gdc
6324 gcov-dump 12308 gnatprep 8824 x86_64-pc-linux-gnu-gfortran
6468 gcov-tool 11136 go 8960 x86_64-pc-linux-gnu-gm2
8828 gdc 620 gofmt
8824 gfortran 308740 lto-dump
The glibc installation (libraries and headers) is about 199 MB, a small
fraction of the size of the gcc installation.
Is that included in one of those two divisions above?
Of course not. glibc is not part of gcc.
Of course there are other libraries that can be used with gcc, and they
could take a lot of space -- but they're not part of gcc.
So, what /is/ gcc? What's the minimum installation that can compile
hello.c to hello.s for example?
Those are two separate questions. gcc by itself can't compile hello.c
to hello.s. But it's always installed along with other tools that allow
it to do so, as part of what the C standard calls an "implementation".
You can't compile hello.c to hello.s without an OS kernel, but I presume you'd agree that the kernel isn't part of gcc. And hello.s isn't useful without an assembler, which is not treated as part of gcc.
gcc is a compiler, or rather a compiler collection. (The "gcc" command
is the C compiler component of the "gcc" compiler collection.) Since
gcc does not provide <stdio.h>, I presume that a standalone gcc would
not be able to compile hello.c without depending on a library, whether
that library is installed separately or as part of a package like
tdm-gcc (there's nothing wrong with either approach).
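(To make the division concrete: the stages the 'gcc' driver runs can be
invoked by hand with ordinary gcc/binutils commands, and 'gcc -v hello.c'
likewise prints the subprograms it executes. A sketch:)

    gcc -S hello.c         # compiler proper (cc1): hello.c -> hello.s
    as hello.s -o hello.o  # assembler, from binutils
    gcc hello.o -o hello   # driver runs the linker ld with the C library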
I should also acknowledge that the "gcc" package, whether it's provided
as source code or as binaries, provides some files that are not part of
the compiler itself, for example library files that are closely tied to
the compiler. Installable software packages don't have to follow any particular division between compiler, library, and other components.
When I install gcc, binutils, and glibc from the Ubuntu package manager,
the binaries are installed in common directories (/usr/bin, /usr/lib, et
al). There's no "gcc directory" or "glibc directory". But the system
keeps track of which files were installed from which packages.
Perhaps you don't care what is or isn't part of "gcc". If that's the
case, that's fine, but it would help if you'd stop referring to things
as "gcc" without knowing what that means. You're using "gcc-tdm"; just
call it that.
I've done that experiment on my TDM version, and the answer appears to
be about 40MB in this directory structure:
Directory of c:\tdm\bin
24/07/2024 10:21 1,926,670 gcc.exe
24/07/2024 10:21 2,279,503 libisl-23.dll
24/07/2024 10:22 164,512 libmpc-3.dll
24/07/2024 10:22 702,852 libmpfr-6.dll
Directory of c:\tdm\libexec\gcc\x86_64-w64-mingw32\14.1.0
24/07/2024 10:24 34,224,654 cc1.exe
Directory of c:\tdm\x86_64-w64-mingw32\include
17/01/2021 17:33 368 stddef.h
27/03/2021 20:07 2,924 stdio.h
7 File(s) 39,301,483 bytes
Here I cheated a little and used the minimum std headers from my
compiler, otherwise I could have spent an hour chasing down dozens of
obscure nested headers that gcc's stdio.h likes to make use of.
Is /this/ gcc then? Will you agree that it is by no means clear what
'gcc' includes, or what to call the part of a gcc installed bundle
that is not technically gcc?
It's not entirely clear, but it's much clearer than you make it out to
be.
One thing that should be obvious by now is that stdio.h is not part of
"gcc", though it's probably part of "gcc-tdm". On my system, stddef.h
is provided by libgcc-11-dev, which is closely associated with gcc. I'm
not entirely sure why gcc-11 and libgcc-11-dev (the Ubuntu binary
packages) are separate -- nor do I have to care, since the package
management system is clever enough to recognize the dependencies and
keep them in sync.
A more useful installation would of course need more standard headers,
an assembler, linker, and whatever .a files are needed to provide the
standard library.
Sure, those are all part of a C implementation, though they're not part
of gcc.
Bart <bc@freeuk.com> wrote:
With clang, it is easier: apparently everything needed to do the above,
other than header files, is contained within a 120MB executable clang.exe.
Probably you mean the things needed to run the compiler. clang-compiled
executables need libraries too; on Debian these are shared with gcc.
On 24/11/2024 21:45, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
A more useful installation would of course need more standard headers,
an assembler, linker, and whatever .a files are needed to provide the
standard library.
Sure, those are all part of a C implementation, though they're not
part of gcc.
This seems to be a thing with Linux, where a big chunk of a C
implementation is provided by the OS.
That is, standard headers, libraries, possibly even 'as' and 'ld'
utilities.
On Windows, C compilers tend to be self-contained (except
for Clang which appears to be parasitical: it used to piggy-back onto
gcc, then it switched to MSVC).
I'm not sure what the utility to compile C programs is called, if it
is not 'gcc'. But this is a C group; I would expect people to know it
is a C compiler, or the front end of one.
However I use 'gcc' in other forums and everyone knows what I mean.
What do /you/ call the C compiler that is invoked by gcc?
Bart <bc@freeuk.com> writes:
On 24/11/2024 20:01, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Most of a gcc installation is hundreds of header and archive (.a)
files for various libraries. There might be 32-bit and 64-bit
versions. I understand that. But it also makes it hard to isolate
the core compiler.
That doesn't agree with my observations.
Of course most of the headers and libraries are not part of gcc
itself.
As usual, you refer to the entire implementation as "gcc".
I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
installing each into a new directory.
The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I
strip
the executables.
That's even huger than mine! So, what are those 3.7GB full of? What
does the 1.9GB of executables do?
I installed compilers for multiple languages. A more typical
installation likely won't include compilers for Ada, Go, Fortran,
Modula-2, and Rust. There are a number of hard links to other files;
for example c++, g++, x86_64-pc-linux-gnu-c++, and
x86_64-pc-linux-gnu-g++ are all the same file. Apparently `du` is
clever enough to count them only once.
Here's the output of `ls -s` on the bin directory (sizes are in units
of 1024 bytes):
total 611908
8828 c++ 8960 gm2 8828 x86_64-pc-linux-gnu-c++
8820 cpp 8264 gnat 8828 x86_64-pc-linux-gnu-g++
8828 g++ 13092 gnatbind 8820 x86_64-pc-linux-gnu-gcc
8820 gcc 9556 gnatchop 8820 x86_64-pc-linux-gnu-gcc-14.2.0
156 gcc-ar 12564 gnatclean 156 x86_64-pc-linux-gnu-gcc-ar
156 gcc-nm 7864 gnatkr 156 x86_64-pc-linux-gnu-gcc-nm
152 gcc-ranlib 8564 gnatlink 152 x86_64-pc-linux-gnu-gcc-ranlib
8828 gccgo 12764 gnatls 8828 x86_64-pc-linux-gnu-gccgo
8820 gccrs 13584 gnatmake 8820 x86_64-pc-linux-gnu-gccrs
7784 gcov 12236 gnatname 8828 x86_64-pc-linux-gnu-gdc
6324 gcov-dump 12308 gnatprep 8824 x86_64-pc-linux-gnu-gfortran
6468 gcov-tool 11136 go 8960 x86_64-pc-linux-gnu-gm2
8828 gdc 620 gofmt
8824 gfortran 308740 lto-dump
This seems to be a thing with Linux, where a big chunk of a C
implementation is provided by the OS.
That is, standard headers, libraries, possibly even 'as' and 'ld'
utilities. On Windows, C compilers tend to be self-contained (except for Clang which appears to be parasitical: it used to piggy-back onto gcc,
then it switched to MSVC).
I'm not sure what the utility to compile C programs is called, if it is
not 'gcc'. But this is a C group; I would expect people to know it is a
C compiler, or the front end of one.
However I use 'gcc' in other forums and everyone knows what I mean.
What do /you/ call the C compiler that is invoked by gcc?
Bart <bc@freeuk.com> writes:
On 24/11/2024 21:45, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
A more useful installation would of course need more standard headers,
an assembler, linker, and whatever .a files are needed to provide the
standard library.
Sure, those are all part of a C implementation, though they're not
part of gcc.
This seems to be a thing with Linux, where a big chunk of a C
implementation is provided by the OS.
I'm not sure what you mean by "provided by the OS". Linux-based
systems tend to be very modular, with almost everything provided by
some installable binary package. Some of those packages have to
be provided by default, for example any dynamic libraries relied
on by most executables. Files that are needed for development,
such as header files, compilers, and associated tools such as
assemblers and linkers, may be optional.
That is, standard headers, libraries, possibly even 'as' and 'ld'
utilities.
On my system (Ubuntu), the as and ld commands are provided by the
binutils package ("binutils-x86-64-linux-gnu"). Some distributions
may install these by default. Others do not, but they're easy
to install.
On Windows, C compilers tend to be self-contained (except
for Clang which appears to be parasitical: it used to piggy-back onto
gcc, then it switched to MSVC).
I don't know what you mean by "piggy-back onto gcc".
I'm not sure what the utility to compile C programs is called, if it
is not 'gcc'. But this is a C group, I would expect people to know it
is a C compiler, or the front end of one.
However I use 'gcc' in other forums and everyone knows what I mean.
What do /you/ call the C compiler that is invoked by gcc?
I call it gcc.
"gcc" is the name for several things. It's the "GNU Compiler
Collection". It's the command invoked as the driver for any of
several compilers that are part of the GNU Compiler Collection.
It can refer specifically to the C compiler. It's mildly confusing
for historical reasons, but most people don't have much of a
problem with it, and don't pretend that it's more confusing than
it really is.
Bart <bc@freeuk.com> writes:
[...]
Most of a gcc installation is hundreds of header and archive (.a)
files for various libraries. There might be 32-bit and 64-bit
versions. I understand that. But it also makes it hard to isolate the
core compiler.
That doesn't agree with my observations.
Of course most of the headers and libraries are not part of gcc itself.
As usual, you refer to the entire implementation as "gcc".
I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5, installing each into a new directory.
The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I strip
the executables.
The glibc installation (libraries and headers) is about 199 MB, a small
fraction of the size of the gcc installation.
Of course there are other libraries that can be used with gcc, and they
could take a lot of space -- but they're not part of gcc.
These sizes might differ on Windows.
Bart <bc@freeuk.com> wrote:
This seems to be a thing with Linux, where a big chunk of a C
implementation is provided by the OS.
That is, standard headers, libraries, possibly even 'as' and 'ld'
utilities. On Windows, C compilers tend to be self-contained (except for
Clang which appears to be parasitical: it used to piggy-back onto gcc,
then it switched to MSVC).
You know that at source level there are separate projects: gcc proper,
binutils and libc. libc provides the C library; however, the headers
must be matched to the library, so libc also provides the headers.
Linux has distributions, which besides the bare OS provide a lot of
packages. The binary C library is used by almost all programs, so it is
provided even in a minimal install. Linux has package managers, so
everything you install may be split into small packages; but for the
user it is just a matter of knowing a few crucial names, and the package
manager will install all dependencies.
AFAIK Windows alone does not have a package manager, and you apparently
reject package managers provided by third parties. So the only
viable approach is to install a big bundle ("self-contained compiler").
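(For illustration, on a Debian-style system the split shows up directly
in the package names - these are real Debian/Ubuntu packages, but exact
names vary by distribution:)

    apt-get install gcc         # the compiler proper
    apt-get install binutils    # as, ld, objdump, size, ...
    apt-get install libc6-dev   # C library headers and link-time files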
On Sun, 24 Nov 2024 13:45:55 -0800
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
Bart <bc@freeuk.com> writes:
On 24/11/2024 20:01, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Most of a gcc installation is hundreds of header and archive (.a)
files for various libraries. There might be 32-bit and 64-bit
versions. I understand that. But it also makes it hard to isolate
the core compiler.
That doesn't agree with my observations.
Of course most of the headers and libraries are not part of gcc
itself.
As usual, you refer to the entire implementation as "gcc".
I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
installing each into a new directory.
The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I
strip
the executables.
That's even huger than mine! So, what are those 3.7GB full of? What
does the 1.9GB of executables do?
I installed compilers for multiple languages. A more typical
installation likely won't include compilers for Ada, Go, Fortran,
Modula-2, and Rust. There are a number of hard links to other files;
for example c++, g++, x86_64-pc-linux-gnu-c++, and
x86_64-pc-linux-gnu-g++ are all the same file. Apparently `du` is
clever enough to count them only once.
Here's the output of `ls -s` on the bin directory (sizes are in units
of 1024 bytes):
total 611908
8828 c++ 8960 gm2 8828 x86_64-pc-linux-gnu-c++
8820 cpp 8264 gnat 8828 x86_64-pc-linux-gnu-g++
8828 g++ 13092 gnatbind 8820 x86_64-pc-linux-gnu-gcc
8820 gcc 9556 gnatchop 8820 x86_64-pc-linux-gnu-gcc-14.2.0
156 gcc-ar 12564 gnatclean 156 x86_64-pc-linux-gnu-gcc-ar
156 gcc-nm 7864 gnatkr 156 x86_64-pc-linux-gnu-gcc-nm
152 gcc-ranlib 8564 gnatlink 152 x86_64-pc-linux-gnu-gcc-ranlib
8828 gccgo 12764 gnatls 8828 x86_64-pc-linux-gnu-gccgo
8820 gccrs 13584 gnatmake 8820 x86_64-pc-linux-gnu-gccrs
7784 gcov 12236 gnatname 8828 x86_64-pc-linux-gnu-gdc
6324 gcov-dump 12308 gnatprep 8824 x86_64-pc-linux-gnu-gfortran
6468 gcov-tool 11136 go 8960 x86_64-pc-linux-gnu-gm2
8828 gdc 620 gofmt
8824 gfortran 308740 lto-dump
67% of the bin directory of the i386 gcc13 compiler that I compiled from
source on msys2 a few months ago is a single huge executable:
i386-elf-lto-dump.exe, 410,230,002 bytes with symbols, 28,347,904 bytes
stripped. Copying such a file is not instant, even on SSD. It certainly
takes time over the internet.
It does not look like I have any use for it, stripped or not. When I
want a dump, I use a smaller utility, i386-elf-objdump.exe (14,740,647
bytes with symbols, 2,242,048 bytes stripped), which already does more
than I would know how to use.
The Arm gcc12 compiler for small embedded targets (arm-none-eabi-gcc) in
the same msys2 environment, which I did not compile from source, also
contains arm-none-eabi-lto-dump.exe, and it is also the biggest exe by
far, but at least it is stripped and only 23,728,128 bytes.
LTO object files are vastly different beasts from normal object
files, so it does not surprise me that the dump utility is so much
bigger. If you don't use LTO, then presumably you will not need the
lto-dump utility. (It is not a tool I have ever looked at myself.)
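(The reason for the difference: with -flto the .o files contain GCC's
GIMPLE bytecode rather than machine code, so the ordinary objdump has
little to say about them, and the separate lto-dump tool exists. A
sketch of typical usage, using real gcc options:)

    gcc -flto -c foo.c    # foo.o now holds GIMPLE bytecode, not machine code
    lto-dump -list foo.o  # list the functions and variables recorded in it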
On 25/11/2024 11:21, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
This seems to be a thing with Linux, where a big chunk of a C
implementation is provided by the OS.
That is, standard headers, libraries, possibly even 'as' and 'ld'
utilities. On Windows, C compilers tend to be self-contained (except for
Clang which appears to be parasitical: it used to piggy-back onto gcc,
then it switched to MSVC).
You know that at source level there are separate projects: gcc
proper, binutils and libc.
Actually, no I don't. I said more on this in my reply to Keith a short
while ago.
My experience of C compilers on Windows is that they provide a means
to turn .c files into executable files. Such a compiler on Windows
generally has to be self-contained, since very little is provided by
the OS.
So from my point of view, gcc is the outlier.
On 24/11/2024 21:01, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Most of a gcc installation is hundreds of header and archive (.a)
files for various libraries. There might be 32-bit and 64-bit
versions. I understand that. But it also makes it hard to isolate the
core compiler.
That doesn't agree with my observations.
Of course most of the headers and libraries are not part of gcc
itself.
As usual, you refer to the entire implementation as "gcc".
I've built gcc 14.2.0 and glibc 2.40 from source on Ubuntu 22.04.5,
installing each into a new directory.
The gcc installation is about 5.6 GB, reduced to about 1.9 GB if I
strip the executables.
That sounds like a /very/ large size. A quick check of the pre-built
Debian package for gcc-14 is about 90 MB installed. (That is for the
C compiler - not binutils, or libraries.) C++ adds another 50% to
that. Are you including the build directories with all the object
files too?
Bart <bc@freeuk.com> writes:
On 25/11/2024 11:21, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
This seems to be a thing with Linux, where a big chunk of a C
implementation is provided by the OS.
That is, standard headers, libraries, possibly even 'as' and 'ld'
utilities. On Windows, C compilers tend to be self-contained (except for
Clang which appears to be parasitical: it used to piggy-back onto gcc,
then it switched to MSVC).
You know that at source level there are separate projects: gcc
proper, binutils and libc.
Actually, no I don't. I said more on this in my reply to Keith a short
while ago.
You don't know that after it's been explained to you dozens of times?
My experience of C compilers on Windows is that they provide a means
to turn .c files into executable files. Such a compiler on Windows
generally has to be self-contained, since very little is provided by
the OS.
Bart, can you explain the difference between a C compiler and a C implementation? Or do you believe they're the same thing? (Hint:
They're not.)
On 24/11/2024 21:45, Keith Thompson wrote:
A more useful installation would of course need more standard headers,
an assembler, linker, and whatever .a files are needed to provide the
standard library.
Sure, those are all part of a C implementation, though they're not part
of gcc.
This seems to be a thing with Linux, where a big chunk of a C
implementation is provided by the OS.
That is, standard headers, libraries, possibly even 'as' and 'ld'
utilities.
On Windows, C compilers tend to be self-contained (except for
Bart <bc@freeuk.com> writes:
On 24/11/2024 21:45, Keith Thompson wrote:
A more useful installation would of course need more standard headers, >>>> an assembler, linker, and whatever .a files are needed to provide the
standard library.
Sure, those are all part of a C implementation, though they're not part
of gcc.
This seems to be a thing with Linux, where a big chunk of a C
implementation is provided by the OS.
Actually, no. The OS provides the dynamic linker and some os-specific
header files. Pretty much everything else comes from various
third-party packages.
That is, standard headers, libraries, possibly even 'as' and 'ld'
utilities.
None of those come from the OS. They come from separate packages
produced by third parties (some, like gcc, binutils, etc come from
the FSF, other libraries come from other sources).
On Windows, C compilers tend to be self-contained (except for
Leaving aside the fact that Windows has always been a toy
environment, all the tools you complain about were developed
on, and primarily for, UNIX and derivatives. Not Windows.
On Mon, 25 Nov 2024 13:45:28 +0100
David Brown <david.brown@hesbynett.no> wrote:
LTO object files are vastly different beasts from normal object
files, so it does not surprise me that the dump utility is so much
bigger. If you don't use LTO, then presumably you will not need the
lto-dump utility. (It is not a tool I have ever looked at myself.)
I am pretty sure that even if I ever want to use LTO with gcc, I still
will have no need for lto-dump.
What would matter for me in this case
would be the final result (exe) rather than the object files. And in
order to look at the exe I'd still use a normal objdump.
The situation is not purely hypothetical. I regularly use LTCG with
Microsoft tools. Never have I wanted to disassemble .obj files after
LTCG compilation. When occasionally I wanted to look at asm after LTCG,
it was always an exe.
It's funny how nobody seems to care about the speed of compilers
(which can vary by 100:1), but for the generated programs, the 2:1
speedup you might get by optimising it is vital!
On 25/11/2024 16:27, Keith Thompson wrote:
Bart, can you explain the difference between a C compiler and a C
implementation? Or do you believe they're the same thing? (Hint:
They're not.)
Well, I write language implementations, and I consider them largely
the same thing.
So who's right?
Bart <bc@freeuk.com> writes:
On 24/11/2024 21:45, Keith Thompson wrote:
A more useful installation would of course need more standard headers, >>>> an assembler, linker, and whatever .a files are needed to provide the
standard library.
Sure, those are all part of a C implementation, though they're not part
of gcc.
This seems to be a thing with Linux, where a big chunk of a C
implementation is provided by the OS.
Actually, no. The OS provides the dynamic linker and some os-specific
header files. Pretty much everything else comes from various
third-party packages.
That is, standard headers, libraries, possibly even 'as' and 'ld'
utilities.
None of those come from the OS.
Bart <bc@freeuk.com> writes:
It's funny how nobody seems to care about the speed of compilers
(which can vary by 100:1), but for the generated programs, the 2:1
speedup you might get by optimising it is vital!
I think most people would rather take this path (these times
are actual measured times of a recently written program):
compile time: 1 second
program run time: ~7 hours
than this path (extrapolated using the ratios mentioned above):
compile time: 0.01 second
program run time: ~14 hours
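(Worked through, the totals make the point plain:)

    slow compiler, optimised code:    1 s    + ~7 h  = ~7 hours total
    fast compiler, unoptimised code:  0.01 s + ~14 h = ~14 hours total

The 100:1 saving on the compile is microscopic next to the 2:1 cost on
the run.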
Bart <bc@freeuk.com> writes:
On 25/11/2024 16:27, Keith Thompson wrote:
Bart, can you explain the difference between a C compiler and a C
implementation? Or do you believe they're the same thing? (Hint:
They're not.)
Well, I write language implementations, and I consider them largely
the same thing.
So who's right?
In comp.lang.c, the C standard is right.
So, if I install 5 distinct C compilers on Linux, will they each come
with their own stdio.h, or will they use the common one in
/usr/include?
On 25/11/2024 18:49, Tim Rentsch wrote:
I'm trying to think of some computationally intensive app that would run
non-stop for several hours without interaction.
Bart <bc@freeuk.com> writes:
On 25/11/2024 18:49, Tim Rentsch wrote:
I'm trying to think of some computationally intensive app that would run
non-stop for several hours without interaction.
I can think of several - HDL simulators (vcs, et al), system simulators
like Simh, Qemu, Synopsys Virtualizer, SIMICS, most HPC codes (e.g.
fluid dynamics), Machine Learning training, et alia.
On 25/11/2024 21:29, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
On 25/11/2024 18:49, Tim Rentsch wrote:
I'm trying to think of some computationally intensive app that would run
non-stop for several hours without interaction.
I can think of several - HDL simulators (vcs, et al), system simulators
like Simh, Qemu, Synopsys Virtualizer, SIMICS, most HPC codes (e.g.
fluid dynamics), Machine Learning training, et alia.
OK, good.
So the only preparation you have to do to get those running at maximum
speed is just to use -O3 on your compilers instead of -O0.
Understood. You don't need to worry about anything else.
Bart <bc@freeuk.com> writes:
On 25/11/2024 21:29, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
On 25/11/2024 18:49, Tim Rentsch wrote:
I'm trying to think of some computationally intensive app that would run
non-stop for several hours without interaction.
I can think of several - HDL simulators (vcs, et al), system simulators
like Simh, Qemu, Synopsys Virtualizer, SIMICS, most HPC codes (e.g.
fluid dynamics), Machine Learning training, et alia.
OK, good.
So the only preparation you have to do to get those running at maximum
speed is just to use -O3 on your compilers instead of -O0.
That appears to be your opinion. It is not shared by me, nor by
any programmer I've ever met.
Understood. You don't need to worry about anything else.
How do you conclude that based on a simple list of applications?
Everything from the initial design proposal to the selection of the
implementation language to the characteristics of the data structures
to the algorithms chosen is part of the process of creating a real-world
application. The actual compiler flags are in the noise, for the
most part.
On 25/11/2024 18:49, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
It's funny how nobody seems to care about the speed of compilers
(which can vary by 100:1), but for the generated programs, the 2:1
speedup you might get by optimising it is vital!
I think most people would rather take this path (these times
are actual measured times of a recently written program):
compile time: 1 second
program run time: ~7 hours
than this path (extrapolated using the ratios mentioned above):
compile time: 0.01 second
program run time: ~14 hours
I'm trying to think of some computationally intensive app that would
run non-stop for several hours without interaction.
Bart <bc@freeuk.com> writes:
On 25/11/2024 18:49, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
It's funny how nobody seems to care about the speed of compilers
(which can vary by 100:1), but for the generated programs, the 2:1
speedup you might get by optimising it is vital!
I think most people would rather take this path (these times
are actual measured times of a recently written program):
compile time: 1 second
program run time: ~7 hours
than this path (extrapolated using the ratios mentioned above):
compile time: 0.01 second
program run time: ~14 hours
I'm trying to think of some computationally intensive app that would
run non-stop for several hours without interaction.
The conclusion is the same whether the program run time
is 7 hours, 7 minutes, or 7 seconds.
On 25/11/2024 17:30, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
On 24/11/2024 21:45, Keith Thompson wrote:
A more useful installation would of course need more standard headers,
an assembler, linker, and whatever .a files are needed to provide the
standard library.
Sure, those are all part of a C implementation, though they're not part
of gcc.
This seems to be a thing with Linux, where a big chunk of a C
implementation is provided by the OS.
Actually, no. The OS provides the dynamic linker and some os-specific
header files. Pretty much everything else comes from various
third-party packages.
That is, standard headers, libraries, possibly even 'as' and 'ld'
utilities.
None of those come from the OS.
So, if I install 5 distinct C compilers on Linux, will they each come
with their own stdio.h, or will they use the common one in /usr/include?
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Bart <bc@freeuk.com> writes:
On 25/11/2024 16:27, Keith Thompson wrote:
Bart, can you explain the difference between a C compiler and a C
implementation? Or do you believe they're the same thing? (Hint:
They're not.)
Well, I write language implementations, and I consider them largely
the same thing.
So who's right?
In comp.lang.c, the C standard is right.
Agreed, but the C standard doesn't define the word "compiler",
and uses it only in non-normative text (I searched N3096).
On 26/11/2024 12:29, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
On 25/11/2024 18:49, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
It's funny how nobody seems to care about the speed of compilers
(which can vary by 100:1), but for the generated programs, the 2:1
speedup you might get by optimising it is vital!
I think most people would rather take this path (these times
are actual measured times of a recently written program):
compile time: 1 second
program run time: ~7 hours
than this path (extrapolated using the ratios mentioned above):
compile time: 0.01 second
program run time: ~14 hours
I'm trying to think of some computationally intensive app that would
run non-stop for several hours without interaction.
The conclusion is the same whether the program run time
is 7 hours, 7 minutes, or 7 seconds.
Funny you should mention 7 seconds. If I'm working on single source
file called sql.c for example, that's how long it takes for gcc to
create an unoptimised executable:
c:\cx>tm gcc sql.c #250Kloc file
TM: 7.38
Bart <bc@freeuk.com> writes:
On 26/11/2024 12:29, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
On 25/11/2024 18:49, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
It's funny how nobody seems to care about the speed of compilers
(which can vary by 100:1), but for the generated programs, the
2:1 speedup you might get by optimising it is vital!
I think most people would rather take this path (these times
are actual measured times of a recently written program):
compile time: 1 second
program run time: ~7 hours
than this path (extrapolated using the ratios mentioned above):
compile time: 0.01 second
program run time: ~14 hours
I'm trying to think of some computationally intensive app that
would run non-stop for several hours without interaction.
The conclusion is the same whether the program run time
is 7 hours, 7 minutes, or 7 seconds.
Funny you should mention 7 seconds. If I'm working on single source
file called sql.c for example, that's how long it takes for gcc to
create an unoptimised executable:
c:\cx>tm gcc sql.c #250Kloc file
TM: 7.38
Your example illustrates my point. Even 250 thousand lines of
source takes only a few seconds to compile. Only people nutty
enough to have single source files over 25,000 lines or so --
over 400 pages at 60 lines/page! -- are so obsessed about
compilation speed.
And of course you picked the farthest-most
outlier as your example, grossly misrepresenting any sort of
average or typical case.
Bart <bc@freeuk.com> writes:
On 26/11/2024 12:29, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
On 25/11/2024 18:49, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
It's funny how nobody seems to care about the speed of compilers
(which can vary by 100:1), but for the generated programs, the 2:1
speedup you might get by optimising it is vital!
I think most people would rather take this path (these times
are actual measured times of a recently written program):
compile time: 1 second
program run time: ~7 hours
than this path (extrapolated using the ratios mentioned above):
compile time: 0.01 second
program run time: ~14 hours
I'm trying to think of some computationally intensive app that would
run non-stop for several hours without interaction.
The conclusion is the same whether the program run time
is 7 hours, 7 minutes, or 7 seconds.
Funny you should mention 7 seconds. If I'm working on single source
file called sql.c for example, that's how long it takes for gcc to
create an unoptimised executable:
c:\cx>tm gcc sql.c #250Kloc file
TM: 7.38
Your example illustrates my point. Even 250 thousand lines of
source takes only a few seconds to compile. Only people nutty
enough to have single source files over 25,000 lines or so --
over 400 pages at 60 lines/page! -- are so obsessed about
compilation speed. And of course you picked the farthest-most
outlier as your example, grossly misrepresenting any sort of
average or typical case.
On Wed, 27 Nov 2024 21:18:09 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
c:\cx>tm gcc sql.c #250Kloc file
TM: 7.38
Your example illustrates my point. Even 250 thousand lines of
source takes only a few seconds to compile. Only people nutty
enough to have single source files over 25,000 lines or so --
over 400 pages at 60 lines/page! -- are so obsessed about
compilation speed.
My impression was that Bart is talking about machine-generated code.
For machine generated code 250Kloc is not too much.
I've several times suggested that gcc should have an -O-1 option that
runs a secretly bundled version of Tiny C.
[ compilation times ]
And for me, used to decades of sub-one-second response times, 7 seconds
seems like for ever. [...]
On 28.11.2024 15:27, Bart wrote:
[ compilation times ]
And for me, used to decades of sub-one-second response times, 7 seconds
seems like for ever. [...]
Sub-seconds is very important in response times of interactive tools;
I recall we've measured, e.g. for GUI applications, the exact timing,
and we've taken into account results of psychological sciences. The
accepted response times for our applications were somewhere around
0.20 seconds, and even 0.50 seconds was by far unacceptable.
But we're speaking about compilation times. And I'm a bit astonished
about a sub-second requirement or necessity. I'm typically compiling
source code after I've edited it, where the latter is by far the most dominating step. And before the editing there's usually the analysis
of code, which requires even more time than the simple but interactive
editing process.
When I start the compile, all the major time-demanding
tasks that are necessary to create the software fix have already been
done, and I certainly don't need a sub-second response from the compiler.
Though I observed a certain behavior of programmers who use tools with
a fast response time. Since it doesn't cost anything they just make a
single change and compile to see whether it works, and, rinse repeat,
do that for every _single_ change *multiple* times.
My own programming
habits were also somewhat influenced by that, though I still try to fix
things in my head before I ask the compiler what it thinks of my change.
This is certainly influenced by the mainframe days where I designed my algorithms on paper, punched my program on a stack of punch cards, and examined and fixed the errors all at once.
On 28/11/2024 17:28, Janis Papanagnou wrote:
On 28.11.2024 15:27, Bart wrote:
[ compilation times ]
And for me, used to decades of sub-one-second response times, 7 seconds
seems like for ever. [...]
Sub-seconds is very important in response times of interactive tools;
I recall we've measured, e.g. for GUI applications, the exact timing,
and we've taken into account results of psychological sciences. The
accepted response times for our applications were somewhere around
0.20 seconds, and even 0.50 seconds was by far unacceptable.
But we're speaking about compilation times. And I'm a bit astonished
about a sub-second requirement or necessity. I'm typically compiling
source code after I've edited it, where the latter is by far the most
dominating step. And before the editing there's usually the analysis
of code, that requires even more time than the simple but interactive
editing process.
You can make a similar argument about turning on the light switch when entering a room. Flicking light switches is not something you need to do every few seconds, but if the light took 5 seconds to come on (or even
one second), it would be incredibly annoying.
It would stop the fluency of whatever you were planning to do. You might
even forget why you needed to go into the room in the first place.
When I start the compile all the major time demanding
tasks that are necessary to create the software fix have already been
done, and I certainly don't need a sub-second response from compiler.
Though I observed a certain behavior of programmers who use tools with
a fast response time. Since it doesn't cost anything they just make a
single change and compile to see whether it works, and, rinse repeat,
do that for every _single_ change *multiple* times.
Well, what's wrong with that? It's how lots of things already work, by
doing things incrementally.
If recompiling an entire program of any size really was instant, would
you still work exactly the same way?
People find scripting languages productive, partly because there is no discrete build step.
My own programming
habits got also somewhat influenced by that, though I still try to fix
things in brain before I ask the compiler what it thinks of my change.
This is certainly influenced by the mainframe days where I designed my
algorithms on paper, punched my program on a stack of punch cards, and
examined and fixed the errors all at once.
I also remember using punched cards at college. But generally it was
using an interactive terminal. Compiling and linking were still big
deals when using mini- and mainframe computers.
Oddly, it was only when using tiny, underpowered microprocessor systems
that I realised how fast language tools really could be. At least the
ones I wrote.
[...]
On 28/11/2024 17:28, Janis Papanagnou wrote:
On 28.11.2024 15:27, Bart wrote:
[ compilation times ]
And for me, used to decades of sub-one-second response times, 7
seconds seems like for ever. [...]
Sub-seconds is very important in response times of interactive
tools; I recall we've measured, e.g. for GUI applications, the
exact timing, and we've taken into account results of psychological
sciences. The accepted response times for our applications were
somewhere around 0.20 seconds, and even 0.50 seconds was by far
unacceptable.
But we're speaking about compilation times. And I'm a bit
astonished about a sub-second requirement or necessity. I'm
typically compiling source code after I've edited it, where the
latter is by far the most dominating step. And before the editing
there's usually the analysis of code, that requires even more time
than the simple but interactive editing process.
You can make a similar argument about turning on the light switch
when entering a room. Flicking light switches is not something you
need to do every few seconds, but if the light took 5 seconds to
come on (or even one second), it would be incredibly annoying.
Bart <bc@freeuk.com> writes:
On 28/11/2024 17:28, Janis Papanagnou wrote:
But we're speaking about compilation times. [...]
You can make a similar argument about turning on the light switch
when entering a room. Flicking light switches is not something you
need to do every few seconds, but if the light took 5 seconds to
come on (or even one second), it would be incredibly annoying.
This analogy sounds like something a defense attorney would say who
has a client that everyone knows is guilty.
On 30.11.2024 00:29, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
On 28/11/2024 17:28, Janis Papanagnou wrote:
But we're speaking about compilation times. [...]
You can make a similar argument about turning on the light switch
when entering a room. Flicking light switches is not something you
need to do every few seconds, but if the light took 5 seconds to
come on (or even one second), it would be incredibly annoying.
This analogy sounds like something a defense attorney would say who
has a client that everyone knows is guilty.
Intentionally or not; it's funny to respond to an analogy with an
analogy. :-}
On 28/11/2024 05:18, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
On 26/11/2024 12:29, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
On 25/11/2024 18:49, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
It's funny how nobody seems to care about the speed of
compilers (which can vary by 100:1), but for the generated
programs, the 2:1 speedup you might get by optimising it is
vital!
I think most people would rather take this path (these times
are actual measured times of a recently written program):
compile time: 1 second
program run time: ~7 hours
than this path (extrapolated using the ratios mentioned above):
compile time: 0.01 second
program run time: ~14 hours
I'm trying to think of some computationally intensive app that
would run non-stop for several hours without interaction.
The conclusion is the same whether the program run time
is 7 hours, 7 minutes, or 7 seconds.
Funny you should mention 7 seconds. If I'm working on single
source file called sql.c for example, that's how long it takes for
gcc to create an unoptimised executable:
c:\cx>tm gcc sql.c #250Kloc file
TM: 7.38
Your example illustrates my point. Even 250 thousand lines of
source takes only a few seconds to compile. Only people nutty
enough to have single source files over 25,000 lines or so --
over 400 pages at 60 lines/page! -- are so obsessed about
compilation speed. And of course you picked the farthest-most
outlier as your example, grossly misrepresenting any sort of
average or typical case.
It's not atypical for me! [...]
On Wed, 27 Nov 2024 21:18:09 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Bart <bc@freeuk.com> writes:
On 26/11/2024 12:29, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
On 25/11/2024 18:49, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
It's funny how nobody seems to care about the speed of
compilers (which can vary by 100:1), but for the generated
programs, the 2:1 speedup you might get by optimising it is
vital!
I think most people would rather take this path (these times
are actual measured times of a recently written program):
compile time: 1 second
program run time: ~7 hours
than this path (extrapolated using the ratios mentioned above):
compile time: 0.01 second
program run time: ~14 hours
I'm trying to think of some computationally intensive app that
would run non-stop for several hours without interaction.
The conclusion is the same whether the program run time
is 7 hours, 7 minutes, or 7 seconds.
Funny you should mention 7 seconds. If I'm working on single
source file called sql.c for example, that's how long it takes for
gcc to create an unoptimised executable:
c:\cx>tm gcc sql.c #250Kloc file
TM: 7.38
Your example illustrates my point. Even 250 thousand lines of
source takes only a few seconds to compile. Only people nutty
enough to have single source files over 25,000 lines or so --
over 400 pages at 60 lines/page! -- are so obsessed about
compilation speed.
My impression was that Bart is talking about machine-generated code.
For machine-generated code 250Kloc is not too much. I would think
that in the field of compiled-code HDL simulation people are interested
in compiling as big sources as they can afford.
And of course you picked the farthest-most
outlier as your example, grossly misrepresenting any sort of
average or typical case.
I remember having a much shorter file (the core of a 3rd-party TCP protocol implementation) where compilation with gcc took several seconds.
Looked at it now - only 22 Klocs.
Text size in .o - 34KB.
Compilation time on a much newer computer than the one I remembered, with
a good SATA SSD and a 4 GHz Intel Haswell CPU - a little over 1 sec. That
with gcc 4.7.3. I would guess that if I tried gcc13 it would be 1.5 to 2
times longer.
So, in terms of Kloc/sec it seems to me that the time reported by Bart
is not outrageous (250 Kloc in 7.38 s is about 34 Kloc/s, against my
22 Kloc in roughly a second). Indeed, gcc is very slow when compiling
any source several times above average size.
In this particular case I cannot compare gcc to an alternative, because
for the given target (Altera Nios2) there are no alternatives.
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 30.11.2024 00:29, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
On 28/11/2024 17:28, Janis Papanagnou wrote:
But we're speaking about compilation times. [...]
You can make a similar argument about turning on the light switch
when entering a room. Flicking light switches is not something you
need to do every few seconds, but if the light took 5 seconds to
come on (or even one second), it would be incredibly annoying.
This analogy sounds like something a defense attorney would say who
has a client that everyone knows is guilty.
Intentionally or not, it's funny to respond to an analogy with an
analogy. :-}
My statement was not an analogy. Similar is not the same as
analogous.
Michael S <already5chosen@yahoo.com> writes:
On Wed, 27 Nov 2024 21:18:09 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Bart <bc@freeuk.com> writes:
On 26/11/2024 12:29, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
On 25/11/2024 18:49, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
It's funny how nobody seems to care about the speed of
compilers (which can vary by 100:1), but for the generated
programs, the 2:1 speedup you might get by optimising it is
vital!
I think most people would rather take this path (these times
are actual measured times of a recently written program):
compile time: 1 second
program run time: ~7 hours
than this path (extrapolated using the ratios mentioned above):
compile time: 0.01 second
program run time: ~14 hours
I'm trying to think of some computationally intensive app that
would run non-stop for several hours without interaction.
The conclusion is the same whether the program run time
is 7 hours, 7 minutes, or 7 seconds.
Funny you should mention 7 seconds. If I'm working on a single
source file called sql.c for example, that's how long it takes for
gcc to create an unoptimised executable:
c:\cx>tm gcc sql.c #250Kloc file
TM: 7.38
Your example illustrates my point. Even 250 thousand lines of
source takes only a few seconds to compile. Only people nutty
enough to have single source files over 25,000 lines or so --
over 400 pages at 60 lines/page! -- are so obsessed about
compilation speed.
My impression was that Bart is talking about machine-generated code.
For machine generated code 250Kloc is not too much. I would think
that in the field of compiled-code HDL simulation people are interested
in compiling sources as big as they can afford.
Sure. But Bart is implicitly saying that such cases make up the
bulk of C compilations, whereas in fact the reverse is true. People
don't care about Bart's complaint because the circumstances of his
examples almost never apply to them. And he must know this, even
though he tries to pretend he doesn't.
And of course you picked the farthest-most
outlier as your example, grossly misrepresenting any sort of
average or typical case.
I remember having a much shorter file (the core of a 3rd-party TCP protocol
implementation) where compilation with gcc took several seconds.
Looked at it now - only 22 Klocs.
Text size in .o - 34KB.
Compilation time on a much newer computer than the one I remembered, with
a good SATA SSD and a 4 GHz Intel Haswell CPU - a little over 1 sec. That
with gcc 4.7.3. I would guess that if I tried gcc13 it would be 1.5 to 2
times longer.
So, in terms of Kloc/sec it seems to me that the time reported by Bart
is not outrageous. Indeed, gcc is very slow when compiling any source
several times above average size.
In this particular case I cannot compare gcc to an alternative, because
for the given target (Altera Nios2) there are no alternatives.
I'm not disputing his ratios on compilation speeds. I implicitly
agreed to them in my earlier remarks. The point is that the
absolute times are so small that most people don't care. For
some reason I can't fathom Bart does care, and apparently cannot
understand why most other people do not care. My conclusion is
that Bart is either quite immature or a narcissist. I have tried
to explain to him why other people think differently than he does,
but it seems he isn't really interested in having it explained.
Oh well, not my problem.
On 2024-11-16, Stefan Ram wrote:
Dan Purgert <dan@djph.net> wrote or quoted:
if (n==0) { printf ("n: %u\n",n); n++;}
if (n==1) { printf ("n: %u\n",n); n++;}
if (n==2) { printf ("n: %u\n",n); n++;}
if (n==3) { printf ("n: %u\n",n); n++;}
if (n==4) { printf ("n: %u\n",n); n++;}
printf ("all if completed, n=%u\n",n);
My bad if the following instruction structure's already been hashed
out in this thread, but I haven't been following the whole convo!
I honestly lost the plot ages ago; not sure if it was either!
In my C 101 classes, after we've covered "if" and "else",
I always throw this program up on the screen and hit the newbies
with this curveball: "What's this bad boy going to spit out?".
Segfaults? :D
Well, it's a blue moon when someone nails it. Most of them fall
for my little gotcha hook, line, and sinker.
#include <stdio.h>
const char * english( int const n )
{ const char * result;
if( n == 0 )result = "zero";
if( n == 1 )result = "one";
if( n == 2 )result = "two";
if( n == 3 )result = "three";
else result = "four";
return result; }
void print_english( int const n )
{ printf( "%s\n", english( n )); }
int main( void )
{ print_english( 0 );
print_english( 1 );
print_english( 2 );
print_english( 3 );
print_english( 4 ); }
oooh, that's way better at making a point of the hazard than mine was.
... almost needed to engage my rubber duckie, before I realized I was mentally auto-correcting the 'english()' function while reading it.
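(For anyone who also lost the plot: the program prints "four", "four", "four", "three", "four". The else binds only to the final if( n == 3 ), so for every argument other than 3 that else fires last and leaves "four" in result.)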
On 16.11.2024 16:14, James Kuyper wrote:
On 11/16/24 04:42, Stefan Ram wrote:
...
[...]
#include <stdio.h>
const char * english( int const n )
{ const char * result;
if( n == 0 )result = "zero";
if( n == 1 )result = "one";
if( n == 2 )result = "two";
if( n == 3 )result = "three";
else result = "four";
return result; }
That's indeed a nice example, where you get fooled by the treacherous
"trustiness" of the formatting[*]. - In syntax we trust! [**]
My bad if the following instruction structure's already been hashed
out in this thread, but I haven't been following the whole convo!
In my C 101 classes, after we've covered "if" and "else",
I always throw this program up on the screen and hit the newbies
with this curveball: "What's this bad boy going to spit out?".
Well, it's a blue moon when someone nails it. Most of them fall
for my little gotcha hook, line, and sinker.
#include <stdio.h>
const char * english( int const n )
{ const char * result;
if( n == 0 )result = "zero";
if( n == 1 )result = "one";
if( n == 2 )result = "two";
if( n == 3 )result = "three";
else result = "four";
return result; }
void print_english( int const n )
{ printf( "%s\n", english( n )); }
int main( void )
{ print_english( 0 );
print_english( 1 );
print_english( 2 );
print_english( 3 );
print_english( 4 ); }
On 28/11/2024 12:37, Michael S wrote:
On Wed, 27 Nov 2024 21:18:09 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
c:\cx>tm gcc sql.c #250Kloc file
TM: 7.38
Your example illustrates my point. Even 250 thousand lines of
source takes only a few seconds to compile. Only people nutty
enough to have single source files over 25,000 lines or so --
over 400 pages at 60 lines/page! -- are so obsessed about
compilation speed.
My impression was that Bart is talking about machine-generated code.
For machine generated code 250Kloc is not too much.
This file mostly comprises sqlite3.c which is a machine-generated amalgamation of some 100 actual C files.
You wouldn't normally do development with that version, but in my
scenario, where I was trying to find out why the version built with my compiler was buggy, I might try adding debug info to it then building
with a working compiler (eg. gcc) to compare with.
Tim isn't asking the right questions (or any questions!). WHY does gcc
take so long to generate indifferent code when the task can clearly be
done at least an order of magnitude faster?
On 30/11/2024 05:25, Tim Rentsch wrote:
Michael S <already5chosen@yahoo.com> writes:
On Wed, 27 Nov 2024 21:18:09 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Bart <bc@freeuk.com> writes:
On 26/11/2024 12:29, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
On 25/11/2024 18:49, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
It's funny how nobody seems to care about the speed of
compilers (which can vary by 100:1), but for the generated
programs, the 2:1 speedup you might get by optimising it is
vital!
I think most people would rather take this path (these times
are actual measured times of a recently written program):
compile time: 1 second
program run time: ~7 hours
than this path (extrapolated using the ratios mentioned above):
compile time: 0.01 second
program run time: ~14 hours
I'm trying to think of some computationally intensive app that
would run non-stop for several hours without interaction.
The conclusion is the same whether the program run time
is 7 hours, 7 minutes, or 7 seconds.
Funny you should mention 7 seconds. If I'm working on a single
source file called sql.c for example, that's how long it takes for
gcc to create an unoptimised executable:
c:\cx>tm gcc sql.c #250Kloc file
TM: 7.38
Your example illustrates my point. Even 250 thousand lines of
source takes only a few seconds to compile. Only people nutty
enough to have single source files over 25,000 lines or so --
over 400 pages at 60 lines/page! -- are so obsessed about
compilation speed.
My impression was that Bart is talking about machine-generated code.
For machine generated code 250Kloc is not too much. I would think
that in the field of compiled-code HDL simulation people are interested
in compiling sources as big as they can afford.
Sure. But Bart is implicitly saying that such cases make up the
bulk of C compilations, whereas in fact the reverse is true. People
don't care about Bart's complaint because the circumstances of his
examples almost never apply to them. And he must know this, even
though he tries to pretend he doesn't.
And of course you picked the farthest-most
outlier as your example, grossly misrepresenting any sort of
average or typical case.
I remember having a much shorter file (the core of a 3rd-party TCP protocol
implementation) where compilation with gcc took several seconds.
Looked at it now - only 22 Klocs.
Text size in .o - 34KB.
Compilation time on a much newer computer than the one I remembered, with
a good SATA SSD and a 4 GHz Intel Haswell CPU - a little over 1 sec. That
with gcc 4.7.3. I would guess that if I tried gcc13 it would be 1.5 to 2
times longer.
So, in terms of Kloc/sec it seems to me that the time reported by Bart
is not outrageous. Indeed, gcc is very slow when compiling any source
several times above average size.
In this particular case I cannot compare gcc to an alternative, because
for the given target (Altera Nios2) there are no alternatives.
I'm not disputing his ratios on compilation speeds. I implicitly
agreed to them in my earlier remarks. The point is that the
absolute times are so small that most people don't care. For
some reason I can't fathom Bart does care, and apparently cannot
understand why most other people do not care. My conclusion is
that Bart is either quite immature or a narcissist. I have tried
to explain to him why other people think differently than he does,
but it seems he isn't really interested in having it explained.
Oh well, not my problem.
EVERYBODY cares about compilation speeds. Except in this newsgroup, where people try to pretend that it's irrelevant.
But then at the same time, they strive to keep those compile-times small:
* By using tools that have themselves been optimised to reduce their runtimes, and where considerable resources have been expended to get the best possible code, which naturally also benefits the tool
* By using the fastest possible hardware
* By trying to do parallel builds across multiple cores
* By organising source code into artificially small modules so that recompilation of just one module is quicker. So, relying on independent compilation.
* By going to considerable trouble to define inter-dependencies between modules, so that a make system can AVOID recompiling modules. (Why on earth would it need to? Oh, because it would be slower!) A minimal make sketch of this appears after this list.
* By using development techniques involving thinking deeply about what
to change, to avoid a costly rebuild.
Etc.
All this instead of relying on raw compilation speed, which would make
a lot of those points less relevant.
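For illustration, a minimal make fragment of the kind meant in the dependency point above; the file names (prog, main.c, util.c, util.h) are hypothetical:

# Minimal sketch (hypothetical file names): make rebuilds an object
# only when its source file, or a header it is declared to depend on,
# is newer than the object. Recipe lines must begin with a TAB.
CC     = gcc
CFLAGS = -O2

prog: main.o util.o
	$(CC) $(CFLAGS) -o prog main.o util.o

main.o: main.c util.h
	$(CC) $(CFLAGS) -c main.c

util.o: util.c util.h
	$(CC) $(CFLAGS) -c util.c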
Bart <bc@freeuk.com> wrote:
On 28/11/2024 12:37, Michael S wrote:
On Wed, 27 Nov 2024 21:18:09 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
c:\cx>tm gcc sql.c #250Kloc file
TM: 7.38
Your example illustrates my point. Even 250 thousand lines of
source takes only a few seconds to compile. Only people nutty
enough to have single source files over 25,000 lines or so --
over 400 pages at 60 lines/page! -- are so obsessed about
compilation speed.
My impression was that Bart is talking about machine-generated code.
For machine generated code 250Kloc is not too much.
This file mostly comprises sqlite3.c which is a machine-generated
amalgamation of some 100 actual C files.
You wouldn't normally do development with that version, but in my
scenario, where I was trying to find out why the version built with my
compiler was buggy, I might try adding debug info to it then building
with a working compiler (eg. gcc) to compare with.
Even in the context of developing a compiler I would not blindly run
many compilations of a large file.
At the first stage I would debug the
compiled program, to find out what is wrong with it.
After that I would try to minimize the testcase, removing code which
does not contribute to the bug.
That involves several compilations
of files with quickly decreasing sizes.
Tim isn't asking the right questions (or any questions!). WHY does gcc
take so long to generate indifferent code when the task can clearly be
done at least an order of magnitude faster?
The simple answer is: users tolerate long compile time. If users
abandoned 'gcc' for some other compiler due to long compile time,
then 'gcc' developers would notice.
You need to improve your propaganda for faster C compilers...
Stefan Ram <ram@zedat.fu-berlin.de> wrote:
My bad if the following instruction structure's already been hashed
out in this thread, but I haven't been following the whole convo!
In my C 101 classes, after we've covered "if" and "else",
I always throw this program up on the screen and hit the newbies
with this curveball: "What's this bad boy going to spit out?".
Well, it's a blue moon when someone nails it. Most of them fall
for my little gotcha hook, line, and sinker.
#include <stdio.h>
const char * english( int const n )
{ const char * result;
if( n == 0 )result = "zero";
if( n == 1 )result = "one";
if( n == 2 )result = "two";
if( n == 3 )result = "three";
else result = "four";
return result; }
void print_english( int const n )
{ printf( "%s\n", english( n )); }
int main( void )
{ print_english( 0 );
print_english( 1 );
print_english( 2 );
print_english( 3 );
print_english( 4 ); }
That breaks two rules:
- instructions conditioned by 'if' should have braces,
- when we have the result we should return it immediately.
Once those are fixed, the code works as expected...
On 01.12.2024 13:41, Waldek Hebisch wrote:
Stefan Ram <ram@zedat.fu-berlin.de> wrote:
My bad if the following instruction structure's already been hashed
out in this thread, but I haven't been following the whole convo!
In my C 101 classes, after we've covered "if" and "else",
I always throw this program up on the screen and hit the newbies
with this curveball: "What's this bad boy going to spit out?".
Well, it's a blue moon when someone nails it. Most of them fall
for my little gotcha hook, line, and sinker.
#include <stdio.h>
const char * english( int const n )
{ const char * result;
if( n == 0 )result = "zero";
if( n == 1 )result = "one";
if( n == 2 )result = "two";
if( n == 3 )result = "three";
else result = "four";
return result; }
void print_english( int const n )
{ printf( "%s\n", english( n )); }
int main( void )
{ print_english( 0 );
print_english( 1 );
print_english( 2 );
print_english( 3 );
print_english( 4 ); }
That breaks two rules:
- instructions conditioned by 'if' should have braces,
I suppose you don't mean
if (n == value) { result = string; }
else { result = other; }
which I'd think doesn't change anything. - So what is it?
Actually, you should just add an explicit 'else' to fix the problem.
(Here there's no need to fiddle with spurious braces, I'd say.)
- when we have the result we should return it immediately.
This would suffice to fix it, wouldn't it?
Once those are fixed, the code works as expected...
I find this answer - not wrong, but - problematic for two reasons.
There are no accepted "general rules" that could get "broken"; it's
just rules that serve in given languages and application contexts.
And they may conflict with other "rules" that have been set up to
streamline code, make it safer, or whatever.
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 01.12.2024 13:41, Waldek Hebisch wrote:
Stefan Ram <ram@zedat.fu-berlin.de> wrote:
My bad if the following instruction structure's already been hashed
out in this thread, but I haven't been following the whole convo!
In my C 101 classes, after we've covered "if" and "else",
I always throw this program up on the screen and hit the newbies
with this curveball: "What's this bad boy going to spit out?".
Well, it's a blue moon when someone nails it. Most of them fall
for my little gotcha hook, line, and sinker.
#include <stdio.h>
const char * english( int const n )
{ const char * result;
if( n == 0 )result = "zero";
if( n == 1 )result = "one";
if( n == 2 )result = "two";
if( n == 3 )result = "three";
else result = "four";
return result; }
void print_english( int const n )
{ printf( "%s\n", english( n )); }
int main( void )
{ print_english( 0 );
print_english( 1 );
print_english( 2 );
print_english( 3 );
print_english( 4 ); }
That breaks two rules:
- instructions conditioned by 'if' should have braces,
I suppose you don't mean
if (n == value) { result = string; }
else { result = other; }
which I'd think doesn't change anything. - So what is it?
Actually, you should just add an explicit 'else' to fix the problem.
(Here there's no need to fiddle with spurious braces, I'd say.)
Lack of braces is a smokescreen hiding the second problem.
Or to put it differently, due to the lack of braces the code
immediately smells bad.
- when we have the result we should return it immediately.
This would suffice to fix it, wouldn't it?
Yes (but see above).
Once those are fixed, the code works as expected...
I find this answer - not wrong, but - problematic for two reasons.
There are no accepted "general rules" that could get "broken"; it's
just rules that serve in given languages and application contexts.
And they may conflict with other "rules" that have been set up to
streamline code, make it safer, or whatever.
No general rules, yes. But every sane programmer has _some_ rules.
My point was that if you adopt reasonable rules, then whole classes
of potential problems go away.
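For concreteness, here is the gotcha function rewritten under those two rules (braces on every controlled statement; return each result as soon as it is known). Janis's alternative - an explicit 'else' before each subsequent 'if' - fixes it equally well.

#include <stdio.h>

/* The earlier gotcha rewritten under the two rules above: braces
   everywhere, and each result returned as soon as it is known, so
   no later branch can silently overwrite it. */
const char * english( int const n )
{ if( n == 0 ){ return "zero"; }
  if( n == 1 ){ return "one"; }
  if( n == 2 ){ return "two"; }
  if( n == 3 ){ return "three"; }
  return "four"; }

int main( void )
{ int i;
  for( i = 0; i <= 4; ++i )
  { printf( "%s\n", english( i )); } /* zero one two three four */
  return 0; }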
On 30/11/2024 05:25, Tim Rentsch wrote:[...]
Michael S <already5chosen@yahoo.com> writes:
I remember having a much shorter file (the core of a 3rd-party TCP protocol
implementation) where compilation with gcc took several seconds.
Looked at it now - only 22 Klocs.
Text size in .o - 34KB.
Compilation time on a much newer computer than the one I remembered, with
a good SATA SSD and a 4 GHz Intel Haswell CPU - a little over 1 sec. That
with gcc 4.7.3. I would guess that if I tried gcc13 it would be 1.5 to 2
times longer.
So, in terms of Kloc/sec it seems to me that the time reported by Bart
is not outrageous. Indeed, gcc is very slow when compiling any source
several times above average size.
In this particular case I cannot compare gcc to an alternative, because
for the given target (Altera Nios2) there are no alternatives.
I'm not disputing his ratios on compilation speeds. I implicitly
agreed to them in my earlier remarks. The point is that the
absolute times are so small that most people don't care. For
some reason I can't fathom Bart does care, and apparently cannot
understand why most other people do not care. My conclusion is
that Bart is either quite immature or a narcissist. I have tried
to explain to him why other people think differently than he does,
but it seems he isn't really interested in having it explained.
Oh well, not my problem.
EVERYBODY cares about compilation speeds. [...]
Bart <bc@freeuk.com> writes:
On 30/11/2024 05:25, Tim Rentsch wrote:
EVERYBODY cares about compilation speeds. [...]
No, they don't. I accept that you care about compiler speed. What
most people care about is not speed but compilation times, and as
long as the times are small enough they don't worry about it.
Another difference may be relevant here. Based on other comments of
yours I have the impression that you frequently invoke compilations interactively. A lot of people never do that (or do it only very
rarely). In a project I am working on now I do builds often,
including full builds where every .c file is recompiled. But all
the compilation times together are only a small fraction of the
total, because doing a build includes lots of other steps, including
running regression tests. Even if the total compilation time were
zero the build process wouldn't be appreciably shorter.
I understand that you care about compiler speed, and that's fine
with me; more power to you. Why do you find it so hard to accept
that lots of other people have different views than you do, and
those people are not all stupid?
Do you really consider yourself
the only smart person in the room?
On 02/12/2024 14:09, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
On 30/11/2024 05:25, Tim Rentsch wrote:
EVERYBODY cares about compilation speeds. [...]
No, they don't. I accept that you care about compiler speed. What
most people care about is not speed but compilation times, and as
long as the times are small enough they don't worry about it.
Another difference may be relevant here. Based on other comments of
yours I have the impression that you frequently invoke compilations
interactively. A lot of people never do that (or do it only very
rarely). In a project I am working on now I do builds often,
including full builds where every .c file is recompiled. But all
the compilation times together are only a small fraction of the
total, because doing a build includes lots of other steps, including
running regression tests. Even if the total compilation time were
zero the build process wouldn't be appreciably shorter.
But it might be appreciably longer if the compilers you used were a lot slower! Or needed to be invoked more. Then even you might start to care
about it.
You don't care because in your case it is not the bottleneck, and enough
work has been put into those compilers to ensure they are not even slower.
(I don't know why regression tests need to feature in every single build.)
I understand that you care about compiler speed, and that's fine
with me; more power to you. Why do you find it so hard to accept
that lots of other people have different views than you do, and
those people are not all stupid?
You might also accept that for many, compilation /is/ a bottleneck in
their work, or at least it introduces an annoying delay.
Or are you suggesting that the scenario portrayed here:
https://xkcd.com/303/
is a complete fantasy?
Do you really consider yourself
the only smart person in the room?
Perhaps the most impatient.
On 02.12.2024 15:44, Bart wrote:
If all you want is to _sequentially_ process each single error in
a source file you don't need a test; all you need is to get the
error message, to start the editor, edit, and reiterate the compile
(to get the next error message, and so on). - Very time consuming.
But as soon as the errors are [all] fixed in a module... - what
do you do with it? - ...you should test that what you've changed
or implemented has been done correctly.
So edit/compile-iterating a single source is more time-consuming
than fixing it in, let's call it, "batch-mode". And once it's
error-free the compile times are negligible in the whole process.
Or are you suggesting that the scenario portrayed here:
https://xkcd.com/303/
is a complete fantasy?
It is a comic. - So, yes, it's fantasy. It's worth scribbling
on a WC wall but not suited as a sensible basis for discussion.
On 30.11.2024 05:40, Tim Rentsch wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 30.11.2024 00:29, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
On 28/11/2024 17:28, Janis Papanagnou wrote:
But we're speaking about compilation times. [...]
You can make a similar argument about turning on the light switch
when entering a room. Flicking light switches is not something you
need to do every few seconds, but if the light took 5 seconds to
come on (or even one second), it would be incredibly annoying.
This analogy sounds like something a defense attorney would say who
has a client that everyone knows is guilty.
Intentionally or not, it's funny to respond to an analogy with an
analogy. :-}
My statement was not an analogy. Similar is not the same as
analogous.
It's of course (and obviously) not the same; it's just a
similar term where the semantics of both terms have an overlap.
(Not sure why you even bothered to reply and nit-pick here.
But with your habit you seem to have just missed the point;
the comparison of your reply-type with Bart's argumentation.)
On Wed, 20 Nov 2024 12:31:35 -0000 (UTC), Dan Purgert wrote:
On 2024-11-16, Stefan Ram wrote:
Dan Purgert <dan@djph.net> wrote or quoted:
if (n==0) { printf ("n: %u\n",n); n++;}
if (n==1) { printf ("n: %u\n",n); n++;}
if (n==2) { printf ("n: %u\n",n); n++;}
if (n==3) { printf ("n: %u\n",n); n++;}
if (n==4) { printf ("n: %u\n",n); n++;}
printf ("all if completed, n=%u\n",n);
above should be equivalent to this
for(;n>=0&&n<5;++n) printf ("n: %u\n",n);
printf ("all if completed, n=%u\n",n);
Well, it's a blue moon when someone nails it. Most of them fall
for my little gotcha hook, line, and sinker.
#include <stdio.h>
const char * english( int const n )
{ const char * result;
if( n == 0 )result = "zero";
if( n == 1 )result = "one";
if( n == 2 )result = "two";
if( n == 3 )result = "three";
else result = "four";
return result; }
void print_english( int const n )
{ printf( "%s\n", english( n )); }
int main( void )
{ print_english( 0 );
print_english( 1 );
print_english( 2 );
print_english( 3 );
print_english( 4 ); }
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 30.11.2024 05:40, Tim Rentsch wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 30.11.2024 00:29, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
On 28/11/2024 17:28, Janis Papanagnou wrote:
But we're speaking about compilation times. [...]
You can make a similar argument about turning on the light switch
when entering a room. Flicking light switches is not something you need to do every few seconds, but if the light took 5 seconds to
come on (or even one second), it would be incredibly annoying.
This analogy sounds like something a defense attorney would say who
has a client that everyone knows is guilty.
Intentionally or not, it's funny to respond to an analogy with an
analogy. :-}
My statement was not an analogy. Similar is not the same as
analogous.
It's of course (and obviously) not the same; it's just a
similar term where the semantics of both terms have an overlap.
(Not sure why you even bothered to reply and nit-pick here.
It's because you thought it was just a nit-pick that I bothered
to reply.
But with your habit you seem to have just missed the point;
the comparison of your reply-type with Bart's argumentation.)
If you think they are the same then it is you who has missed the
point.
On 02/12/2024 18:19, Janis Papanagnou wrote:
On 02.12.2024 15:44, Bart wrote:
If all you want is to _sequentially_ process each single error in
a source file you don't need a test; all you need is to get the
error message, to start the editor, edit, and reiterate the compile
(to get the next error message, and so on). - Very time consuming.
But as soon as the errors are [all] fixed in a module... - what
do you do with it? - ...you should test that what you've changed
or implemented has been done correctly.
So edit/compile-iterating a single source is more time-consuming
than fixing it in, let's call it, "batch-mode". And once it's
error-free the compile times are negligible in the whole process.
I've struggled to find a suitable real-life analogy.
All I can suggest is that people have gone to some lengths to justify
having a car that can only travel at 3 mph around town, rather than 30
mph (ie 5 vs 50 kph).
Maybe their town is only a village, so the net difference is negligible.
Or they rarely drive, or avoid doing so; another way to downplay the inconvenience of such slow wheels.
The fact is that driving at 3 mph on a clear road is incredibly
frustrating even when you're not in a hurry to get anywhere!
[...]
On 02.12.2024 19:48, Bart wrote:[...]
All I can suggest is that people have gone to some lengths to justify
having a car that can only travel at 3 mph around town, rather than 30
mph (ie 5 vs 50 kph).
(You certainly meant km/h.)
On 02/12/2024 14:09, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
On 30/11/2024 05:25, Tim Rentsch wrote:
EVERYBODY cares about compilation speeds. [...]
No, they don't. I accept that you care about compiler speed.
What most people care about is not speed but compilation times,
and as long as the times are small enough they don't worry about
it.
Another difference may be relevant here. Based on other comments
of yours I have the impression that you frequently invoke
compilations interactively. A lot of people never do that (or do
it only very rarely). In a project I am working on now I do
builds often, including full builds where every .c file is
recompiled. But all the compilation times together are only a
small fraction of the total, because doing a build includes lots
of other steps, including running regression tests. Even if the
total compilation time were zero the build process wouldn't be
appreciably shorter.
But it might be appreciably longer if the compilers you used were
a lot slower! Or needed to be invoked more. [...]
On 01/12/2024 13:04, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
On 28/11/2024 12:37, Michael S wrote:
On Wed, 27 Nov 2024 21:18:09 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
c:\cx>tm gcc sql.c #250Kloc file
TM: 7.38
Your example illustrates my point. Even 250 thousand lines of
source takes only a few seconds to compile. Only people nutty
enough to have single source files over 25,000 lines or so --
over 400 pages at 60 lines/page! -- are so obsessed about
compilation speed.
My impression was that Bart is talking about machine-generated code.
For machine generated code 250Kloc is not too much.
This file mostly comprises sqlite3.c which is a machine-generated
amalgamation of some 100 actual C files.
You wouldn't normally do development with that version, but in my
scenario, where I was trying to find out why the version built with my
compiler was buggy, I might try adding debug info to it then building
with a working compiler (eg. gcc) to compare with.
Even in the context of developing a compiler I would not blindly run
many compilations of a large file.
Difficult bugs always occur in larger codebases, but with C, these are in a language that I can't navigate, and for programs which are not mine, and which tend to be badly written, bristling with typedefs and macros. It could take a week to track down where the error might be ...
At the first stage I would debug the
compiled program, to find out what is wrong with it.
... within the C program. Except there's nothing wrong with the C
program! It works fine with a working compiler.
The problem will be in the generated code, so in an entirely different program.
So normal debugging tools are less useful when several sets of
source code are involved, in different languages, or when the error occurs
in the second-generation version of either the self-hosted tool, or the program under test if it is to do with languages.
(For example, I got tcc.c working at one point. My generated tcc.exe
could compile tcc.c, but that second-generation tcc.c didn't work.)
After that I would try to minimize the testcase, removing code which
does not contribute to the bug.
Again, there is nothing wrong with the C program; the problem is in the code
generated for it. The bug can be very subtle, but it usually turns out
to be something silly.
Removing code from 10s of 1000s of lines (or 250 Kloc for sql) is not practical. Still, the aim is to isolate some code which can be used to recreate the issue in a smaller program.
Debugging can involve comparing two versions, one working, the other
not, looking for differences. And here tracking statements may be added.
If the only working version is via gcc, then that's bad news because it makes the process even more of a PITA.
I added an interpreter mode to my IL, because I assumed that would give a solid, reliable reference implementation to compare against.
It turned out to be even more buggy than the generated native code!
(One problem was to do with my stdarg.h header which implements VARARGS
used in function definitions. It assumes the stack grows downwards. In
my interpreter, it grows downwards!)
That involves several compilations
of files with quickly decreasing sizes.
Tim isn't asking the right questions (or any questions!). WHY does gcc
take so long to generate indifferent code when the task can clearly be
done at least an order of magnitude faster?
The simple answer is: users tolerate long compile time. If users
abandoned 'gcc' for some other compiler due to long compile time,
then 'gcc' developers would notice.
People use gcc. They come to depend on its features, or they might use (perhaps unknowingly) some extensions. On Windows, gcc includes some
headers and libraries that belong to Linux, but other compilers don't provide them.
The result is that if they were to switch to a smaller, faster compiler, their program may not work.
They'd have to use it from the start. But then they may want to use libraries which only work with gcc ...
You need to improve your propaganda for faster C compilers...
I actually don't know why I care. I get the benefit of my fast tools
every day; they're a joy to use. So I'm not bothered that other people
are that tolerant of slow, cumbersome build systems.
But then, people in this group do like to belittle small, fast products
(tcc for example as well as my stuff), and that's where it gets annoying.
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 02.12.2024 19:48, Bart wrote:[...]
All I can suggest is that people have gone to some lengths to justify
having a car that can only travel at 3 mph around town, rather than 30
mph (ie 5 vs 50 kph).
(You certainly meant km/h.)
Both "kph" and "km/h" are common abbreviations for "kilometers per
hour". Were you not familiar with "kph"?
Bart <bc@freeuk.com> wrote:
(For example, I got tcc.c working at one point. My generated tcc.exe
could compile tcc.c, but that second-generation tcc.c didn't work.)
Clearly, you work in stages: first you find out what is wrong with the second-generation tcc.exe.
In
my interpreter, it grows downwards!)
You probably meant upwards?
And handling such things is natural
when you have portability in mind: either you parametrise stdarg.h
so that it works for both stack directions, or you make sure that
the interpreter and compiler use the same direction (the latter seems
to be much easier).
Actually, I think the most natural way is to
keep the data structure layout in the interpreter as close as
possible to the compiler's data layout.
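To illustrate the parametrised variant: a minimal sketch, assuming a toy runtime where arguments are copied into a linear area. The names toy_va_list, toy_va_arg and STACK_GROWS_DOWN are hypothetical; this is not ISO <stdarg.h>.

#include <stdio.h>
#include <string.h>

/* Hypothetical per-target switch: which way the argument area grows.
   With STACK_GROWS_DOWN set, the cursor would have to start just past
   the last argument instead of at the first. */
#define STACK_GROWS_DOWN 0

typedef struct { unsigned char *p; } toy_va_list;

#if STACK_GROWS_DOWN
/* successive arguments live at successively lower addresses */
#define toy_va_arg(ap, T) (*(T *)((ap).p -= sizeof(T)))
#else
/* successive arguments live at successively higher addresses */
#define toy_va_arg(ap, T) (*(T *)(((ap).p += sizeof(T)) - sizeof(T)))
#endif

int main(void)
{
    int area[2];                    /* stand-in for the argument area */
    int a = 10, b = 20;
    memcpy(&area[0], &a, sizeof a); /* "push" the arguments in order */
    memcpy(&area[1], &b, sizeof b);

    toy_va_list ap = { (unsigned char *)&area[0] };
    int x = toy_va_arg(ap, int);    /* reads done in sequence: the macro modifies ap */
    int y = toy_va_arg(ap, int);
    printf("%d %d\n", x, y);        /* prints 10 20 */
    return 0;
}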
They'd have to use it from the start. But then they may want to use
libraries which only work with gcc ...
Well, you see that there are reasons to use 'gcc'.
The next version was cross-compiled on Linux using gcc. This version
used inline assembly for rounding and was significantly faster
than what Borland C produced. Note: the images to process were
largish (think of, say, 12000 by 20000 pixels) and speed was an
important factor. So using gcc-specific code was IMO justified
(this code was used conditionally; other compilers would get a
slow portable version using 'floor').
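That conditional arrangement might look roughly like this sketch; round_to_long is a hypothetical name, and a gcc builtin stands in here for the original inline assembly. Note the two branches differ on exact .5 halfway cases (lrint follows the current FP rounding mode).

#include <math.h>
#include <stdio.h>

/* Sketch of the conditional speed hack described above. */
static long round_to_long(double x)
{
#ifdef __GNUC__
    return __builtin_lrint(x);     /* typically a single convert instruction */
#else
    return (long)floor(x + 0.5);   /* slow but portable fallback */
#endif
}

int main(void)
{
    printf("%ld %ld\n", round_to_long(1.4), round_to_long(-2.6));  /* 1 -3 */
    return 0;
}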
You need to improve your propaganda for faster C compilers...
I actually don't know why I care. I get the benefit of my fast tools
every day; they're a joy to use. So I'm not bothered that other people
are that tolerant of slow, cumbersome build systems.
But then, people in this group do like to belittle small, fast products
(tcc for example as well as my stuff), and that's where it gets annoying.
I tried tcc compiling TeX. Long ago it did not work due to limitations
of tcc; this time it worked. A small comparison on the main file (19062
lines):

Command       time (s)   code size   data size
tcc -g        0.017      290521      1188
tcc           0.015      290521      1188
gcc -O0 -g    0.440      248467      14
gcc -O0       0.413      248467      14