I am not 100% sure, but I believe some people (Greeks?)
have keyboards such that their native character set can be
freely entered, when they're working in their native language.
And if they are required to work in English, or rather, 7-bit
ASCII, they will "switch keyboards", ie using the mouse or
whatever to select a different keyboard, and type the English,
and then return to the Greek etc keyboard.
I'm interested in a slight change to C90. I'm not interested in
UTF-8 either.
I'd like to write a program using pure ASCII, and indeed, pure
English prompts, but not force a Greek user to switch keyboards.
I'm not interested in a complicated translation layer either.
Originally I was thinking I just need to modify my programs and
the Greek locale so that I could do:
if (toupper(c) == 'X') printf("whatever\n");
And make some random Greek character the equivalent of 'X', ie
the Greek user knows that when prompted to type 'x' (or 'X'), he
just needs to press (lambda or whatever Greeks use). The Greek
locale will convert lambda into X when passed to toupper.
However, it was pointed out to me that this would interfere with
storing filenames on traditional FAT, for example. Not everything
should be subject to uppercasing. The Greek, or Katakana, should
be preserved, not converted into ASCII gibberish.
So I was thinking I need some halfway point of equivalency.
I'm happy to change all my programs so that they don't rely on
the user typing in an exact character. ie I am happy to drop case
sensitivity from everything, "now that I know there's an issue".
Actually there are other environments where case sensitivity is
difficult. e.g. some CMS (mainframe) environments.
And making sure I do toupper() is a way to solve the issue for
the environments where case-sensitivity is difficult/impossible.
(assuming they exist).
But I'd like to go one step further and cater for Greeks etc.
And it seems to me that I want to not change toupper() - which
would be expected to uppercase Greek characters (or some
other language), independent of the uppercasing of any English
characters they happened to enter (potentially at "great effort"
of changing keyboards).
And what I'm really after is being able to designate some Greek
characters as the equivalent of English counterparts in circumstances
where that is appropriate, and there is a desire to avoid a keyboard
change. So a new isequiv() function as an extension to C90. (I'm
basically forking C90 to create a C90+ or C90.0.1 - same as we
do with software - bells and whistles go into C99 etc, not C90.0.1)
Any thoughts?
Thanks. Paul..
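For readers trying to picture the proposal, here is a minimal C90 sketch of what such an isequiv() might look like. Everything in it is an assumption made for illustration: the name set_equiv(), the table layout, the two-argument signature, and the choice of ISO 8859-7 small lambda (0xEB) as the stand-in for 'X'. The post does not specify any of these. The point it shows is that the equivalence lookup sits beside toupper() rather than inside it.

```c
#include <ctype.h>

/* Hypothetical per-locale table: equiv_table[b] is the ASCII letter that
   byte b stands in for, or 0 if b has no ASCII counterpart. */
static unsigned char equiv_table[256];

/* Hypothetical setup call; a real C library would presumably fill the
   table when the locale is loaded. */
void set_equiv(int native, int ascii)
{
    equiv_table[(unsigned char)native] = (unsigned char)ascii;
}

/* Returns nonzero if c matches the ASCII character `ascii`, either
   directly (ignoring case) or through the equivalence table.
   toupper() itself is left alone, so Greek text would still be
   uppercased as Greek by a Greek locale. */
int isequiv(int c, int ascii)
{
    unsigned char uc = (unsigned char)c;
    int want = toupper((unsigned char)ascii);

    if (toupper(uc) == want)
        return 1;
    return equiv_table[uc] != 0 && toupper(equiv_table[uc]) == want;
}
```

With set_equiv(0xEB, 'X') done once at startup, the check from the post would become if (isequiv(c, 'X')) printf("whatever\n"); and a filename containing 0xEB would pass through untouched because nothing converts it.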
It is not at all clear to me what you are asking about here. Indeed, I
do not think it is at all clear to /you/. Have you tried talking
directly to someone who regularly uses a different alphabet - Greek, Cyrillic, Hebrew, etc.? Try to explain your idea to them and see what
they think.
At the moment it appears that you want to make "toupper" (or some new function) somehow take Greek letters (without using UTF-8) and turn some Greek lower-case letters into some Latin upper-case letters. But it
should only do that sometimes - not when typing filenames for FAT.
So which Greek letter do you think should be "equivalent" to Latin X?
Perhaps xi, since it has similar pronunciation to Latin X (in English)?
Perhaps chi, since its capital looks very much like a Latin X, though it
has a different pronunciation and the lower-case is noticeably
different. And which Greek letter should be "equivalent" to Q, J, or V?
What should the Greek letters omega and psi convert to?
And what are you going to do with Cyrillic alphabets (in all their varieties), and Hebrew, Arabic, or the many dozens of alphabets used in
India and south-east Asia?
On 11/15/25 10:33, Paul Edwards wrote:
I am not 100% sure, but I believe some people (Greeks?)
have keyboards such that their native character set can be
freely entered, when they're working in their native language.
In Greece you will typically get keyboards with the Greek letters.
And if they are required to work in English, or rather, 7-bit
ASCII, they will "switch keyboards", ie using the mouse or
whatever to select a different keyboard, and type the English,
and then return to the Greek etc keyboard.
I had once configured a system to use some control-key combination
(like Ctrl-Alt-Shift) to switch between three different languages
(EN, GR, DE).
I'm interested in a slight change to C90. I'm not interested in
UTF-8 either.
You have to map the keys to characters of some specific "codepage".
It sounds to me as if, along with an interactive keyboard-layout
change, you also want to switch the underlying character encoding.
To me that just sounds wrong! How would a string like "P??" then be
encoded?
The environment that I set up just used UTF-8, a single encoding
for all (in that case just three) languages. That way you could
type (Greek) '?' or (German) '?' or any other character (as far
as it's supported by the system with fonts, etc.).
I'd like to write a program using pure ASCII, and indeed, pure
English prompts, but not force a Greek user to switch keyboards.
As I understand it, the "C" code is ASCII as usual, but embedded
strings may contain any other characters.
Again: How would a string like "P??" be then encoded?
The '?' (like an '?') could stem from ISO 8859-15 (but then it would
be a special case), or from ISO 8859-7 (the native Greek variant of
Latin), or from UTF-8. - You cannot represent these characters by a
single ASCII-character.
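To make the encoding point concrete, here is a small C sketch, using small lambda as an assumed example character (the garbled characters in the string above are not recoverable, so this is not a reconstruction of them). The byte values are those of ISO 8859-7 and UTF-8; the helper name is_pure_ascii() is hypothetical.

```c
#include <string.h>  /* strlen, used in the note below */

/* Small lambda in two encodings, written as explicit bytes so that the
   source file itself stays pure ASCII. */
static const char lambda_8859_7[] = "\xEB";     /* ISO 8859-7: one octet */
static const char lambda_utf8[]   = "\xCE\xBB"; /* UTF-8: two octets     */

/* Returns nonzero if every byte of s is 7-bit ASCII (0x00..0x7F). */
int is_pure_ascii(const char *s)
{
    for (; *s != '\0'; s++)
        if ((unsigned char)*s > 0x7F)
            return 0;
    return 1;
}
```

strlen(lambda_8859_7) is 1 and strlen(lambda_utf8) is 2, and is_pure_ascii() rejects both, so whichever encoding is chosen, the character lives outside the 7-bit range.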
I'm not interested in a complicated translation layer either.
What comes below sounds very fuzzy; I certainly don't understand what
you have in mind there, so I cannot really comment on that.
For me, the solution for multi-language programming environment would
not switch character encodings but use a single standard (UTF-8) for
that.
Originally I was thinking I just need to modify my programs and
the Greek locale so that I could do:
if (toupper(c) == 'X') printf("whatever\n");
And make some random Greek character the equivalent of 'X', ie
the Greek user knows that when prompted to type 'x' (or 'X'), he
just needs to press (lambda or whatever Greeks use). The Greek
locale will convert lambda into X when passed to toupper.
Are you looking for an ASCII representation of that (template?) 'X'?
Something like "&mu;" (Like "&micro;" for 'µ' in HTML)?
However, it was pointed out to me that this would interfere with
storing filenames on traditional FAT, for example. Not everything
should be subject to uppercasing. The Greek, or Katakana, should
be preserved, not converted into ASCII gibberish.
You should be aware that on filename level you typically have (on
Unixes) just anonymous octets that need an interpretation to be
displayed. (It may be UCS2 with Windows filesystems; don't know.)
In my Linux/UTF-8 environment my filenames may contain umlauts or
Greek letters
$ touch "P??"
$ ls "P??"
P??
The filename will be stored in octets (values 0..255), where each
non-ASCII character will occupy more than one octet.
Such filenames will only be displayed as "ASCII gibberish" if you
somehow "force" it to be interpreted as pure ASCII.
So I was thinking I need some halfway point of equivalency.
I'm happy to change all my programs so that they don't rely on
the user typing in an exact character. ie I am happy to drop case sensitivity from everything, "now that I know there's an issue".
Actually there are other environments where case sensitivity is
difficult. e.g. some CMS (mainframe) environments.
And making sure I do toupper() is a way to solve the issue for
the environments where case-sensitivity is difficult/impossible.
(assuming they exist).
Usually this should be handled by the locale setting.
$ awk 'BEGIN {print tolower("?")}'
?
$ awk 'BEGIN {print toupper("?")}'
?
The test above is from my environment (using UTF-8); it works even
*without* setting any Greek locale (I'm using "en_US.UTF-8".)
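For the single-byte case discussed in this thread, the C counterpart of the awk test might look like the sketch below. The helper name upper_in_locale() is hypothetical, and any locale name passed to it (for instance "el_GR.ISO-8859-7" on a glibc system) is an assumption about what happens to be installed; the function reports failure rather than guessing.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Uppercases one 8-bit character under the named LC_CTYPE locale.
   Returns -1 if setlocale() rejects the locale name. */
int upper_in_locale(const char *locale_name, int c)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    /* Cast first: a plain char holding 0x80..0xFF may be negative, and
       passing a negative value other than EOF to toupper() is
       undefined behavior. */
    return toupper((unsigned char)c);
}
```

In the "C" locale only the basic Latin letters change case, so a byte such as 0xE1 comes back unchanged; under a Greek single-byte locale the same call would be expected to return the corresponding capital.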
But I'd like to go one step further and cater for Greeks etc.
And it seems to me that I want to not change toupper() - which
would be expected to uppercase Greek characters (or some
other language), independent of the uppercasing of any English
characters they happened to enter (potentially at "great effort"
of changing keyboards).
And what I'm really after is being able to designate some Greek
characters as the equivalent of English counterparts in circumstances
where that is appropriate,
To me this sounds as fuzzy as it sounds wrong.
and there is a desire to avoid a keyboard
change. So a new isequiv() function as an extension to C90. (I'm
basically forking C90 to create a C90+ or C90.0.1 - same as we
do with software - bells and whistles go into C99 etc, not C90.0.1)
Any thoughts?
I cannot comment on your reluctance to use (or requirement not to use)
an underlying UTF-8 encoding. The rest should be handled by the
locale, provided UTF-8 characters are supported and allowed in string
and character literals of the respective programming language; I haven't
tried for "C" (but I seem to have no problems with using any UTF-8
characters in string literals).
I hope you found some useful information and got some more insights.
If, based on that, you can clarify your intentions and thoughts I'd
be interested to hear what you're actually trying to achieve.
"Janis Papanagnou" <janis_papanagnou+ng@hotmail.com> wrote in message news:10fepd8$3p5tk$4@dont-email.me...
On 11/15/25 10:33, Paul Edwards wrote:
...
And assuming someone only knows Greek, I don't want them
to ever switch keyboards.
(I wouldn't want it if it was the other way around - and ASCII
was in fact all Greek).
...
Yes, any traditional Greek codepage is fine. Or even a
non-traditional Greek codepage. I expect all the Greek
characters to exist between x'80' and x'FF'.
...
I don't expect the Greeks to learn English, but I do expect them
to get used to using the software such that they recognize that
when a particular bit of English gibberish like "Enter your name"
appears on the screen, that they know it is time to enter their
(Greek) name.
...
As per 1980s. I'm not trying to change what you did in the
1980s. And I'm not trying to change the existing "accents".
ie you type ^ and then a "u" and then you get a u with a
hat. (in some countries).
...
I don't want to use ASCII. Nothing a Greek-only person will
ever type will be in ASCII, it will all be a SINGLE value
between x'80' and x'FF'.
...
It's not multi-language - well - not by preference. The end
user would much rather have the prompts for "what is your
name?" in Greek, but that's not on the table.
Originally I was thinking I just need to modify my programs and
the Greek locale so that I could do:
if (toupper(c) == 'X') printf("whatever\n");
And make some random Greek character the equivalent of 'X', ie
the Greek user knows that when prompted to type 'x' (or 'X'), he
just needs to press (lambda or whatever Greeks use). The Greek
locale will convert lambda into X when passed to toupper.
Are you looking for an ASCII representation of that (template?) 'X'?
Something like "&mu;" (Like "&micro;" for 'µ' in HTML)?
Yes, an ASCII representation of similar-to-uppercase "micro".
...
It won't be ASCII or UTF-8. It will be what MSDOS in
Greece did in the 1980s.
...
Can you touch-type? I assume all SBCS users can touch-type
in their language (if they learn). I expect someone who only
knows Greek, to touch-type at full speed - in Greek - even
when using my English-only software.
...
I'm interested in Greek, Greek, Greek.
At no extra processing or space cost to English.
Exactly as it was when the C90 standard was created.
Which came VERY close to supporting exactly that. The
C89 standard was delayed for a year because of Europeans
intending to change C89 to support locales. Instead, the
Americans changed ANSI. And got it NEARLY right.
You can't expect people to get things 100% right. So
with the benefit of 35 years of hindsight, and a C library
and operating system and supporting tools under my
belt, I'm trying to add some MINOR touches to C90.
Yes, an ASCII representation of similar-to-uppercase "micro".
But I didn't understand what you wanted here, so I'll bite.
"David Brown" <david.brown@hesbynett.no> wrote in message news:10fekpt$mkk8$1@dont-email.me...
It is not at all clear to me what you are asking about here. Indeed, I
do not think it is at all clear to /you/. Have you tried talking
directly to someone who regularly uses a different alphabet - Greek,
Cyrillic, Hebrew, etc.? Try to explain your idea to them and see what
they think.
No. This IS my (apparently - next message) attempt to directly
talk to someone.
On 11/23/25 01:54, Paul Edwards wrote:
"Janis Papanagnou" <janis_papanagnou+ng@hotmail.com> wrote in message news:10fepd8$3p5tk$4@dont-email.me...
On 11/15/25 10:33, Paul Edwards wrote:
...
And assuming someone only knows Greek, I don't want them
to ever switch keyboards.
(I had been speaking just about switching the layout, not the
keyboard. The installation I spoke of used one Greek keyboard
used with three switchable layouts, for EN, GR, DE.)
(In another environment I'm using two keyboards, EN and DE, in
parallel on one system, each one with its own specific layout.)
(I wouldn't want it if it was the other way around - and ASCII
was in fact all Greek).
Not sure what you're trying to say here. ASCII was never Greek,
and ISO 8859-7 contains ASCII as subset and Greek in the upper
range.
Yes, any traditional Greek codepage is fine. Or even a
non-traditional Greek codepage. I expect all the Greek
characters to exist between x'80' and x'FF'.
I don't know what you mean by "non-/traditional Greek codepage".
(Back in those days there was no 7-bit Greek codepage that I knew of
- but I'm not sure - but there was 8-bit [ASCII based/"extended"]
ISO 8859-7 in the late 1980's.)
I don't expect the Greeks to learn English, but I do expect them
to get used to using the software such that they recognize that
when a particular bit of English gibberish like "Enter your name"
appears on the screen, that they know it is time to enter their
(Greek) name.
I see. - So something like: Enter your name: ???????
And the binary/encoded representation shall be, say, ISO 8859-7 ?
I'd have expected that old "C" versions supported 8-bit character
sets without any change; I cannot tell for GR, but I've certainly
used DE with ISO 8859-1 and -15 encoded data in "C" (without any
change of the language) that contained umlauts (for example).
(With gcc -std=c90 and LC_ALL=C I can at least create "?lfr??" in
a contemporary system. But that might not be an exact test case,
as you may have in mind.)
As per 1980s. I'm not trying to change what you did in the
1980s. And I'm not trying to change the existing "accents".
ie you type ^ and then a "u" and then you get a u with a
hat. (in some countries).
(But that is just a configured way of how diacritical characters
can be entered through keyboard. In my systems I can set it with
the locale; using or not using "dead-characters", or some such.)
I don't want to use ASCII. Nothing a Greek-only person will
ever type will be in ASCII, it will all be a SINGLE value
between x'80' and x'FF'.
Yes, with ISO 8859-7 you'll have 8 bits available (as opposed to
7-bit ASCII). So if your system supports an 8-bit ISO 8859-7
character set, that should probably work already, shouldn't it?
It's not multi-language - well - not by preference. The end
user would much rather have the prompts for "what is your
name?" in Greek, but that's not on the table.
(Using a character encoding that supports two complete languages
is "multi-language", actually.)
Originally I was thinking I just need to modify my programs and
the Greek locale so that I could do:
if (toupper(c) == 'X') printf("whatever\n");
And make some random Greek character the equivalent of 'X', ie
the Greek user knows that when prompted to type 'x' (or 'X'), he
just needs to press (lambda or whatever Greeks use). The Greek
locale will convert lambda into X when passed to toupper.
Are you looking for an ASCII representation of that (template?) 'X'?
Something like "&mu;" (Like "&micro;" for 'µ' in HTML)?
Yes, an ASCII representation of similar-to-uppercase "micro".
It's still unclear to me. - Above I got the impression you'd want
Enter your name: ???????
now it reads as if you want just a (7-bit) ASCII encoding, as in
Enter your name: Giannis
In both cases upper-casing should be possible, though.
(I still don't see where you actually see or have the problem.)
It won't be ASCII or UTF-8. It will be what MSDOS in
Greece did in the 1980s.
I cannot tell anything about DOS in Greece in the 1980's. As far as
memory serves, I know that (DOS based) Windows 95 could be used in
Greece with Greek keyboards and letters. (It should be possible to
Google what codepage Windows used back in those days. I'm positive
that MS did not do any rocket-science back then in that respect.)
Can you touch-type? I assume all SBCS users can touch-type
in their language (if they learn). I expect someone who only
knows Greek, to touch-type at full speed - in Greek - even
when using my English-only software.
Yes. - And...? (Not sure what that has to do with the requirements.)
I'm interested in Greek, Greek, Greek.
At no extra processing or space cost to English.
Exactly as it was when the C90 standard was created.
Which came VERY close to supporting exactly that. The
C89 standard was delayed for a year because of Europeans
intending to change C89 to support locales. Instead, the
Americans changed ANSI. And got it NEARLY right.
You can't expect people to get things 100% right. So
with the benefit of 35 years of hindsight, and a C library
and operating system and supporting tools under my
belt, I'm trying to add some MINOR touches to C90.
I cannot test; my expectation is that you can do what's described
above already with a Greek ISO 8859-7 encoding natively in C90 as was
(most probably) done in Greece back in the late 1980's.
One thing *might* be crucial where you said:
Yes, an ASCII representation of similar-to-uppercase "micro".
But I didn't understand what you wanted here, so I'll bite.