On 2025-11-18 20:04, Johnny Billquist wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:
Lack of utf-8 would be an issue for some things, but mostly not.
Without UTF-8, you could not have "€" or ??? or "ñ" or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
But for transmission you first have to declare which charset you are
going to use, and then you are limited by it; the recipient must have
the same map and be able to use it, or perhaps has to substitute his
own map instead.
ISO 8859-1 ("Latin 1") is a special case. No mapping table is required
for conversion to Unicode, because every ISO 8859-1 codepoint maps 1:1
to the Unicode codepoint with the same value (U+0000 through U+00FF).
This means any UTF can be applied directly to ISO 8859-1 codepoints.
The MIME declaration "ISO-8859-1" includes the C0 and C1 control characters.
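That 1:1 property is easy to demonstrate; a minimal Python sketch (the
codec names are Python's standard aliases):

```python
# Every ISO 8859-1 byte value equals the Unicode codepoint it encodes,
# so "decoding" Latin-1 is just a widening copy, no lookup table needed.
def latin1_to_codepoints(data: bytes) -> list[int]:
    return list(data)  # byte value == Unicode codepoint

text = bytes([0x41, 0xE9, 0xF1])        # 'A', 'e-acute', 'n-tilde'
assert latin1_to_codepoints(text) == [0x41, 0xE9, 0xF1]
assert text.decode("latin-1") == "Aéñ"  # Python's codec agrees
# Other 8-bit sets need a real table: 0xE9 is not 'é' in, say, KOI8-R.
```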
charset="utf-ebcdic" is an encoding using variable lengths for all of
the codepoints in the Unicode character set. In UTF-EBCDIC an encoding
very similar to UTF-8 encodes Unicode codepoints five bits at a time
into EBCDIC. Codepoints under 160 are encoded in a single octet,
and codepoints above 159 are encoded in multiple octets, all with the
high bit set. Only the C1 control characters are native high-bit-set EBCDIC.
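The five-bits-at-a-time scheme described above is the intermediate
"UTF-8-Mod" (I8) stage of UTF-EBCDIC as specified in Unicode Technical
Report #16; a further fixed 256-entry table then shuffles the I8 bytes
into EBCDIC positions, which this sketch omits. A rough encoder for the
I8 stage, up to U+3FFFF only:

```python
def encode_i8(cp: int) -> bytes:
    """Encode one Unicode codepoint in UTF-8-Mod, the I8 stage of
    UTF-EBCDIC.  Trail bytes have the form 101xxxxx and carry 5 bits
    each; the final EBCDIC byte-mapping table is omitted here."""
    if cp < 0xA0:                      # ASCII + C1 controls: one byte
        return bytes([cp])
    if cp < 0x400:                     # 2 bytes: 110yyyyy 101xxxxx
        return bytes([0xC0 | (cp >> 5), 0xA0 | (cp & 0x1F)])
    if cp < 0x4000:                    # 3 bytes: 1110yyyy + 2 trail
        return bytes([0xE0 | (cp >> 10),
                      0xA0 | ((cp >> 5) & 0x1F),
                      0xA0 | (cp & 0x1F)])
    if cp < 0x40000:                   # 4 bytes: 11110yyy + 3 trail
        return bytes([0xF0 | (cp >> 15),
                      0xA0 | ((cp >> 10) & 0x1F),
                      0xA0 | ((cp >> 5) & 0x1F),
                      0xA0 | (cp & 0x1F)])
    raise ValueError("longer sequences omitted from this sketch")

assert encode_i8(0x41) == b"\x41"          # 'A': single octet
assert encode_i8(0x85) == b"\x85"          # C1 control: single octet
assert encode_i8(0x20AC) == b"\xe8\xa5\xac"  # euro: three octets
```

Note how every codepoint from 160 up produces only octets with the high
bit set, which is what preserves the EBCDIC invariant range.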
On 11/19/25 19:09, Eli the Bearded wrote:
charset="utf-ebcdic" is an encoding using variable lengths for all of
the codepoints in the Unicode character set. In UTF-EBCDIC an
encoding very similar to UTF-8 encodes Unicode codepoints five bits
at a time into EBCDIC. Codepoints under 160 are encoded in a
single octet, and codepoints above 159 are encoded in multiple octets,
all with the high bit set. Only the C1 control characters are native
high-bit-set EBCDIC.
That sounds like a particularly bad choice. Above 159 includes
lowercase s-z, all uppercase, and all numerics. Under 160 are only
lowercase a-r and specials. Personally I'd have chosen 128 and above
as single bytes, possibly biased (i.e. all alphabetics and numerics),
and 0-127 as multiple bytes (special characters).
There are no good choices involving EBCDIC.
On 20/11/2025 08:47, Richard Kettlewell wrote:
There are no good choices involving EBCDIC.
ROFLMAO....
On 2025-11-21, Stéphane CARPENTIER <sc@fiat-linux.fr> wrote:
Le 18-11-2025, Eli the Bearded <*@eli.users.panix.com> wrote:
In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
Without UTF-8, you could not have "€" or ??? or "ñ" or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
The Latin-1 I know does not have a Euro symbol. It does have the generic
currency placeholder at 0xA4: ¤
They created Latin-9 from Latin-1 to add this € symbol.
I thought that was Latin-15.
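(It is ISO 8859-15, but its informal name is Latin-9.) Python's codecs
make the difference easy to check; the byte 0xA4 decodes differently in
each:

```python
b = bytes([0xA4])
assert b.decode("latin-1") == "\u00A4"     # generic currency sign
assert b.decode("iso8859-15") == "\u20AC"  # euro sign in Latin-9
# Latin-9 also replaces a handful of other Latin-1 positions,
# adding letters such as Œ, œ, and Ÿ needed for French and Finnish.
```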
Niklas
You don't have to go very far from there to find ones that were a little
harder to deal with ...
It amazes me that computers can handle Chinese. Not only display, but
keyboards.
According to Carlos E.R. <robin_listas@es.invalid>:
You don't have to go very far from there to find ones that were a little
harder to deal with ...
It amazes me that computers can handle Chinese. Not only display, but
keyboards.
Actually, there aren't Chinese keyboards as such. While there were some
impressive attempts at electromechanical Chinese typewriters in the 20th
century, these days the way one types Chinese is to type the pinyin
transliteration, and the input software figures out the characters. When
multiple characters share the same pinyin, it can usually tell from context
which one makes sense; if need be, it pops up a box and the user picks the
correct one.
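The lookup-and-pick step can be sketched in a few lines; this toy uses a
tiny hypothetical candidate table, whereas real input methods rank
thousands of candidates with language models:

```python
# Toy pinyin input method.  The candidate lists are illustrative only,
# ordered most-frequent first.
CANDIDATES = {
    "ma": ["妈", "马", "吗", "麻"],   # one syllable, many characters
    "shi": ["是", "十", "事", "时"],
}

def best_guess(pinyin: str) -> str:
    # A real engine would use surrounding context; here we just take
    # the most frequent candidate, and a UI would offer the rest in a
    # pop-up box for the user to pick from.
    return CANDIDATES[pinyin][0]
```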
Japanese has two phonetic alphabets, hiragana and katakana, so that's
what people type, with a similar scheme turning them into kanji
characters.
Displaying Chinese and Japanese is relatively straightforward since
there are Unicode code points for all of the characters that are in
common use, known as the CJK Unified Ideographs. But Chinese has a lot
of obscure rarely used characters and there is a huge backlog of them
still proposed to be added to Unicode.
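The characters in common use live mostly in one Unicode block; a quick
membership check (this tests only the main CJK Unified Ideographs block,
U+4E00 through U+9FFF):

```python
def is_cjk_unified(ch: str) -> bool:
    # Main CJK Unified Ideographs block only; the rarer characters sit
    # in Extension blocks elsewhere (e.g. U+3400-U+4DBF, U+20000 up).
    return 0x4E00 <= ord(ch) <= 0x9FFF

assert is_cjk_unified("中") and is_cjk_unified("語")
assert not is_cjk_unified("A")
```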
If you are interested in this topic, read this excellent book:
https://en.wikipedia.org/wiki/Kingdom_of_Characters
Originally Japanese was written in Chinese characters, but the
pronunciation changed.
Japanese has two phonetic alphabets, hiragana and katakana, so that's
what people type, with a similar scheme turning them into kanji
characters.
Yes, but about 2,000 kanji are essential to be considered literate. To add
to the fun, a kanji may be read in various ways depending on how it is
used, and to indicate the desired pronunciation, or that a word is an
adaptation of a word not found in the Japanese language, the reading is
shown in small phonetic characters (furigana) set above the kanji.