• ISO 8859-1 ("Latin 1") (was: Recent history of vi)

    From Michael Bäuerle@3:633/10 to All on Wed Nov 19 14:58:00 2025
    Carlos E.R. wrote:
    On 2025-11-18 20:04, Johnny Billquist wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

    Lack of utf-8 would be an issue for some things, but mostly not.

    Without UTF-8, you could not have “€” or “©” or “±” or those curly quotes.

    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    As noted by others in this thread, “€” is not available with it.

    But with the transmission you have to transmit first what charset you
    are going to use, and then you are limited by it, and the recipient must
    have the same map, and be able to use it. Perhaps he has to use his own
    map instead.

    ISO 8859-1 ("Latin 1") is a special case. No mapping table is required
    for conversion to Unicode, because all ISO 8859-1 codepoints have 1:1
    mappings to Unicode codepoints. This means any UTF can be directly
    applied to ISO 8859-1 codepoints.

    This means, for the characters from this thread, it is sufficient to
    look at their Unicode codepoints:

    +-----------+-------------------+------------------------------------+
    | Character | Unicode codepoint | ISO 8859-1 codepoint (hexadecimal) |
    +-----------+-------------------+------------------------------------+
    | €         | U+20AC            | [not available]                    |
    | ©         | U+00A9            | A9                                 |
    | ±         | U+00B1            | B1                                 |
    +-----------+-------------------+------------------------------------+

    Any Unicode codepoint up to U+00FF is either a printable ISO 8859-1
    character [1] or one of the C0 and C1 control characters [2], with the
    same value.
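
The 1:1 mapping can be checked with a few lines of Python. This is only a sketch using the standard "iso-8859-1" codec; the helper name latin1_byte is made up for illustration and is not from the thread.

```python
# Every byte value 0x00..0xFF decodes under ISO-8859-1 to the Unicode
# codepoint with the same number, so no lookup table is needed.
for value in range(256):
    assert bytes([value]).decode("iso-8859-1") == chr(value)


def latin1_byte(ch):
    """Return the ISO-8859-1 byte for ch as hex, or None if it has none.

    Hypothetical helper, named here only for this example.
    """
    try:
        return ch.encode("iso-8859-1").hex().upper()
    except UnicodeEncodeError:
        return None


# The three characters from the table: euro, copyright, plus-minus.
for ch in ("\u20ac", "\u00a9", "\u00b1"):
    print(f"U+{ord(ch):04X} -> {latin1_byte(ch)}")
```

The euro sign is the only one of the three that raises UnicodeEncodeError, which matches the "[not available]" row.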

    The MIME declaration "ISO-8859-1" includes C0 and C1 control characters.


    ______________
    [1] <https://en.wikipedia.org/wiki/ISO/IEC_8859-1>
    [2] <https://en.wikipedia.org/wiki/C0_and_C1_control_codes>

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Eli the Bearded@3:633/10 to All on Thu Nov 20 02:09:45 2025
    In comp.os.linux.misc, Michael Bäuerle <michael.baeuerle@gmx.net> wrote:
    ISO 8859-1 ("Latin 1") is a special case. No mapping table is required
    for conversion to Unicode, because all ISO 8859-1 codepoints have 1:1
    mappings to Unicode codepoints. This means any UTF can be directly
    applied to ISO 8859-1 codepoints.
    ...
    The MIME declaration "ISO-8859-1" includes C0 and C1 control characters.

    Be technical. The MIME charset ISO-8859-1 includes the C0 and C1
    control characters and has all of its characters at the same codepoints
    as Unicode but the character encoding is different from all Unicode
    character encodings.
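
To make the distinction concrete, here is a small Python sketch (the variable names are arbitrary) that takes one character whose codepoint is the same in Latin-1 and Unicode and shows that the bytes on the wire still differ per encoding.

```python
# PLUS-MINUS SIGN: codepoint 0xB1 in both ISO-8859-1 and Unicode.
ch = "\u00b1"

# Same character, same codepoint, four different byte sequences.
encodings = {
    name: ch.encode(name)
    for name in ("iso-8859-1", "utf-8", "utf-16-be", "utf-32-be")
}

for name, data in encodings.items():
    print(f"{name:11} {data.hex(' ')}")
```

Only the ISO-8859-1 form is the single byte B1; none of the Unicode encodings produce that same byte sequence.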

    "charset" is a very specific term from MIME and it conflates character
    set with character encoding. In a world where all characters fit in
    eight bits, that's a very easy mistake to make, but since the MIME
    designers were aware of (and specifically working to accommodate) worlds
    where 8-bit encodings might not be used, that was a poor choice.

    charset="utf-8" is an encoding using variable lengths for all of the
    codepoints in the Unicode character set. In UTF-8, codepoints that
    are under 128 are encoded in a single octet with the highbit unset. All
    codepoints over 127 are encoded in multiple octets all with the highbit
    set.
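
The highbit rule for UTF-8 is easy to verify. A minimal sketch in Python, where utf8_profile is a hypothetical helper name used only here:

```python
def utf8_profile(ch):
    """Return (octet count, True if every octet has the high bit set)."""
    data = ch.encode("utf-8")
    return len(data), all(b & 0x80 for b in data)


# Codepoints on both sides of the 127/128 boundary, plus larger ones.
samples = ["A", "\u007f", "\u0080", "\u00b1", "\u20ac", "\U0001f600"]
for ch in samples:
    length, all_high = utf8_profile(ch)
    print(f"U+{ord(ch):04X}: {length} octet(s), all highbit set: {all_high}")
```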

    charset="utf-7" is an encoding using variable lengths for the
    codepoints in the Unicode character set. In UTF-7 some characters are
    left as is, some characters (those above codepoint 65535) are written as
    UTF-16 surrogate pairs, and many characters are multibyte sequences. But
    critically, none of the bytes have the highbit set.
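
Python ships a "utf-7" codec, so the no-highbit property can be checked directly. This is a sketch only; utf7_bytes is a made-up helper name.

```python
def utf7_bytes(text):
    """Encode text as UTF-7 and confirm that every byte is below 0x80."""
    data = text.encode("utf-7")
    assert all(b < 0x80 for b in data)
    return data


# Plain ASCII letters pass through unchanged; other characters are
# written as a '+'-introduced run of modified base64.
for text in ("abc", "\u00b1", "\u20ac"):
    encoded = utf7_bytes(text)
    print(repr(text), "->", encoded)
    assert encoded.decode("utf-7") == text
```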

    charset="utf-ebcdic" is an encoding using variable lengths for all of
    the codepoints in the Unicode character set. In UTF-EBCDIC an encoding
    very similar to UTF-8 encodes Unicode codepoints five bits at a time
    into EBCDIC. Codepoints that are under 160 are encoded in a single octet
    and codepoints above 159 are encoded in multiple octets all with the
    highbit set. Only the C1 control characters are native highbit set EBCDIC.
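
Python has no UTF-EBCDIC codec, so the following is a rough sketch of the "five bits at a time" step only, based on my reading of Unicode Technical Report #16. The function name i8_encode is hypothetical, and the bit layout below is an assumption drawn from that report, not something stated in this thread. The final step, a byte permutation that moves these intermediate bytes to EBCDIC positions, is left out.

```python
def i8_encode(cp):
    """Encode one codepoint in the intermediate 'I8' form (assumed layout).

    Codepoints below 0xA0 are a single byte. Larger ones get a lead byte
    followed by trail bytes of the form 101xxxxx, five payload bits each.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("codepoint out of range")
    if cp < 0xA0:
        return bytes([cp])
    # (total bytes, lead-byte prefix, payload bits carried by the lead byte)
    layouts = ((2, 0xC0, 5), (3, 0xE0, 4), (4, 0xF0, 3), (5, 0xF8, 2))
    for nbytes, prefix, lead_bits in layouts:
        if cp < 1 << (lead_bits + 5 * (nbytes - 1)):
            lead = prefix | (cp >> (5 * (nbytes - 1)))
            trail = [
                0xA0 | ((cp >> (5 * i)) & 0x1F)
                for i in range(nbytes - 2, -1, -1)
            ]
            return bytes([lead] + trail)
    raise ValueError("unreachable for valid codepoints")


for cp in (0x41, 0x9F, 0xA0, 0x20AC):
    print(f"U+{cp:04X} -> {i8_encode(cp).hex(' ')}")
```

Note that the 160 threshold applies to Unicode codepoints before the permutation, not to EBCDIC byte values.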

    Elijah
    ------
    here is the map to the map you want

  • From Peter Flass@3:633/10 to All on Wed Nov 19 20:16:42 2025
    On 11/19/25 19:09, Eli the Bearded wrote:

    charset="utf-ebcdic" is an encoding using variable lengths for all of
    the codepoints in the Unicode character set. In UTF-EBCDIC an encoding
    very similar to UTF-8 encodes Unicode codepoints five bits at a time
    into EBCDIC. Codepoints that are under 160 are encoded in a single octet
    and codepoints above 159 are encoded in multiple octets all with the
    highbit set. Only the C1 control characters are native highbit set EBCDIC.


    That sounds like a particularly bad choice. Above 159 includes lowercase
    s-z, all uppercase, and all numerics. Under 160 are only lowercase a-r
    and specials. Personally I'd have chosen 128 and above as single bytes,
    possibly biased (i.e. all alphabetics and numerics), and 0-127 as
    multiple bytes (special characters).


  • From Richard Kettlewell@3:633/10 to All on Thu Nov 20 08:47:21 2025
    Peter Flass <Peter@Iron-Spring.com> writes:
    On 11/19/25 19:09, Eli the Bearded wrote:
    charset="utf-ebcdic" is an encoding using variable lengths for all of
    the codepoints in the Unicode character set. In UTF-EBCDIC an
    encoding very similar to UTF-8 encodes Unicode codepoints five bits
    at a time into EBCDIC. Codepoints that are under 160 are encoded in a
    single octet and codepoints above 159 are encoded in multiple octets
    all with the highbit set. Only the C1 control characters are native
    highbit set EBCDIC.

    That sounds like a particularly bad choice. Above 159 includes
    lowercase s-z, all uppercase, and all numerics. Under 160 are only
    lowercase a-r and specials. Personally I'd have chosen 128 and above
    as single bytes, possibly biased (i.e. all alphabetics and numerics),
    and 0-127 as multiple bytes (special characters).

    There are no good choices involving EBCDIC.

    --
    https://www.greenend.org.uk/rjk/

  • From The Natural Philosopher@3:633/10 to All on Thu Nov 20 11:10:29 2025
    On 20/11/2025 08:47, Richard Kettlewell wrote:

    There are no good choices involving EBCDIC.


    ROFLMAO....
    --
    "An intellectual is a person knowledgeable in one field who speaks out
    only in others..."

    Tom Wolfe


  • From Charlie Gibbs@3:633/10 to All on Thu Nov 20 17:57:31 2025
    On 2025-11-20, The Natural Philosopher <tnp@invalid.invalid> wrote:

    On 20/11/2025 08:47, Richard Kettlewell wrote:

    There are no good choices involving EBCDIC.

    ROFLMAO....

    Taken from Ted Nelson's _Computer Lib_:

    ASCII and ye shall receive.
    -- the computer industry

    ASCII not, what your machine can do for you.
    -- IBM

    A TA in one of my computer science classes pronounced EBCDIC as "ee-biddy-dick".

    --
    /~\ Charlie Gibbs | Growth for the sake of
    \ / <cgibbs@kltpzyxm.invalid> | growth is the ideology
    X I'm really at ac.dekanfrus | of the cancer cell.
    / \ if you read it the right way. | -- Edward Abbey

  • From Ralf Fassel@3:633/10 to All on Fri Nov 21 12:24:21 2025
    * Charlie Gibbs <cgibbs@kltpzyxm.invalid>
    | Taken from Ted Nelson's _Computer Lib_:

    | ASCII and ye shall receive.
    | -- the computer industry

    | ASCII not, what your machine can do for you.
    | -- IBM

    ASCII stupid question, get a stupid ANSI
    -- [from someone's .sig]

    R'

  • From Nuno Silva@3:633/10 to All on Fri Nov 21 23:20:42 2025
    On 2025-11-21, Niklas Karlsson wrote:

    On 2025-11-21, Stéphane CARPENTIER <sc@fiat-linux.fr> wrote:
    On 18-11-2025, Eli the Bearded <*@eli.users.panix.com> wrote:
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    Without UTF-8, you could not have “€” or “©” or “±” or those curly quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the generic
    currency placeholder at 0xA4: ¤

    They created the latin9 from the latin1 to add this € symbol.

    I thought that was Latin-15.

    Niklas

    It seems it's both latin9 and iso8859-15:

    https://jkorpela.fi/latin9.html

    I was wondering why "latin15" didn't bring it up in some context the
    other day, I guess this is why?

    (On this system, I apparently can also open the online manual page for iso_8859-15 using the name "latin9".)
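
Python's codec registry shows the same aliasing. A quick sketch, assuming a standard CPython install; the variable names are arbitrary. It also shows what changed between the two charsets at the currency-sign position, which in Python's tables is byte 0xA4.

```python
import codecs

# "latin9" and "iso8859-15" should resolve to the same underlying codec.
same_codec = codecs.lookup("latin9").name == codecs.lookup("iso8859-15").name
print("same codec:", same_codec)

# One byte, two readings.
byte = b"\xa4"
as_latin1 = byte.decode("latin-1")  # U+00A4 CURRENCY SIGN
as_latin9 = byte.decode("latin9")   # U+20AC EURO SIGN
print(f"latin-1: U+{ord(as_latin1):04X}  latin9: U+{ord(as_latin9):04X}")
```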

    --
    Nuno Silva

  • From John Levine@3:633/10 to All on Mon Nov 24 01:45:52 2025
    According to Carlos E.R. <robin_listas@es.invalid>:
    You don’t have to go very far from there to find ones that were a little
    harder to deal with ...

    It amazes me that computers can handle Chinese. Not only display, but
    keyboards.

    Actually, there aren't Chinese keyboards. While there were some
    impressive attempts at electromechanical Chinese typewriters in the
    20th c., these days the way one types Chinese is to type the pinyin
    transliteration and the input software figures out the characters. When
    there are multiple characters with the same pinyin it can usually tell
    from context which one makes sense, or if need be it'll pop up a question
    box and the user picks the correct one.

    Japanese has two phonetic alphabets, hiragana and katakana, so that's
    what people type, with a similar scheme turning them into kanji
    characters.

    Displaying Chinese and Japanese is relatively straightforward since
    there are Unicode code points for all of the characters that are in
    common use, known as the CJK Unified Ideographs. But Chinese has a lot
    of obscure rarely used characters and there is a huge backlog of them
    still proposed to be added to Unicode.
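
For anyone curious where these characters live, Python's unicodedata module will name them. A small sketch; describe is a made-up helper and the sample characters are my own picks, not from the thread.

```python
import unicodedata


def describe(ch):
    """Return the codepoint and Unicode character name for ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# One unified ideograph, one hiragana letter, one katakana letter.
for ch in ("\u4e2d", "\u306b", "\u30ab"):
    print(describe(ch))

# The main CJK Unified Ideographs block starts at U+4E00.
in_main_block = 0x4E00 <= ord("\u4e2d") <= 0x9FFF
```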

    If you are interested in this topic, read this excellent book:

    https://en.wikipedia.org/wiki/Kingdom_of_Characters




    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Bobbie Sellers@3:633/10 to All on Sun Nov 23 18:06:20 2025


    On 11/23/25 17:45, John Levine wrote:
    According to Carlos E.R. <robin_listas@es.invalid>:
    You don’t have to go very far from there to find ones that were a little
    harder to deal with ...

    It amazes me that computers can handle Chinese. Not only display, but
    keyboards.

    Actually, there aren't Chinese keyboards. While there were some
    impressive attempts at electromechanical Chinese typewriters in the
    20th c., these days the way one types Chinese is to type the pinyin
    transliteration and the input software figures out the characters. When
    there are multiple characters with the same pinyin it can usually tell
    from context which one makes sense, or if need be it'll pop up a question
    box and the user picks the correct one.

    Japanese has two phonetic alphabets, hiragana and katakana, so that's
    what people type, with a similar scheme turning them into kanji
    characters.

    Yes, but the 2000 kanji are essential to be considered literate. To add
    to the fun, the kanji may be used in various ways to indicate the desired
    pronunciation and whether a word is an adaptation of a word not found in
    the Japanese language, and these are shown as superscripts set above the
    first letter. Originally Japanese was written in Chinese but the
    pronunciation changed. Then hiragana was invented and it became an item
    of artistic interest, with some very difficult to read scripts being used
    in succeeding centuries and the schools of calligraphy.

    Displaying Chinese and Japanese is relatively straightforward since
    there are Unicode code points for all of the characters that are in
    common use, known as the CJK Unified Ideographs. But Chinese has a lot
    of obscure rarely used characters and there is a huge backlog of them
    still proposed to be added to Unicode.

    If you are interested in this topic, read this excellent book:

    https://en.wikipedia.org/wiki/Kingdom_of_Characters


    bliss

  • From Lawrence D’Oliveiro@3:633/10 to All on Mon Nov 24 02:13:59 2025
    On Sun, 23 Nov 2025 18:06:20 -0800, Bobbie Sellers wrote:

    Originally Japanese was written in Chinese but the pronunciation
    changed.

    Japanese was an entirely different language, which adopted Chinese writing
    in lieu of having its own script. The Koreans and Vietnamese started out
    doing the same thing, but the Koreans invented their own syllabic-based
    script in the 15th century, and switched wholesale to that. The
    Vietnamese were colonized (for a while) by the French, who introduced a
    Roman-based rendition of the language, complete with funny squiggles here
    and there to denote tones of the tonal language, plus some other sound
    distinctions (e.g. “đ” versus “d”).

    I guess the only Koreans and Vietnamese who need to understand the old
    Chinese-based script for their respective languages would be those
    dealing with old historical documents.

    Meanwhile, the Japanese stuck with the Chinese script, only adding a few
    complications (like two different syllabic-based character sets, as well
    as the Roman alphabet) on top of that.

  • From John Levine@3:633/10 to All on Mon Nov 24 02:23:11 2025
    According to Bobbie Sellers <blissInSanFrancisco@mouse-potato.com>:
    Japanese has two phonetic alphabets, hiragana and katakana, so that's
    what people type, with a similar scheme turning them into kanji
    characters.

    Yes but the 2000 Kanji are essential to be considered literate.

    Indeed, but the question was about how do you type Japanese, not how
    do you read it.

    To add to the fun the kanji may be used in various ways to indicate the
    desired pronunciation and whether a word is an adaptation of a word not
    found in Japanese language and these are shown as superscripts set above
    the first letter.

    I don't know Japanese well enough to say how, if at all, one would type
    the superscripts.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
