• Re: Question: Do Winston's headers cause charset issues for anyone else

    From Carlos E.R.@3:633/10 to All on Thu Mar 12 02:09:02 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 2026-03-12 01:32, Maria Sophia wrote:
    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?

    No.

    Asking TB to produce the raw message, it comes as

    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?= <...>

    Which is legal, obviously. (Reasoning: if TB does it, then it is legal)

    Message-ID: <10obf37$3koaa$1@dont-email.me>
    MIME-Version: 1.0
    Content-Type: text/plain; charset=UTF-8; format=flowed
    Content-Transfer-Encoding: 8bit

    User-Agent: Mozilla Thunderbird
    Content-Language: en-US



    Looking at the stored file in my computer:

    00000070 50 4F 53 54 ³ 45 44 21 6E ³ 6F 74 2D 66 ³ 6F 72 2D 6D ³ 61 69 6C 0A ³ 46 72 6F 6D ³ 3A 20 3D 3F POSTED!not-for-mail.From: =?
    0000008C 55 54 46 2D ³ 38 3F 42 3F ³ 4C 69 34 75 ³ 64 38 4B 68 ³ 77 37 48 43 ³ 70 38 4B 78 ³ 77 71 54 44 UTF-8?B?Li4ud8Khw7HCp8KxwqTD
    000000A8 73 51 3D 3D ³ 3F 3D 20 3C ³ 77 69 6E 73 ³ 74 6F 6E 6D ³ 76 70 40 67 ³ 6D 61 69 6C ³ 2E 63 6F 6D sQ==?= <..........@gmail.com
    000000C4 3E 0A 4E 65 ³ 77 73 67 72 ³ 6F 75 70 73 ³ 3A 20 61 6C ³ 74 2E 63 6F ³ 6D 70 2E 6F ³ 73 2E 77 69 >.Newsgroups: alt.comp.os.wi

    Which has been processed by Leafnode, without any problem.
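
    For anyone who wants to check that raw From: value outside a newsreader, here is a minimal sketch in Python using the standard library's email.header.decode_header. The helper name decode_rfc2047 is made up for illustration; only the encoded-word itself comes from the raw message above.

```python
from email.header import decode_header


def decode_rfc2047(value: str) -> str:
    """Decode any RFC 2047 encoded-words in a header value to text."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus the declared charset;
            # unlabeled bytes are treated as ASCII here (an assumption).
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(chunk)
    return "".join(parts)


# The encoded-word copied from the raw From: header.
raw_name = "=?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?="
display_name = decode_rfc2047(raw_name)

# List only the non-ASCII code points in the decoded display name.
code_points = [f"U+{ord(ch):04X}" for ch in display_name if ord(ch) > 0x7F]
print(display_name)
print(code_points)
```

    The code points it prints can be compared against the ones listed elsewhere in this thread.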

    --
    Cheers, Carlos.
    ES??, EU??;

    --- PyGate Linux v1.5.12
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Wader of Doom@3:633/10 to All on Thu Mar 12 01:37:18 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    Maria Sophia <mariasophia@comprehension.com> wrote:

    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?

    Hi, Arlen, how's your User-Agent header?

    --
    Darth Wader [breathe, breathe...]

  • From ...w¡ñ§±¤ñ@3:633/10 to All on Thu Mar 12 00:08:57 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 3/11/2026 5:32 PM, Maria Sophia wrote:
    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?

    I am trying to understand something about how different newsreaders handle malformed headers because my home-grown "newsreader" has "problems" when responding to Winston's posts due to the way he formats his "FROM" header.
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    That line apparently contains non-ASCII characters in the display name:
    ¡ (U+00A1)
    ñ (U+00F1)
    § (U+00A7)
    ± (U+00B1)
    ¤ (U+00A4)
    another ñ (U+00F1)

    w = standard lower case w keystroke
    ¡ = Alt 0161
    ñ = Alt 0241
    § = Alt 0167 or § = Alt 21
    ± = Alt 0177
    ¤ = Alt 0164
    ñ = Alt 0241

    All from one or more fonts available in Character Map.
    - I've come across other folks that use some available character codes
    that appear blank - just copy the code and paste into a field to meet
    the '*' required character entry.


    --
    ...w¡ñ§±¤ñ

  • From John Hall@3:633/10 to All on Thu Mar 12 09:25:48 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 12/03/2026 06:16, Maria Sophia wrote:
    To add further value to what Carlos kindly tested using Thunderbird, apparently, those on Thunderbird see not this (which is what I see):
    From: ...w¡ñ§±¤ñ<winstonmvp@gmail.com>

    I'm using Thunderbird and I see exactly what you see. Maybe it's
    something to do with which fonts we have installed or with our Windows
    settings? (I'm using Windows 11 rather than Windows 10, but I doubt that
    would make any difference.)

    --
    John Hall

    You can divide people into two categories:
    those who divide people into two categories and those who don't

  • From Carlos E.R.@3:633/10 to All on Thu Mar 12 11:53:14 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 2026-03-12 07:16, Maria Sophia wrote:
    To add further value to what Carlos kindly tested using Thunderbird, apparently, those on Thunderbird see not this (which is what I see):
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>
    Which, is comprised of...
    ¡ (U+00A1)
    ñ (U+00F1)
    § (U+00A7)
    ± (U+00B1)
    ¤ (U+00A4)
    another ñ (U+00F1)

    But they actually see this instead (according to what Carlos reported):
    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=

    No, that's what I see when looking at the raw version. What I see in the editor or the message viewer is

    ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    and on a follow up is "On 2026-03-12 08:08, ...w¡ñ§±¤ñ wrote:"

    Notice that we are both using thunderbird, so what happens is
    coordinated. It is sent as mime, but displayed as normal utf text.

    That's on the header. The body is plain UTF, no need for any conversion.
    The header needs to be compatible with older software.



    --
    Cheers, Carlos.
    ES??, EU??;

  • From ...w¡ñ§±¤ñ@3:633/10 to All on Thu Mar 12 11:15:25 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 3/12/2026 9:18 AM, Maria Sophia wrote:

    Thank you for clarifying what I misunderstood from Carlos' tests, which is that you see what I see which Winston has subsequently confirmed are alt codes he manually typed in to set his FROM Usenet header long ago using
    ...w = ...w (literal)
    ¡ = Alt 0161 (Windows inserts byte A1 hexadecimal value)
    ñ = Alt 0241 (Windows inserts byte F1 hexadecimal value)
    § = Alt 0167 (Windows inserts byte A7 hexadecimal value)
    ± = Alt 0177 (Windows inserts byte B1 hexadecimal value)
    ¤ = Alt 0164 (Windows inserts byte A4 hexadecimal value)

    No typing required.
    In Character Map, choose a font that has the desired character (for the
    above, Arial works), double-click the character (this places it in the
    'Characters to copy' field), repeat for the balance of the string and,
    once the string is complete, click on Copy. Paste wherever desired
    (Notepad is a good temporary storage point if using it in multiple
    other apps/programs).

    Thanks for confirming what I see Carlos has also confirmed, which is that
    you see in Thunderbird what I see in my newsreader which is "...w¡ñ§±¤ñ".
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    As noted earlier, this is what I see in Thunderbird's From
    column(Message list)
    <https://i.postimg.cc/BvbXZ8mv/Tbird-From-Column-01.jpg>
    The same naming is also seen in the Message pane's From field.
    - b/c it's using the Address book contact form

    If wondering about the ... prefix, it's a precedent for sorting on the
    From field (my posts appear at the top of an unthreaded sorted list)


    --
    ...w¡ñ§±¤ñ

  • From MikeS@3:633/10 to All on Thu Mar 12 18:24:43 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 12/03/2026 07:08, ...w¡ñ§±¤ñ wrote:
    On 3/11/2026 5:32 PM, Maria Sophia wrote:
    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?

    I am trying to understand something about how different newsreaders
    handle malformed headers because my home-grown "newsreader" has
    "problems" when responding to Winston's posts due to the way he formats
    his "FROM" header.
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    That line apparently contains non-ASCII characters in the display name:
    ¡ (U+00A1)
    ñ (U+00F1)
    § (U+00A7)
    ± (U+00B1)
    ¤ (U+00A4)
    another ñ (U+00F1)

    w = standard lower case w keystroke
    ¡ = Alt 0161
    ñ = Alt 0241
    § = Alt 0167 or § = Alt 21
    ± = Alt 0177
    ¤ = Alt 0164
    ñ = Alt 0241

    All from one or more fonts available in Character Map.
    - I've come across other folks that use some available character codes
    that appear blank - just copy the code and paste into a field to meet
    the '*' required character entry.


    I also see your name as ...w¡ñ§±¤ñ (in Betterbird). It doesn't bother me
    unduly but it has puzzled me for a while. May I ask what you are doing
    and why not simply use winston as in your email address?

  • From Stan Brown@3:633/10 to All on Thu Mar 12 12:42:25 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On Thu, 12 Mar 2026 02:09:02 +0100, Carlos E.R. wrote:
    Asking TB to produce the raw message, it comes as

    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?= <...>

    Which is legal, obviously. (Reasoning: if TB does it, then it is legal)


    I disagree with that if-then statement. It assumes that all relevant
    standards have been followed accurately and in full, with no bugs.
    I'll leave it as an exercise for the reader to decide how many zeroes
    are needed to express the probability of that.

    --
    "The power of accurate observation is frequently called cynicism by
    those who don't have it." --George Bernard Shaw

  • From ...w¡ñ§±¤ñ@3:633/10 to All on Thu Mar 12 22:45:58 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 3/12/2026 11:24 AM, MikeS wrote:
    On 12/03/2026 07:08, ...w¡ñ§±¤ñ wrote:
    On 3/11/2026 5:32 PM, Maria Sophia wrote:
    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?

    I am trying to understand something about how different newsreaders
    handle malformed headers because my home-grown "newsreader" has
    "problems" when responding to Winston's posts due to the way he formats
    his "FROM" header.
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    That line apparently contains non-ASCII characters in the display name:
    ¡ (U+00A1)
    ñ (U+00F1)
    § (U+00A7)
    ± (U+00B1)
    ¤ (U+00A4)
    another ñ (U+00F1)

    w = standard lower case w keystroke
    ¡ = Alt 0161
    ñ = Alt 0241
    § = Alt 0167 or § = Alt 21
    ± = Alt 0177
    ¤ = Alt 0164
    ñ = Alt 0241

    All from one or more fonts available in Character Map.
    - I've come across other folks that use some available character codes
    that appear blank - just copy the code and paste into a field to meet
    the '*' required character entry.


    I also see your name as ...w¡ñ§±¤ñ (in Betterbird). It doesn't bother me
    unduly but it has puzzled me for a while. May I ask what you are doing
    and why not simply use winston as in your email address?

    Have used that form for nntp and signature since 1998
    Html nntp, Text nntp[1], private nntp groups, private list servers,
    private web groups, blogging...

    [1] text nntp(e.g. Eternal Sept. like servers - no HTML formatting composition) users are the only source where questions, criticism,
    comments occur...but less than 5% of where 'it's' being used.

    <g>Before 1998, the nomenclature was slightly longer
    => W¡ñ§±¤ñ¬Ößóògîë


    --
    ...w¡ñ§±¤ñ

    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From ...w¡ñ§±¤ñ@3:633/10 to All on Fri Mar 13 00:35:42 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    Carlos E.R. wrote on 3/12/2026 3:53 AM:
    On 2026-03-12 07:16, Maria Sophia wrote:
    To add further value to what Carlos kindly tested using Thunderbird,
    apparently, those on Thunderbird see not this (which is what I see):
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>
    Which, is comprised of...
    ¡ (U+00A1)
    ñ (U+00F1)
    § (U+00A7)
    ± (U+00B1)
    ¤ (U+00A4)
    another ñ (U+00F1)

    But they actually see this instead (according to what Carlos reported):
    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=

    No, that's what I see when looking at the raw version. What I see in the editor or the message viewer is

    ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    and on a follow up is "On 2026-03-12 08:08, ...w¡ñ§±¤ñ wrote:"

    Notice that we are both using thunderbird, so what happens is
    coordinated. It is sent as mime, but displayed as normal utf text.

    That's on the header. The body is plain UTF, no need for any conversion.
    The header needs to be compatible with older software.



    Fyi...
    My comments on Tbird were on:

    Wed, 11 Mar 2026 23:42:29 -0700
    and
    Thur, 12 Mar 2026 11:15:25 -0700


    --
    ...w¡ñ§±¤ñ

  • From MikeS@3:633/10 to All on Fri Mar 13 08:50:26 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 13/03/2026 05:45, ...w¡ñ§±¤ñ wrote:
    On 3/12/2026 11:24 AM, MikeS wrote:
    On 12/03/2026 07:08, ...w¡ñ§±¤ñ wrote:
    On 3/11/2026 5:32 PM, Maria Sophia wrote:
    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?

    I am trying to understand something about how different newsreaders
    handle malformed headers because my home-grown "newsreader" has
    "problems" when responding to Winston's posts due to the way he formats
    his "FROM" header.
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    That line apparently contains non-ASCII characters in the display name:
    ¡ (U+00A1)
    ñ (U+00F1)
    § (U+00A7)
    ± (U+00B1)
    ¤ (U+00A4)
    another ñ (U+00F1)

    w = standard lower case w keystroke
    ¡ = Alt 0161
    ñ = Alt 0241
    § = Alt 0167 or § = Alt 21
    ± = Alt 0177
    ¤ = Alt 0164
    ñ = Alt 0241

    All from one or more fonts available in Character Map.
    - I've come across other folks that use some available character
    codes that appear blank - just copy the code and paste into a field
    to meet the '*' required character entry.


    I also see your name as ...w¡ñ§±¤ñ (in Betterbird). It doesn't bother
    me unduly but it has puzzled me for a while. May I ask what you are
    doing and why not simply use winston as in your email address?

    Have used that form for nntp and signature since 1998
    Html nntp, Text nntp[1], private nntp groups, private list servers,
    private web groups, blogging...

    [1] text nntp(e.g. Eternal Sept. like servers - no HTML formatting composition) users are the only source where questions, criticism,
    comments occur...but less than 5% of where 'it's' being used.

    <g>Before 1998, the nomenclature was slightly longer
    => W¡ñ§±¤ñ¬Ößóògîë


    I guess the answer to my question is that you want to be different. You
    certainly succeeded as I have never seen any other email or usenet user
    emulate you. In fact when other usenet users want to refer to your
    comments in a thread they mostly type "winston". It's easier and makes
    more sense.

  • From Dave Royal@3:633/10 to All on Fri Mar 13 10:02:54 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    Maria Sophia <mariasophia@comprehension.com> Wrote in message:

    Maria Sophia wrote:
    People have complained *to me* that my responses have mojibake in them.
    So I'm trying to fix that problem *for them*.

    Delving deeper in thought...

    Given RFC 5322 says headers must be ASCII unless MIME-encoded, others have pointed out Big-5 & ISO-8859-1 sometimes get inserted into my headers.

    I don't add that. I can't add them. They're not in my dictionaries.
    So "something else" must be adding them. But what?

    I never really understood character encoding, and I've said so many times. But I wonder if what's happening is possibly
    1. The "From:" display name contains raw CP1252 bytes
    2. Which are not valid UTF-8
    3. Where, if my outgoing message declares "charset=UTF-8"
    4. Maybe some NNTP servers might respond by trying to be helpful
    5. One way being by slapping a different charset label on the header
    Given... these CP1252 bytes (0xA1, 0xA7, 0xB1, 0xF1) are
    a. illegal in UTF-8
    b. legal in ISO-8859-1
    c. also legal byte patterns in Big-5
    Maybe that's where some of my responses get ISO-8859-1 or Big-5 headers?

    Maybe... given UTF-8 is not ASCII, but ASCII is valid UTF-8...
    i. Declaring UTF-8 forces some nntp servers to validate all bytes.
    ii. But CP1252 bytes are illegal in UTF-8
    iii. Where UTF-8 replies trigger more server 'helpfulness'

    An interesting related aside is that... for
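    I. 0xA1 is not a valid UTF-8 start byte (it is a continuation byte)
    II. 0xF1 is a valid UTF-8 start byte of a four-byte sequence,
    but only if followed by three bytes in 0x80-0xBF (here A7 B1 A4 happen to qualify, so that run is well-formed but decodes to an unrelated code point)
    III. 0xA7 is illegal as a UTF-8 start byte
    IV. 0xB1 is illegal as a UTF-8 start byte
    V. 0xA4 is illegal as a UTF-8 start byte
    VI. 0xF1 is a valid UTF-8 start byte of a four-byte sequence,
    but only if followed by three bytes in 0x80-0xBF, which it isn't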

    The RFC-correct solution would be:
    From: =?UTF-8?Q?W=C2=A1=C3=B1=C2=A7=C2=B1=C2=A4=C3=B1=C2=AC=C3=96=C3=9F=C3=B3=C3=B2g=C3=AE=C3=AB?= <...>
    But that's ugly.

    Using W¡ñ§±¤ñ¬Ößóògîë would be even more so, given
    VII. 0xAC is illegal as a UTF-8 start byte
    VIII. 0xD6 is a valid start byte only if followed by a continuation byte
    And so on, where the "W" in W¡ñ§±¤ñ and the "g" in Ößóògîë are the only bytes in that entire (pre-1998) decorated name that are both ASCII and valid UTF-8. Everything else is raw CP1252.

    The UTF-8 version of the whole name would be:
    57 C2 A1 C3 B1 C2 A7 C2 B1 C2 A4 C3 B1 C2 AC C3 96 C3 9F C3 B3 C3 B2 67 C3 AE C3 AB

    But all this is only meaningful if it causes downstream issues,
    where I think simply switching my headers to ASCII solved the
    mojibake that Andy, Carlos and others asked me to try to fix.


    I wrote a newsreader a few years ago, in Python. Python had a
    module to decode headers encoded as in RFC2047; this one I
    think:
    <https://docs.python.org/3/library/email.header.html> <https://www.ietf.org/rfc/rfc2047>

    I didn't bother to detect /whether/ the headers had encoded words,
    I decoded everything in case it did. (I've seen several encoded
    words in different encodings in a single header field.)

    In your case it sounds like you need an encoder as well as a
    decoder. If there aren't such modules in whatever your system is
    written in, you could write one. Perhaps a sub-process written in
    Python: pass it the raw header and it returns it in unicode. And
    vice versa to encode it.
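
    A rough sketch of what such a helper could look like, using only the Python standard library (email.header and email.utils). The function names encode_from and decode_from and the example.invalid address are made up for illustration, and this is not the code from my newsreader.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr


def encode_from(display_name: str, address: str) -> str:
    """Build an ASCII-only From: value; a non-ASCII name becomes an encoded-word."""
    return formataddr((display_name, address), charset="utf-8")


def decode_from(header_value: str) -> tuple:
    """Split a From: value and decode the display name to Unicode text."""
    raw_name, address = parseaddr(header_value)
    name = str(make_header(decode_header(raw_name)))
    return name, address


# Hypothetical address; the name is written with escapes so this file stays ASCII.
original = "...w\u00a1\u00f1\u00a7\u00b1\u00a4\u00f1"
wire = encode_from(original, "winston@example.invalid")
name, address = decode_from(wire)

print(wire)           # safe to place in a header: 7-bit ASCII only
print(name, address)  # decoded form, suitable for an attribution line
```

    A non-Python newsreader could run something like this as a sub-process and exchange the strings over stdin/stdout; how the two sides agree on the encoding of that pipe is left open here.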

    --
    Remove numerics from my email address.

  • From Carlos E.R.@3:633/10 to All on Fri Mar 13 13:42:07 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 2026-03-12 17:18, Maria Sophia wrote:
    John Hall wrote:
    On 12/03/2026 06:16, Maria Sophia wrote:
    To add further value to what Carlos kindly tested using Thunderbird,
    apparently, those on Thunderbird see not this (which is what I see):
    From: ...w¡ñ§±¤ñ<winstonmvp@gmail.com>

    I'm using Thunderbird and I see exactly what you see. Maybe it's
    something to do with which fonts we have installed or with our Windows
    settings? (I'm using Windows 11 rather than Windows 10, but I doubt that
    would make any difference.)

    Thank you for clarifying what I misunderstood from Carlos' tests, which is that you see what I see which Winston has subsequently confirmed are alt codes he manually typed in to set his FROM Usenet header long ago using
    ...w = ...w (literal)
    ¡ = Alt 0161 (Windows inserts byte A1 hexadecimal value)
    ñ = Alt 0241 (Windows inserts byte F1 hexadecimal value)
    § = Alt 0167 (Windows inserts byte A7 hexadecimal value)
    ± = Alt 0177 (Windows inserts byte B1 hexadecimal value)
    ¤ = Alt 0164 (Windows inserts byte A4 hexadecimal value)

    Those are all valid Windows Alt-codes, but the important detail is that
    they produce raw 8-bit bytes from the Windows-1252 (Latin-1) character set.

    I could be wrong as I never really understood this characters stuff, but
    a. They are not UTF-8
    b. They are not ASCII
    c. They are not MIME-encoded
    d. They are raw 8-bit bytes

    Huh, no. They were typed as 8-bit bytes from Latin-1 charset at some
    point in time, but today they are UTF-8. UTF in the body, and as MIME in
    the header.

    You said it yourself in another post:

    The valid format is:

    =?charset?encoding?encoded-text?=

    Hence, if we break Winston's header down:

    =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=
    | |     | |
    | |     | +-- Base64 text
    | |     +---- Encoding type ("B" = Base64)
    | +---------- Character set (UTF-8)
    +------------ Begin encoded-word

    The Base64 portion is:

    Li4ud8Khw7HCp8KxwqTDsQ==

    Decoding that Base64 string yields the UTF-8 text:

    ...w¡ñ§±¤ñ
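
    The same breakdown can be done by hand in a few lines of Python, as a sketch: split the encoded-word on "?", Base64-decode the payload, then decode the bytes with the declared charset. It only handles a single "B" encoded-word and is not a general RFC 2047 parser.

```python
import base64

encoded_word = "=?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?="

# Strip the "=?" and "?=" delimiters, then split into the three fields.
charset, encoding, payload = encoded_word[2:-2].split("?")

if encoding.upper() != "B":
    raise ValueError("this sketch only handles Base64 ('B') encoded-words")

raw_bytes = base64.b64decode(payload)  # the UTF-8 bytes of the display name
text = raw_bytes.decode(charset)       # the display name as text

print(charset, encoding)
print(raw_bytes.hex(" ").upper())
print(text)
```

    The hex line shows two bytes (C2 xx or C3 xx) for each accented character, which is what UTF-8 looks like for code points in the U+0080 to U+00FF range.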




    --
    Cheers, Carlos.
    ES??, EU??;

  • From Carlos E.R.@3:633/10 to All on Fri Mar 13 13:47:24 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 2026-03-12 17:42, Maria Sophia wrote:
    Carlos E.R. wrote:
    But they actually see this instead (according to what Carlos reported):
    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=

    No, that's what I see when looking at the raw version. What I see in the
    editor or the message viewer is

    ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    and on a follow up is "On 2026-03-12 08:08, ...w¡ñ§±¤ñ wrote:"

    Notice that we are both using thunderbird, so what happens is
    coordinated. It is sent as mime, but displayed as normal utf text.

    That's on the header. The body is plain UTF, no need for any conversion.
    The header needs to be compatible with older software.

    Hi Carlos,

    Thanks for correcting my misconception as I never really understood all
    this mojibake character-set interaction but now that Winston explained he
    is typing Windows Alt-codes, and after your clarification, I am beginning to scratch the surface of what is actually happening.

    It may be that Thunderbird *stores* or *shows* the header in MIME-encoded form when you view the raw source, but apparently Thunderbird does not MIME-encode Winston's display name when sending the message.

    I'm not using Thunderbird (and I changed the header to reflect that since
    TB users are on this thread) but it appears that in normal viewing mode, Thunderbird simply displays the raw 8-bit Windows-1252 bytes exactly as
    they appear:

    No, TB displays UTF-8. At least here, all the computer uses UTF-8.


    ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    Which matches what I see on my end.

    Apparently Thunderbird is perfectly happy to accept those raw 8-bit bytes
    in the header, even though they are not valid UTF-8 and not legal ASCII.

    The header is MIME encoded.


    My own workflow is strict ASCII, so when those bytes get copied into my attribution line, I think what happens is some NNTP servers try to repair
    the mismatch and end up mangling my outgoing post, which is really the only reason I care (as I don't care to be a Usenet-rules enforcer by any means).

    So, to clarify, I think you & Winston are saying the behavior is:
    1. Winston types Windows-1252 Alt-codes.
    2. Thunderbird displays them as-is in the UI.
    3. Thunderbird shows a MIME-encoded version only when viewing
    the raw message source.
    4. My ASCII-only workflow exposes the illegal bytes,
    which sometimes apparently triggers server rewrites (AFAICT)

    Thanks again for checking this from the Thunderbird side, as knowing how
    you see Winston's messages helps me figure out how to handle the mojibake.


    --
    Cheers, Carlos.
    ES??, EU??;

  • From Carlos E.R.@3:633/10 to All on Fri Mar 13 13:57:02 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 2026-03-13 10:38, Maria Sophia wrote:
    THIS IS A TEST. IT'S AN EXACT COPY OF THE PREVIOUS POST.
    THE ONLY DIFFERENCE IS THIS HAS UTF-8 DECLARED IN THE HEADER. NOT ASCII.
    DO YOU SEE THE SAME OUTPUT or DO YOU SEE IT DIFFERENTLY?

    ...


    The RFC-correct solution would be:
    From: =?UTF-8?Q?W=C2=A1=C3=B1=C2=A7=C2=B1=C2=A4=C3=B1=C2=AC=C3=96=C3=9F=C3=B3=C3=B2g=C3=AE=C3=AB?=
    <...>
    But that's ugly.

    Using W???????g?? would be even more so, given
    --------************

    This text arrives corrupted. In the other post they are legible. It is declared as UTF-8, but I guess it is not actually all valid UTF-8.

    --
    Cheers, Carlos.
    ES??, EU??;

  • From Carlos E.R.@3:633/10 to All on Fri Mar 13 23:25:59 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 2026-03-13 22:11, Maria Sophia wrote:
    Carlos E.R. wrote:
    I could be wrong as I never really understood this characters stuff, but
    a. They are not UTF-8
    b. They are not ASCII
    c. They are not MIME-encoded
    d. They are raw 8-bit bytes

    Huh, no. They were typed as 8-bit bytes from Latin-1 charset at some
    point in time, but today they are UTF-8. UTF in the body, and as MIME in
    the header.

    You said it yourself in another post:

    Hi Carlos,

    I agree. I apologize for the flip flop indecision. I don't know what's
    going on, as I'm only trying to fix the trouble W¡ñ§±¤ñ¬Ößóògîë creates.

    I will endlessly admit I never understood this charset stuff, and I will point out that the only reason I even care is you and others asked me to
    fix the problems that sometimes my posts look like a Chinese jigsaw puzzle.

    Since I don't mess with the characters, something else is messing with the characters, where a test in this very thread shows that when I use headers
    Content-Type: text/plain; charset=US-ASCII
    Content-Transfer-Encoding: 7bit
    Then W¡ñ§±¤ñ¬Ößóògîë remains W¡ñ§±¤ñ¬Ößóògîë

    But when I use headers
    Content-Type: text/plain; charset=UTF-8
    Content-Transfer-Encoding: 8bit
    Then W¡ñ§±¤ñ¬Ößóògîë turns the entire post into a ransom note.

    Possibly because the text is not actually UTF-8
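
    This can be checked with a short Python sketch. It assumes the single-byte values quoted in this thread for the longer name (0x57, 0xA1, 0xF1, ...) are what ends up in the body; that is an assumption about the posting client, not something verified here.

```python
# Byte values quoted in this thread for the longer name, one byte per character.
cp1252_bytes = bytes.fromhex("57 A1 F1 A7 B1 A4 F1 AC D6 DF F3 F2 67 EE EB")

# A body labelled charset=UTF-8 has to decode as UTF-8; these bytes do not.
try:
    cp1252_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError as err:
    valid_utf8 = False
    print("not UTF-8:", err)

# Read as Windows-1252 they are perfectly good text ...
name = cp1252_bytes.decode("cp1252")

# ... and re-encoding that text gives the bytes a UTF-8 body should carry.
utf8_hex = name.encode("utf-8").hex(" ").upper()
print(name)
print(utf8_hex)
```

    If the declared charset and the actual bytes disagree like this, every reader downstream has to guess, and different guesses give different garbage.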



    Usenet (NNTP) follows email header rules (RFC 5322 + RFC 2047):
    a. The body may be UTF-8, if declared.
    b. Headers cannot contain raw 8-bit bytes.
    c. Hence, non-ASCII characters must be encoded using MIME encoded-words
    From: =?UTF-8?Q?W=C2=A1=C3=B1=C2=A7=C2=B1=C2=A4=C3=B1?= <winston@example.com>

    Given Winston's "FROM:" header has those characters, which are not ASCII,
    all I can say is that they're not valid characters for *headers*, unless they're MIME encoded. Are they Mime-encoded? I don't know. I don't see it.

    Yes, they are MIME encoded. I posted the other day the section in HEX,
    taken directly from the on disk file that Leafnode has written on my
    system, so no translation from Thunderbird.
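
    A quick way to check a stored header line for raw 8-bit bytes, sketched in Python. The first byte string is the From: line as it appears in that hex dump (address elided); the second is a made-up example of what an unencoded Windows-1252 name would look like. The function name has_8bit_bytes is just for illustration.

```python
# The From: line as stored on disk, per the hex dump (address elided as "...").
stored = b"From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?= <...>"

# Hypothetical counter-example: the same name as raw Windows-1252 bytes.
unencoded = b"From: ...w\xa1\xf1\xa7\xb1\xa4\xf1 <...>"


def has_8bit_bytes(header_line: bytes) -> bool:
    """True if any byte falls outside 7-bit ASCII."""
    return any(byte > 0x7F for byte in header_line)


print(has_8bit_bytes(stored))     # the MIME-encoded header
print(has_8bit_bytes(unencoded))  # the raw 8-bit variant
```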


    As you said, I belatedly realized Winston's characters are valid Unicode
    and valid UTF-8 but they appear in a header, apparently without required
    MIME encoding when Usenet servers are allowed to mangle or reject 8-bit header bytes. When I respond, the attribution line contains W¡ñ§±¤ñ¬Ößóògîë

    What I'm trying to figure out is why my body gets mangled. The attribution line contains raw Latin-1 bytes, but my outgoing headers
    declare UTF-8, so I think a server in the path re-encodes the body and corrupts it. But I'm not really sure what is causing the mojibake.


    --
    Cheers, Carlos.
    ES??, EU??;

  • From Carlos E.R.@3:633/10 to All on Sat Mar 14 14:11:54 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 2026-03-14 05:45, Maria Sophia wrote:
    Carlos E.R. wrote:
    Then W¡ñ§±¤ñ¬Ößóògîë turns the entire post into a ransom note.

    Possibly because the text is not actually UTF-8

    Yeah. In a later post you see I belatedly figured that out for myself.
    Sorry for the flip flop indecision on whether I think it's UTF-8 or not.

    Did I ever mention I never really understood this Usenet charset stuff?

    I'm one of the few people whose ego isn't so huge that they can't admit
    when they don't know something, where I openly and humbly easily admit that
    I seriously lack charset understanding when it comes to Usenet headers.

    Luckily, the two things I'm doing seem to work "most" of the time:
    a. If I copy/paste from a variety of web sources (particularly Chromium),
    I run my body through a text-normalizer to eliminate Unicode chars.
    <shortcuts.xml>
    b. I manually place a US-ASCII header which seems to tell the receiving
    newsreaders not to bother trying to deal with W¡ñ§±¤ñ¬Ößóògîë's
    Windows-1252 / ISO-8859-1 (Latin-1) character set.
    W = 0x57 (ASCII)
    ¡ = 0xA1
    ñ = 0xF1
    § = 0xA7
    ± = 0xB1
    ¤ = 0xA4
    ¬ = 0xAC
    Ö = 0xD6
    ß = 0xDF
    ó = 0xF3
    ò = 0xF2
    g = 0x67 (ASCII)
    î = 0xEE
    ë = 0xEB
    Every one of those bytes is a single-byte Latin-1 / Windows-1252 character. None of them are UTF-8.

    Given Winston's "FROM:" header has those characters, which are not ASCII,
    all I can say is that they're not valid characters for *headers*, unless
    they're MIME encoded. Are they Mime-encoded? I don't know. I don't see it.
    Yes, they are MIME encoded. I posted the other day the section in HEX,
    taken directly from the on disk file that Leafnode has written on my
    system, so no translation from Thunderbird.

    I may be wrong since I never understood this stuff, so I appreciate your clarifications, and I openly let you know I really don't understand this.

    I think you are describing Thunderbird's behavior, not necessarily
    Winston's behavior, while mostly I'm describing Winston's original bytes,
    not Thunderbird's. (Although it appears that Winston uses TB after all.)

    He does.

    I have looked at posts from him, in three ways:

    * as TB displays them
    * as TB "view-raw" displays them
    * at the computer file, which is stored by leafnode in my system.


    I think we can all presume Winston originally long ago typed raw
    Windows-1252 bytes using Alt-codes for his display name, but I think it may be that TB does not actually send those bytes directly in the header.

    Certainly not. Currently it sends MIME encoded UTF-8 header.


    Those are raw 8-bit Latin-1 bytes when he types them.
    However, I think TB does not send those bytes directly.

    When Winston posts using TB, I think TB maybe perhaps converts the Latin-1 bytes to UTF-8, and then MIME-encodes the header using RFC 2047. That may
    be why the raw source on your system shows something like:

    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=


    Yes.

    On your side, TB maybe perhaps then decodes that MIME-encoded header for display, so in the normal UI you see:

    ...w!n?ñ?n <winstonmvp@gmail.com>


    Yes.

    I'm rather confused, as I control only my side of the equation, and
    all I'm doing is dealing with Winston's display name, but maybe what's
    happening overall is this:

    1. Winston typed Windows-1252 Alt-codes for his display name long ago.
    ...W­¤?ñ?¤
    2. His Thunderbird converts those Latin-1 bytes to UTF-8 internally.
    3. His Thunderbird MIME-encodes the UTF-8 header before sending it.
    4. Your Thunderbird decodes the MIME header & displays normal UTF-8 text.

    Yes.

    5. My own newsreader client copies the original Latin-1 bytes from the
    attribution line because it does not decode the MIME header.
    6. That mismatch triggers mojibake in my outgoing posts when my headers
    declare "charset=UTF-8" instead of "charset=US-ASCII".

    I never understood this stuff, but perhaps that explains why you see
    a valid MIME-encoded UTF-8 header in the raw view, while I see the
    original Latin-1 bytes in my ASCII world. Thunderbird is doing the right
    thing on Winston's end, but perhaps my own ASCII-only setup exposes the
    mismatch.

    Presumably your upstream NNTP server sends you the same MIME-encoded
    UTF-8 header that I get, but your system does not decode it correctly.
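
As a generic illustration of how a decoding mismatch produces mojibake, the sketch below takes the display name carried by the encoded header, encodes it as UTF-8, and then deliberately reads those bytes as Windows-1252. This is an assumption-laden example of one common failure mode, not a diagnosis of what any particular newsreader in this thread actually does.

```python
# Display name carried by the MIME-encoded From header, written with
# escapes so the example does not depend on the file's own encoding.
name = "...w\u00a1\u00f1\u00a7\u00b1\u00a4\u00f1"

# Correct path: the characters are sent as UTF-8 bytes.
utf8_bytes = name.encode("utf-8")

# Faulty path (assumed for illustration): a client treats each UTF-8 byte
# as one Windows-1252 character, so every two-byte sequence turns into
# two unrelated characters.
misread = utf8_bytes.decode("cp1252")
print(misread)

# The damage is reversible as long as nothing was dropped or replaced:
# re-encode with the wrong charset, then decode with the right one.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired == name)
```

The misread string is longer than the original because each non-ASCII character occupies two bytes in UTF-8 and each byte is shown as its own character.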


    Thanks again for helping me sort out what Thunderbird is doing on your
    side, as I used TB years ago for a client and hated how it thought Usenet
    was email. Maybe it's better now as that had to be a decade or so ago.


    I have no understanding of the RFCs; I simply observe what TB and
    Leafnode seem to do. I also looked on another machine that only uses TB.
    Beyond that, I rely on what I remember reading over the years.

    Forget Latin-1. The servers are sending MIME-encoded UTF-8 in the
    headers. Life is simple that way.


    --
    Cheers, Carlos.
    ES??, EU??;

    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From ...w¡ñ?±?ñ@3:633/10 to All on Sat Mar 14 11:27:50 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 3/13/2026 2:11 PM, Maria Sophia wrote:

    As you said, I belatedly realized that Winston's characters are valid
    Unicode and valid UTF-8, but they appear in a header, apparently without
    the required MIME encoding, and Usenet servers are allowed to mangle or
    reject 8-bit header bytes. When I respond, the attribution line contains W­¤?ñ?¤ª™á¢•gŒ‰


    FYI: if you are seeing W­¤?ñ?¤ª™á¢•gŒ‰ anywhere except as a
    typed/pasted item in a posted reply, then it might be worth looking
    again.
    => W­¤?ñ?¤ª™á¢•gŒ‰ is not and has never been used as a From,
    Signature, or email name in this forum (alt.comp.os.windows-10),
    i.e. if you are trying to fix something in your homegrown newsreader for W­¤?ñ?¤ª™á¢•gŒ‰ in an attribution line, you're on the wrong path.

    --
    ...w­¤?ñ?¤

  • From ...w¡ñ?±?ñ@3:633/10 to All on Sat Mar 14 11:32:22 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 3/13/2026 9:45 PM, Maria Sophia wrote:

    I think you are describing Thunderbird's behavior, not necessarily
    Winston's behavior, while mostly I'm describing Winston's original bytes,
    not Thunderbird's. (Although it appears that Winston uses TB after all.)

    Yes to TB. Also SeaMonkey and WLM2012

    I think we can all presume Winston originally long ago typed raw
    Windows-1252 bytes using Alt-codes for his display name, but I think it may be that TB does not actually send those bytes directly in the header.

    As noted earlier, no typing was ever done. The string was created in
    Character Map without typing: select a character, repeat for the next
    character, copy the string, paste it into the desired field.




    --
    ...w­¤?ñ?¤

  • From ...w¡ñ?±?ñ@3:633/10 to All on Sat Mar 14 11:34:31 2026
    Subject: Re: Question: Do Winston's headers cause charset issues for anyone else?

    On 3/14/2026 6:11 AM, Carlos E.R. wrote:

    Forget Latin-1. The servers are sending MIME-encoded UTF-8 in the
    headers. Life is simple that way.


    +1

    --
    ...w­¤?ñ?¤
