• Re: 16:32 far pointers in OpenWatcom C/C++

    From Peter Flass@3:633/10 to All on Sun Nov 2 12:57:51 2025
    On 3/25/10 03:02, Nick Keighley wrote:
    On 24 Mar, 23:40, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
    wrote:
    Dann Corbit <dcor...@connx.com> writes:
    In article <1e27d5ee-a1b1-45d9-9188-
    63ab37398...@d37g2000yqn.googlegroups.com>,
    nick_keighley_nos...@hotmail.com says...

    On 23 Mar, 20:56, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
    In alt.sys.pdp10 Richard Bos <ralt...@xs4all.nl> wrote:
    (snip)

    That crossposting was, for once, not asinine. It served as a nice
    example why, even now, Leenux weenies are not correct when they insist
    that C has a flat memory model and all pointers are just numbers.

    Well, you could also read the C standard to learn that.

    but if you say that you get accused of language lawyering.
    "Since IBM stopped making 360s no C program ever needs to run on such
    a platform"

    We have customers who are running their business on hardware from the
    mid-1980s. It may sound ludicrous, but if it solves all of their
    business needs, and runs solid 24x365, why should they upgrade?

    Because they could run an equivalently computationally powerful
    solution with various levels of redundancy and fail-over protection,
    with a power budget sensibly measured in mere Watts?

    does it have a Coral compiler?

    There's a market for someone.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Sun Nov 2 13:08:25 2025
    On 3/23/10 13:56, glen herrmannsfeldt wrote:
    In alt.sys.pdp10 Richard Bos <raltbos@xs4all.nl> wrote:
    (snip)

    That crossposting was, for once, not asinine. It served as a nice
    example why, even now, Leenux weenies are not correct when they insist
    that C has a flat memory model and all pointers are just numbers.


    This is true often enough to be dangerous when it turns out not to be.

    Well, you could also read the C standard to learn that.

    There are additional complications for C on the PDP-10.

    -- glen


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Sun Nov 2 13:20:14 2025
    What happened here?? I just noticed that a lot of these posts are from
    2010. Did some news server just barf?

    On 3/23/10 14:42, Peter Flass wrote:
    Branimir Maksimovic wrote:
    On Tue, 23 Mar 2010 06:51:18 -0400
    Peter Flass <Peter_Flass@Yahoo.com> wrote:

    Jonathan de Boyne Pollard wrote:
    Returning to what we were talking about before the silly diversion,
    I should point out that 32-bit applications programming where the
    target is extended DOS or 32-bit Win16 (with OpenWatcom's extender)
    will also occasionally employ 16:32 far pointers of course. But as
    I said before, regular 32-bit OS/2 or Win32 applications
    programming generally does not, since those both use the Tiny
    memory model,
    Flat memory model.

    Problem with standard C and C++ is that they assume flat memory
    model.

    I'm not a C expert, perhaps you're a denizen of comp.lang.c, but as far
    as I know there's nothing in the C standard that assumes anything about
    pointers, except that they have to be the same size as int, so for
    16:32 pointers I guess you'd need 64-bit ints.

    As far as implementations are concerned, both Watcom and IBM VA C++
    support segmented memory models. These are the ones I'm aware of;
    there are probably more.
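    [Editor's note: for what it's worth, the standard makes no such size
    guarantee. int and pointer types may differ in size, and object and
    function pointers may even differ from each other, as they did in the
    mixed 16/32-bit memory models discussed in this thread. A minimal
    portable check:]

    ```c
    #include <assert.h>
    #include <stdio.h>

    int main(void)
    {
        /* The C standard imposes no size relationship between int and
           pointer types; on typical 64-bit targets int stays 4 bytes
           while object pointers are 8.  Function and object pointers
           are also permitted to differ in size from each other. */
        printf("sizeof(int)           = %zu\n", sizeof(int));
        printf("sizeof(void *)        = %zu\n", sizeof(void *));
        printf("sizeof(int (*)(void)) = %zu\n", sizeof(int (*)(void)));
        assert(sizeof(void *) >= 2); /* holds on any hosted platform here */
        return 0;
    }
    ```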


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lynn McGuire@3:633/10 to All on Mon Nov 3 14:24:27 2025
    On 11/2/2025 2:20 PM, Peter Flass wrote:
    What happened here?? I just noticed that a lot of these posts are from
    2010. Did some news server just barf?

    On 3/23/10 14:42, Peter Flass wrote:
    Branimir Maksimovic wrote:
    On Tue, 23 Mar 2010 06:51:18 -0400
    Peter Flass <Peter_Flass@Yahoo.com> wrote:

    Jonathan de Boyne Pollard wrote:
    Returning to what we were talking about before the silly diversion,
    I should point out that 32-bit applications programming where the
    target is extended DOS or 32-bit Win16 (with OpenWatcom's extender)
    will also occasionally employ 16:32 far pointers of course. But as
    I said before, regular 32-bit OS/2 or Win32 applications
    programming generally does not, since those both use the Tiny
    memory model,
    Flat memory model.

    Problem with standard C and C++ is that they assume flat memory
    model.

    I'm not a C expert, perhaps you're a denizen of comp.lang.c, but as
    far as I know there's nothing in the C standard that assumes anything
    about pointers, except that they have to be the same size as int, so
    for 16:32 pointers I guess you'd need 64-bit ints.

    As far as implementations are concerned, both Watcom and IBM VA C++
    support segmented memory models. These are the ones I'm aware of,
    there are probably more.

    I asked Ray Banana of E-S about the openwatcom.* groups and he
    resurrected them with all of their very old postings.

    Lynn


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Mon Nov 3 16:25:06 2025
    On 11/3/25 13:24, Lynn McGuire wrote:
    On 11/2/2025 2:20 PM, Peter Flass wrote:
    What happened here?? I just noticed that a lot of these posts are from
    2010. Did some news server just barf?

    On 3/23/10 14:42, Peter Flass wrote:
    Branimir Maksimovic wrote:
    On Tue, 23 Mar 2010 06:51:18 -0400
    Peter Flass <Peter_Flass@Yahoo.com> wrote:

    Jonathan de Boyne Pollard wrote:
    Returning to what we were talking about before the silly diversion,
    I should point out that 32-bit applications programming where the
    target is extended DOS or 32-bit Win16 (with OpenWatcom's extender)
    will also occasionally employ 16:32 far pointers of course. But as
    I said before, regular 32-bit OS/2 or Win32 applications
    programming generally does not, since those both use the Tiny
    memory model,
    Flat memory model.

    Problem with standard C and C++ is that they assume flat memory
    model.

    I'm not a C expert, perhaps you're a denizen of comp.lang.c, but as
    far as I know there's nothing in the C standard that assumes anything
    about pointers, except that they have to be the same size as int, so
    for 16:32 pointers I guess you'd need 64-bit ints.

    As far as implementations are concerned, both Watcom and IBM VA C++
    support segmented memory models. These are the ones I'm aware of,
    there are probably more.

    I asked Ray Banana of E-S about the openwatcom.* groups and he
    resurrected them with all of their very old postings.

    Lynn


    Oh, OK. Also everything that was X-posted. No worries.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Kaz Kylheku@3:633/10 to All on Tue Nov 4 00:26:40 2025
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    Amazing ...

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Tue Nov 4 15:20:41 2025
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Paul S Person@3:633/10 to All on Tue Nov 4 08:29:21 2025
    On Tue, 4 Nov 2025 00:26:40 -0000 (UTC), Kaz Kylheku
    <643-408-1753@kylheku.com> wrote:

    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    Amazing ...

    I'm not sure about today, but that late there were still people
    programming on older hardware for various specialized purposes. Or
    rather, I suppose, maintaining the code. (I would say "devices" but
    that now implies "something that runs Apps" and these were much much
    older).

    One of the advantages of Watcom (and so OpenWatcom) has always been
    support for 16-bit programming. Of course, whether that is true today
    is hard to say.
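    [Editor's note: the 16-bit support mentioned above revolves around
    16:16 far pointers. Below is a portable sketch of the arithmetic; the
    struct and helper names are hypothetical stand-ins so it can compile
    anywhere. Under real 16-bit OpenWatcom you would instead write
    `char __far *p = MK_FP(seg, off);` using the `__far` extension and the
    MK_FP/FP_SEG/FP_OFF macros from <i86.h>.]

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Portable model of a real-mode 16:16 far pointer (hypothetical
       names; not an OpenWatcom API). */
    struct farptr { uint16_t seg, off; };

    static struct farptr mk_fp(uint16_t seg, uint16_t off)
    {
        struct farptr p = { seg, off };
        return p;
    }

    /* Real-mode linear address: segment scaled by 16 plus offset,
       giving a 20-bit result. */
    static uint32_t to_linear(struct farptr p)
    {
        return ((uint32_t)p.seg << 4) + p.off;
    }

    int main(void)
    {
        /* 0xB800:0000, the classic colour text-mode frame buffer. */
        struct farptr video = mk_fp(0xB800, 0x0000);
        assert(to_linear(video) == 0xB8000);

        /* Distinct seg:off pairs can alias one linear address. */
        assert(to_linear(mk_fp(0x1234, 0x0005)) ==
               to_linear(mk_fp(0x1000, 0x2345)));
        printf("0xB800:0000 -> 0x%05lX\n", (unsigned long)to_linear(video));
        return 0;
    }
    ```

    [The 16:32 pointers in the subject line differ in that the 16-bit part
    is a protected-mode selector indexing a descriptor table rather than a
    value scaled by 16.]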
    --
    "Here lies the Tuscan poet Aretino,
    Who evil spoke of everyone but God,
    Giving as his excuse, 'I never knew him.'"

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Paul S Person@3:633/10 to All on Tue Nov 4 08:32:47 2025
    On Mon, 3 Nov 2025 14:24:27 -0600, Lynn McGuire
    <lynnmcguire5@gmail.com> wrote:

    On 11/2/2025 2:20 PM, Peter Flass wrote:
    What happened here?? I just noticed that a lot of these posts are from

    2010. Did some news server just barf?

    <snippo>

    I asked Ray Banana of E-S about the openwatcom.* groups and he
    resurrected them with all of their very old postings.

    I tested the OpenWatcom Usenet server, and Agent reported no response.

    So the groups still exist (and, I suspect, not just on E-S), but not
    at the source.
    --
    "Here lies the Tuscan poet Aretino,
    Who evil spoke of everyone but God,
    Giving as his excuse, 'I never knew him.'"

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Tue Nov 4 09:39:41 2025
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today? Most
    disguise segmentation as a flat address space (e.g. IBM System/370
    et seq.)

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Richard Heathfield@3:633/10 to All on Tue Nov 4 17:12:46 2025
    On 04/11/2025 15:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.

    I can still hear them down the hall.

    ST!
    .......................................................Amiga!
    ST!
    .......................................................Amiga!

    --
    Richard Heathfield
    Email: rjh at cpax dot org dot uk
    "Usenet is a strange place" - dmr 29 July 1999
    Sig line 4 vacant - apply within

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Tue Nov 4 17:14:01 2025
    Peter Flass <Peter@Iron-Spring.com> writes:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today?

    Only in emulation (see Unisys Clearpath, for example).

    Most
    disguise segmentation as a flat address space (e.g. IBM System/370 et.seq.)

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From geodandw@3:633/10 to All on Tue Nov 4 12:15:27 2025
    On 11/4/25 12:12, Richard Heathfield wrote:
    On 04/11/2025 15:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.

    I can still hear them down the hall.

    ST!
    .......................................................Amiga!
    ST!
    .......................................................Amiga!

    The 68000 was a very nice processor for its time. It's too bad IBM
    didn't use it in the PC.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Kaz Kylheku@3:633/10 to All on Tue Nov 4 17:21:31 2025
    On 2025-11-04, geodandw <geodandw@gmail.com> wrote:
    The 68000 was a very nice processor for its time. It's too bad IBM
    didn't use it in the PC.

    That would have been so much better even if it still had had a shitty
    CP/M-like OS with drive letter names and whatnot.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Tue Nov 4 17:32:24 2025
    scott@slp53.sl.home (Scott Lurndal) writes:
    Peter Flass <Peter@Iron-Spring.com> writes:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today?

    Only in emulation (see Unisys Clearpath, for example).

    Although it's worth pointing out that Harvard architectures
    still exist (e.g. CEVA DSPs) and the low-power ARM
    M-series core 32-bit physical address space is
    divided into 28-bit regions, some of which may
    provide programmable windows into alternate address spaces
    in a fashion very similar to segmentation.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Kaz Kylheku@3:633/10 to All on Tue Nov 4 17:38:28 2025
    On 2025-11-04, Scott Lurndal <scott@slp53.sl.home> wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:
    Peter Flass <Peter@Iron-Spring.com> writes:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today?

    Only in emulation (see Unisys Clearpath, for example).

    Although it's worth pointing out that harvard architectures
    still exist (e.g. CEVA DSPs) and the low-power ARM

    Ah, that. I worked with the TeakLite III.

    In addition to the Harvard thing, I remember its smallest addressable
    unit was 16 bits. From the host processor (ARM) in that SoC, it
    appeared to have "funny endian": 32-bit words written by the TeakLite
    appeared in 2143 order or something, not 1234 or 4321.
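    [Editor's note: "2143 order" describes a middle-endian layout. Since
    Kaz states the exact ordering is uncertain ("or something"), the
    decoder below is purely illustrative: it assumes 16-bit halves kept in
    big-endian order with the bytes inside each half swapped.]

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Decode one possible "funny endian" layout: relative to big-endian
       byte positions 1-2-3-4, the bytes arrive as 2-1-4-3. */
    static uint32_t decode_2143(const uint8_t b[4])
    {
        return ((uint32_t)b[1] << 24) | ((uint32_t)b[0] << 16)
             | ((uint32_t)b[3] << 8)  |  (uint32_t)b[2];
    }

    int main(void)
    {
        /* 0xAABBCCDD stored in 2143 order: BB AA DD CC. */
        const uint8_t buf[4] = { 0xBB, 0xAA, 0xDD, 0xCC };
        assert(decode_2143(buf) == 0xAABBCCDDu);
        return 0;
    }
    ```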

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Tue Nov 4 21:23:44 2025
    On 04/11/2025 18:32, Scott Lurndal wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:
    Peter Flass <Peter@Iron-Spring.com> writes:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today?

    Only in emulation (see Unisys Clearpath, for example).

    Although it's worth pointing out that harvard architectures
    still exist (e.g. CEVA DSPs)

    Yes, but Harvard architectures are a very different matter from
    segmented architectures. "Real" Harvard architecture processors have
    different instructions for accessing different memory spaces - such as
    on the AVR microcontrollers, the instructions for reading ram and
    reading program flash are totally different, and you cannot execute
    code from ram.

    Segmented architecture just means that the actual address is formed by
    a scaled segment register (or value) combined with an offset or
    pointer register (or value).

    There are plenty of segmented architectures in the world of small
    microcontrollers, where the "pointer" might be 8-bit, 16-bit, or a
    pair of 8-bit registers, and it is combined with a bank or segment
    register so that the device can use more than 64KB memory. These
    devices may or may not be Harvard. Fortunately, most of these are
    considered legacy devices.
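    [Editor's note: the bank-register scheme described above can be
    modelled in a few lines. The 8-bit bank register and 64 KiB window
    here are illustrative, not any particular vendor's part.]

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical banked microcontroller: a bank register widens a
       16-bit pointer so the part can address more than 64 KiB. */
    static uint32_t banked_addr(uint8_t bank, uint16_t ptr16)
    {
        return ((uint32_t)bank << 16) | ptr16;
    }

    int main(void)
    {
        /* Same 16-bit pointer value, different banks: different cells. */
        assert(banked_addr(0, 0x8000) == 0x008000u);
        assert(banked_addr(3, 0x8000) == 0x038000u);
        /* 256 banks x 64 KiB = 16 MiB reachable via a 16-bit pointer. */
        assert(banked_addr(0xFF, 0xFFFF) == 0x00FFFFFFu);
        return 0;
    }
    ```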

    and the low-power ARM
    M-series core 32-bit physical address space is
    divided into 28-bit regions some of which may
    provide programmable windows into alternate address spaces
    in a fashion very similar to segmentation.


    All the ARM Cortex-M cores have 32-bit linear memory spaces. There is
    no segmentation. Different parts of the memory space are used for
    different purposes (ram, flash, peripherals, off-chip memory, etc.),
    and there can be lots of different memory-mapped devices placed at
    different points in the memory spaces. But all access is via 32-bit
    addresses in 32-bit registers, without any segmentation registers.
    (And I have never seen a Cortex-M device with programmable windows or
    addresses - indeed, I believe the Cortex-M core documentation
    specifies some memory ranges explicitly. Memory protection units can
    be programmed to give different access, write, and cacheability
    attributes to different regions, but that's another matter.)




    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Tue Nov 4 22:04:43 2025
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    . (And I have never
    seen a Cortex-M device with programmable windows or addresses - indeed,
    I believe the Cortex-M core documentation specifies some memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D'Oliveiro@3:633/10 to All on Tue Nov 4 22:17:44 2025
    On Tue, 4 Nov 2025 09:39:41 -0700, Peter Flass wrote:

    I was thinking, are there any segmented architectures today?

    Two different meanings of segmentation. It is possible to use
    segmentation in a flat address space, as a memory-management
    technique. Think paging, but with variable-length pages. (E.g.
    Burroughs machines did this. Also think of how program code on the
    old 680x0-based Macintosh machines could be divided up into
    individually-swappable 'CODE' segments.)

    The trouble was, such a scheme was prone to fragmentation, where the
    total free memory might be larger than the segment you want to load,
    but it's broken up into discontiguous pieces that are too small to
    use. This is why paging was preferred instead.

    But now, with 64-bit architectures commonplace, you have multi-level
    page tables. Think of these as a form of segmentation, where each
    segment is made up of whole pages.
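    [Editor's note: the page-tables-as-segmentation view can be made
    concrete with the x86-64 4-level split, assuming 4 KiB pages: a 48-bit
    virtual address decomposes into four 9-bit table indices plus a page
    offset.]

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* x86-64 4-level paging with 4 KiB pages: each level selects one
       entry in a 512-entry table. */
    #define PT_IDX(va, shift) ((unsigned)(((va) >> (shift)) & 0x1FFu))

    int main(void)
    {
        uint64_t va = 0x00007F1234567ABCull;  /* arbitrary example VA */

        unsigned pml4 = PT_IDX(va, 39);  /* top level: 512 GiB per entry */
        unsigned pdpt = PT_IDX(va, 30);  /* 1 GiB per entry */
        unsigned pd   = PT_IDX(va, 21);  /* 2 MiB per entry */
        unsigned pt   = PT_IDX(va, 12);  /* 4 KiB per entry */
        uint64_t off  = va & 0xFFFu;     /* byte within the page */

        /* The indices and offset reassemble the original address. */
        assert((((uint64_t)pml4 << 39) | ((uint64_t)pdpt << 30) |
                ((uint64_t)pd << 21) | ((uint64_t)pt << 12) | off) == va);
        return 0;
    }
    ```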

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D'Oliveiro@3:633/10 to All on Tue Nov 4 22:19:17 2025
    On Tue, 4 Nov 2025 12:15:27 -0500, geodandw wrote:

    The 68000 was a very nice processor for its time. It's too bad IBM
    didn't use it in the PC.

    Might have been a cost issue (more pins, more cost).

    In any case, the 680x0 family was very popular among Unix workstation
    vendors, until it was completely eclipsed in performance by the
    coming of RISC.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Wed Nov 5 08:50:25 2025
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    . (And I have never
    seen a Cortex-M device with programmable windows or addresses - indeed,
    I believe the Cortex-M core documentation specifies some memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK. I have not, but I haven't used the newer Cortex-M cores as yet,
    so it could well be a new feature. It could also be an option which
    the mainstream microcontroller manufacturers don't provide. Which
    ones have programmable windows? And is this something that will be
    common, or is it just something that a few manufacturers with
    "architect" (if that is the right term) ARM licenses implement on
    their own?

    (I know this is off-topic for c.l.c., but I'm interested in these
    devices.)


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Wed Nov 5 15:15:14 2025
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    . (And I have never
    seen a Cortex-M device with programmable windows or addresses - indeed,
    I believe the Cortex-M core documentation specifies some memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK. I have not, but I haven't used the newer Cortex-M cores as yet, so
    it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced. That logic
    is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.



    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Thu Nov 6 08:51:37 2025
    On 05/11/2025 16:15, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    . (And I have never
    seen a Cortex-M device with programmable windows or addresses - indeed,
    I believe the Cortex-M core documentation specifies some memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK. I have not, but I haven't used the newer Cortex-M cores as yet, so
    it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced. That logic
    is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.


    Fair enough, now I see what you are getting at. Yes, once you are
    outside the Cortex-M core and key ARM-supplied components (like the
    interrupt controller), you as a SoC designer are free to do what you
    like. And if you have a 32-bit processor that needs access to a
    64-bit address space, you are going to have to do some kind of
    windowing or segmenting.

    In the SoCs I have used where 64-bit Cortex-A processors are combined
    with a Cortex-M core for security purposes, booting, or for better
    real-time control of peripherals, the Cortex-M device does not have
    direct access to the 64-bit memory space. It has access to the
    peripherals, some dedicated memory, and a message-passing interface with
    the Cortex-A cores.

    But in your work, you probably see more variety and more possibilities
    for these things - I only get to use the chips someone else has made!


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Thu Nov 6 11:21:06 2025
    On 06/11/2025 07:51, David Brown wrote:
    On 05/11/2025 16:15, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    . (And I have never
    seen a Cortex-M device with programmable windows or addresses - indeed,
    I believe the Cortex-M core documentation specifies some memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK. I have not, but I haven't used the newer Cortex-M cores as yet, so
    it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced. That logic
    is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.


    Fair enough, now I see what you are getting at. Yes, once you are
    outside the Cortex-M core and key ARM-supplied components (like the
    interrupt controller), you as a SoC designer are free to do what you
    like. And if you have a 32-bit processor that needs access to a
    64-bit address space, you are going to have to do some kind of
    windowing or segmenting.

    In the SoCs I have used where 64-bit Cortex-A processors are combined
    with a Cortex-M core for security purposes, booting, or for better
    real-time control of peripherals, the Cortex-M device does not have
    direct access to the 64-bit memory space. It has access to the
    peripherals, some dedicated memory, and a message-passing interface
    with the Cortex-A cores.

    But in your work, you probably see more variety and more possibilities
    for these things - I only get to use the chips someone else has made!


    I think you were right, if this 'M7' chip doesn't directly have
    registers, instructions or infrastructure to access the more complex
    memory system.

    Unless you are modifying the M7 itself, that 'glue' logic could be
    applied to anything (eg. I've built a Z80 system with 256KB RAM), and
    it is that composite system that a language + compiler can target.

    Then it would appear to the user of the language that the target
    machine had those extended features. But if they were to look at the
    generated code, they might see it was accessing external registers or
    whatever.

    So it's cheating.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Thu Nov 6 13:56:17 2025
    On 06/11/2025 12:21, bart wrote:
    On 06/11/2025 07:51, David Brown wrote:
    On 05/11/2025 16:15, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    . (And I have never
    seen a Cortex-M device with programmable windows or addresses -
    indeed,
    I believe the Cortex-M core documentation specifies some memory
    ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK. I have not, but I haven't used the newer Cortex-M cores as yet, so
    it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced. That logic
    is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.


    Fair enough, now I see what you are getting at. Yes, once you are
    outside the Cortex-M core and key ARM-supplied components (like the
    interrupt controller), you as a SoC designer are free to do what you
    like. And if you have a 32-bit processor that needs access to a
    64-bit address space, you are going to have to do some kind of
    windowing or segmenting.

    In the SoC's I have used where 64-bit Cortex-A processors are combined
    with a Cortex-M core for security purposes, booting, or for better
    real-time control of peripherals, the Cortex-M device does not have
    direct access to the 64-bit memory space.  It has access to the
    peripherals, some dedicated memory, and a message-passing interface
    with the Cortex-A cores.

    But in your work, you probably see more variety and more possibilities
    for these things - I only get to use the chips someone else has made!


    I think you were right, if this 'M7' chip doesn't directly have
    registers, instructions or infrastructure to access the more complex
    memory system.

    Unless you are modifying M7 itself, then that 'glue' logic could be
    applied to anything (eg. I've built a Z80 system with 256KB RAM), and it
    is that composite system that a language + compiler can target.

    Then it would appear to the user of the language that the target machine
    had those extended features. But if they were to look at the generated
    code, they might see it was accessing external registers or whatever.

    So it's cheating.

    You were fine up until the last sentence here. What do you mean by
    "cheating"?  Whose rules is it breaking? The system Scott was
    describing (assuming I understood him correctly) lets the 32-bit core
    access blocks of the 64-bit address space. You can choose which part of
    the address space is accessible at any given time (presumably by
    accessing segment or window registers like any other memory-mapped
    peripheral registers). But you can't call it "cheating" unless you have
    defined some set of rules for what is "allowed" and what is not allowed,
    and everyone else has agreed to play by those rules.
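
    The window-register scheme being discussed can be sketched in a few
    lines of C. This is purely illustrative - the window size, its
    location in the 32-bit map, and the idea of a single select register
    are invented for the sketch, not taken from any real SoC:

```c
#include <stdint.h>

/* Hypothetical setup: a 16 MiB window at 0x60000000 in the core's
   32-bit address space, with a memory-mapped select register choosing
   which 16 MiB-aligned chunk of the 64-bit system space it exposes.
   All names and sizes here are invented for illustration. */
#define WINDOW_BITS   24u            /* log2 of the 16 MiB window size */
#define WINDOW_VADDR  0x60000000u    /* where the window appears to the core */

/* Value to program into the (hypothetical) window-select register. */
static uint32_t window_select(uint64_t sys_addr)
{
    return (uint32_t)(sys_addr >> WINDOW_BITS);
}

/* 32-bit pointer the core uses once the window is programmed. */
static uint32_t window_pointer(uint64_t sys_addr)
{
    return WINDOW_VADDR + (uint32_t)(sys_addr & ((1u << WINDOW_BITS) - 1u));
}
```

    A compiler "targeting the composite system" would emit a write to the
    select register followed by an ordinary 32-bit access - which is
    exactly why a look at the generated code gives the game away.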



    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Thu Nov 6 15:17:00 2025
    On Thu, 6 Nov 2025 13:56:17 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 06/11/2025 12:21, bart wrote:
    On 06/11/2025 07:51, David Brown wrote:
    On 05/11/2025 16:15, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    (And I have never
    seen a Cortex-M device with programmable windows or addresses
    - indeed,
    I believe the Cortex-M core documentation specifies some
    memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK.  I have not, but I haven't used the newer Cortex-M cores as
    yet, so it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced.  That
    logic is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.


    Fair enough, now I see what you are getting at.  Yes, once you are
    outside the Cortex-M core and key ARM-supplied components (like
    the interrupt controller), you as a SoC designer are free to do
    what you like.  And if you have a 32-bit processor that needs
    access to a 64-bit address space, you are going to have to do some
    kind of windowing or segmenting.

    In the SoC's I have used where 64-bit Cortex-A processors are
    combined with a Cortex-M core for security purposes, booting, or
    for better real-time control of peripherals, the Cortex-M device
    does not have direct access to the 64-bit memory space.  It has
    access to the peripherals, some dedicated memory, and a
    message-passing interface with the Cortex-A cores.

    But in your work, you probably see more variety and more
    possibilities for these things - I only get to use the chips
    someone else has made!

    I think you were right, if this 'M7' chip doesn't directly have
    registers, instructions or infrastructure to access the more
    complex memory system.

    Unless you are modifying M7 itself, then that 'glue' logic could be
    applied to anything (eg. I've built a Z80 system with 256KB RAM),
    and it is that composite system that a language + compiler can
    target.

    Then it would appear to the user of the language that the target
    machine had those extended features. But if they were to look at
    the generated code, they might see it was accessing external
    registers or whatever.

    So it's cheating.

    You were fine up until the last sentence here. What do you mean by
    "cheating"?  Whose rules is it breaking? The system Scott was
    describing (assuming I understood him correctly) lets the 32-bit core
    access blocks of the 64-bit address space. You can choose which part
    of the address space is accessible at any given time (presumably by
    accessing segment or window registers like any other memory-mapped
    peripheral registers). But you can't call it "cheating" unless you
    have defined some set of rules for what is "allowed" and what is not
    allowed, and everyone else has agreed to play by those rules.



    Doing this sort of trick with the Cortex-M7 is asking for trouble. Its
    data cache is unaware of the tricks you play with windows, so the
    programmer has to flush/invalidate cache lines manually. Sooner or
    later the programmer will make a mistake. A mistake of the sort that is
    very hard to debug.
    I'd say, if you (the SoC designer) absolutely have to play these games,
    just use a Cortex-M4.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Thu Nov 6 15:56:12 2025
    On 06/11/2025 14:17, Michael S wrote:
    On Thu, 6 Nov 2025 13:56:17 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 06/11/2025 12:21, bart wrote:
    On 06/11/2025 07:51, David Brown wrote:
    On 05/11/2025 16:15, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    (And I have never
    seen a Cortex-M device with programmable windows or addresses
    - indeed,
    I believe the Cortex-M core documentation specifies some
    memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK.  I have not, but I haven't used the newer Cortex-M cores as
    yet, so it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced.  That
    logic is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.


    Fair enough, now I see what you are getting at.  Yes, once you are
    outside the Cortex-M core and key ARM-supplied components (like
    the interrupt controller), you as a SoC designer are free to do
    what you like.  And if you have a 32-bit processor that needs
    access to a 64-bit address space, you are going to have to do some
    kind of windowing or segmenting.

    In the SoC's I have used where 64-bit Cortex-A processors are
    combined with a Cortex-M core for security purposes, booting, or
    for better real-time control of peripherals, the Cortex-M device
    does not have direct access to the 64-bit memory space.  It has
    access to the peripherals, some dedicated memory, and a
    message-passing interface with the Cortex-A cores.

    But in your work, you probably see more variety and more
    possibilities for these things - I only get to use the chips
    someone else has made!

    I think you were right, if this 'M7' chip doesn't directly have
    registers, instructions or infrastructure to access the more
    complex memory system.

    Unless you are modifying M7 itself, then that 'glue' logic could be
    applied to anything (eg. I've built a Z80 system with 256KB RAM),
    and it is that composite system that a language + compiler can
    target.

    Then it would appear to the user of the language that the target
    machine had those extended features. But if they were to look at
    the generated code, they might see it was accessing external
    registers or whatever.

    So it's cheating.

    You were fine up until the last sentence here. What do you mean by
    "cheating" ? Whose rules is it breaking? The system Scott was
    describing (assuming I understood him correctly) let the 32-bit core
    access blocks of the 64-bit address space. You can choose which part
    of the address space is accessible at any given time (presumably by
    accessing segment or window registers like any other memory-mapped
    peripheral registers). But you can't call it "cheating" unless you
    have defined some set of rules for what is "allowed" and what is not
    allowed, and everyone else has agreed to play by those rules.



    Doing this sort of trick with the Cortex-M7 is asking for trouble.

    Scott is talking about specialised use of an M7 within a Cortex-A SoC
    that is itself rather specialised (I'm guessing it is embedded within
    a massive switch chip). The folks that program it are going to be
    within the same company as the folks that made the SoC, and it's
    reasonable to assume they know what they are doing. I agree that there
    can be gotchas here that could cause trouble for the average
    microcontroller programmer.

    Its
    data cache is unaware of the tricks you play with windows, so the
    programmer has to flush/invalidate cache lines manually.

    My guess is that the address range here would be marked uncacheable.

    Sooner or later
    the programmer will make a mistake. A mistake of the sort that is very
    hard to debug.

    I certainly agree that cache issues can be a challenge to debug, and if
    you don't understand what's going on, you can get very strange effects.
    Caches are something that you can often ignore when doing "normal"
    things, but if you are doing something unusual, you have to get the code
    right by design. You can't test your way to correct code, or use
    trial-and-error here!

    I'd say, if you (the SoC designer) absolutely have to play these
    games, just use a Cortex-M4.


    It's easy enough to make the memory area in question uncacheable, and
    then there is no problem.

    (I think it is likely that for the kind of uses such a device would
    have, such as running memory tests before starting the main system,
    caching is not helpful.)



    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Thu Nov 6 15:00:30 2025
    Michael S <already5chosen@yahoo.com> writes:
    On Thu, 6 Nov 2025 13:56:17 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 06/11/2025 12:21, bart wrote:
    On 06/11/2025 07:51, David Brown wrote:
    On 05/11/2025 16:15, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    (And I have never
    seen a Cortex-M device with programmable windows or addresses
    - indeed,
    I believe the Cortex-M core documentation specifies some
    memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK.  I have not, but I haven't used the newer Cortex-M cores as
    yet, so it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced.  That
    logic is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.


    Fair enough, now I see what you are getting at.  Yes, once you are
    outside the Cortex-M core and key ARM-supplied components (like
    the interrupt controller), you as a SoC designer are free to do
    what you like.  And if you have a 32-bit processor that needs
    access to a 64-bit address space, you are going to have to do some
    kind of windowing or segmenting.

    In the SoC's I have used where 64-bit Cortex-A processors are
    combined with a Cortex-M core for security purposes, booting, or
    for better real-time control of peripherals, the Cortex-M device
    does not have direct access to the 64-bit memory space.  It has
    access to the peripherals, some dedicated memory, and a
    message-passing interface with the Cortex-A cores.

    But in your work, you probably see more variety and more
    possibilities for these things - I only get to use the chips
    someone else has made!

    I think you were right, if this 'M7' chip doesn't directly have
    registers, instructions or infrastructure to access the more
    complex memory system.

    Unless you are modifying M7 itself, then that 'glue' logic could be
    applied to anything (eg. I've built a Z80 system with 256KB RAM),
    and it is that composite system that a language + compiler can
    target.

    Then it would appear to the user of the language that the target
    machine had those extended features. But if they were to look at
    the generated code, they might see it was accessing external
    registers or whatever.

    So it's cheating.

    You were fine up until the last sentence here. What do you mean by
    "cheating"?  Whose rules is it breaking? The system Scott was
    describing (assuming I understood him correctly) lets the 32-bit core
    access blocks of the 64-bit address space. You can choose which part
    of the address space is accessible at any given time (presumably by
    accessing segment or window registers like any other memory-mapped
    peripheral registers). But you can't call it "cheating" unless you
    have defined some set of rules for what is "allowed" and what is not
    allowed, and everyone else has agreed to play by those rules.



    Doing this sort of trick with the Cortex-M7 is asking for trouble. Its
    data cache is unaware of the tricks you play with windows, so the
    programmer has to flush/invalidate cache lines manually.

    That is an inaccurate statement. The cache semantics are defined by
    the Cortex-M7 address map (see B.31 in DDI0403E) and use the appropriate
    AXI bus operations as required by the region and memory type
    registers.

    There is no intention or requirement for accesses to SoC DRAM by the
    M7 to be cache-coherent with respect to the application cores on the
    SoC. In any case, any region of the M7 address space can be specified
    as WT and non cached.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Dan Cross@3:633/10 to All on Fri Nov 7 15:50:53 2025
    In article <10eda8d$3pd45$1@dont-email.me>,
    Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today? Most
    disguise segmentation as a flat address space (e.g. IBM System/370
    et seq.)

    x86_64 is still nominally segmented; what "code segment" the
    processor is running in matters, even in long mode. But most of
    the segment data is ignored by hardware (e.g., base and limits)
    in 64-bit mode.

    Of course, it retains a notion of segmentation for a) 16- and
    32-bit code compatibility, and b) startup, where the processor
    (still!!) comes out of reset in 16-bit real mode.

    Intel had a proposal to do away with 16-bit mode and anything
    other than long mode for 64-bit, but it seems to have died. So
    it seems like we'll be stuck with x86 segmentation --- at least
    for compatibility purposes --- for a while longer still.

    - Dan C.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Fri Nov 7 16:08:54 2025
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <10eda8d$3pd45$1@dont-email.me>,
    Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today? Most
    disguise segmentation as a flat address space (e.g. IBM System/370 et.seq.)

    x86_64 is still nominally segmented; what "code segment" the
    processor is running in matters, even in long mode. But most of
    the segment data is ignored by hardware (e.g., base and limits)
    in 64-bit mode.

    Minor correction, an update to AMD64 was done back in
    the oughts to support some segment limit registers for 64-bit XEN
    (and probably for vmware as well).

    See the LMSLE bit in the EFER register for more details.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Paul S Person@3:633/10 to All on Fri Nov 7 08:22:11 2025
    On Fri, 7 Nov 2025 15:50:53 -0000 (UTC), cross@spitfire.i.gajendra.net
    (Dan Cross) wrote:

    In article <10eda8d$3pd45$1@dont-email.me>,
    Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32
    bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today? Most
    disguise segmentation as a flat address space (e.g. IBM System/370
    et.seq.)

    x86_64 is still nominally segmented; what "code segment" the
    processor is running in matters, even in long mode. But most of
    the segment data is ignored by hardware (e.g., base and limits)
    in 64-bit mode.

    Of course, it retains a notion of segmentation for a) 16- and
    32-bit code compatibility, and b) startup, where the processor
    (still!!) comes out of reset in 16-bit real mode.

    Intel had a proposal to do away with 16-bit mode and anything
    other than long mode for 64-bit, but it seems to have died. So
    it seems like we'll be stuck with x86 segmentation --- at least
    for compatibility purposes --- for a while longer still.

    This is all very interesting as a summary of where-we-are. Thanks.

    Didn't Intel, at one time, plan to replace all xxx8x processors with
    one of the new! shiny! RISC processor?

    Only to be defeated when it was pointed out that a whole lot of
    software would have to run on it. Software written for their xxx8x
    processors, segmentation and all.
    --
    "Here lies the Tuscan poet Aretino,
    Who evil spoke of everyone but God,
    Giving as his excuse, 'I never knew him.'"

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Dan Cross@3:633/10 to All on Fri Nov 7 16:46:56 2025
    In article <10edcbg$lrh1$1@dont-email.me>,
    geodandw <geodandw@gmail.com> wrote:
    On 11/4/25 12:12, Richard Heathfield wrote:
    On 04/11/2025 15:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.

    I can still hear them down the hall.

    ST!
    .......................................................Amiga!
    ST!
    .......................................................Amiga!

    The 68000 was a very nice processor for its time. It's too bad IBM
    didn't use it in the PC.

    They wanted to. IBM had a close relationship with Motorola, and
    they even had engineering samples in Westchester. The problem
    was that 68k was a skunkworks project inside of Moto, which was
    pushing the 6809 as the Next Big Thing. So when IBM was talking
    to Moto sales about using 68k for the PC, Moto was pushing them
    (not so gently) towards the 6809 and telling them 68k was just a
    research project with no future.

    IBM was smart enough to know that the 6809 was going to be a
    non-starter (a firmly 8-bit micro when 16-bit CPUs were becoming
    mainstream), and the 8088 met their specs for the 5150, so they
    went with Intel instead. By the time it was clear that the 68k
    was going to be Moto's flagship CPU going forward, it was too
    late for inclusion in the PC.

    And here we are.

    - Dan C.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Dan Cross@3:633/10 to All on Fri Nov 7 16:54:44 2025
    In article <qQoPQ.1134549$p8E9.400952@fx18.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <10eda8d$3pd45$1@dont-email.me>,
    Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today? Most
    disguise segmentation as a flat address space (e.g. IBM System/370
    et seq.)
    x86_64 is still nominally segmented; what "code segment" the
    processor is running in matters, even in long mode. But most of
    the segment data is ignored by hardware (e.g., base and limits)
    in 64-bit mode.

    Minor correction, an update to AMD64 was done back in
    the oughts to support some segment limit registers for 64-bit XEN
    (and probably for vmware as well).

    See the LMSLE bit in the EFER register for more details.

    Interesting. AMD-only, not Intel.

    This is why we can't have nice things.

    - Dan C.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Dan Cross@3:633/10 to All on Fri Nov 7 17:22:51 2025
    In article <7r6sgktd4p0ae1e3p97hc7h89nloaldbrj@4ax.com>,
    Paul S Person <psperson@old.netcom.invalid> wrote:
    On Fri, 7 Nov 2025 15:50:53 -0000 (UTC), cross@spitfire.i.gajendra.net
    (Dan Cross) wrote:

    In article <10eda8d$3pd45$1@dont-email.me>,
    Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today? Most
    disguise segmentation as a flat address space (e.g. IBM System/370
    et seq.)
    x86_64 is still nominally segmented; what "code segment" the
    processor is running in matters, even in long mode. But most of
    the segment data is ignored by hardware (e.g., base and limits)
    in 64-bit mode.

    Of course, it retains a notion of segmentation for a) 16- and
    32-bit code compatibility, and b) startup, where the processor
    (still!!) comes out of reset in 16-bit real mode.

    Intel had a proposal to do away with 16-bit mode and anything
    other than long mode for 64-bit, but it seems to have died. So
    it seems like we'll be stuck with x86 segmentation --- at least
    for compatibility purposes --- for a while longer still.

    This is all very interesting as a summary of where-we-are. Thanks.

    Didn't Intel, at one time, plan to replace all xxx8x processors with
    one of the new! shiny! RISC processor?

    Well, Itanium was going to sweep all that came before it into
    the dustbin of history.

    Only to be defeated when it was pointed out that a whole lot of
    software would have to run on it. Software written for their xxx8x
    processors, segmentation and all.

    Nah, that wasn't that big of an issue. By then, segmentation
    was already mostly legacy and systems that really relied on it
    had been designed in an era of slow CPUs that could be emulated
    in software if you really needed them for installed base
    compatibility.

    The heyday of x86 segmentation was really over by 1985. The
    80386 was intended to be a processor for the Unix workstation
    market, and supported a paged, flat 32-bit address space. They
    shoehorned that into the segmented model by a) increasing the
    size of the segment limit and b) adding the "granularity" bit in
    segment descriptors that allowed segments to be defined in units
    of 4KiB, rather than single bytes. The upshot was that a
    segment could cover the full 32-bit virtual address space; so
    the intended use case was that OSes would set up a couple 4GiB
    segments at boot, point the segmentation registers at those, and
    then work in terms of the paged virtual address space after
    that; all of the nasty pre-386 segment stuff would be relegated
    to a relatively small part of the system. So most software that
    really used the segmentation stuff had been written for the 286
    or earlier, when CPUs were pretty slow and pokey, making
    emulation a reasonable path for backwards compatibility.
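
    The granularity-bit arithmetic described above can be sketched in C.
    This just decodes the 20-bit limit field the way a 386-class CPU
    would; it is a simplification of the real descriptor format (base,
    type and permission bits are omitted):

```c
#include <stdint.h>

/* Effective limit of an x86 segment descriptor: the 20-bit limit
   field counts single bytes when G=0, or 4 KiB units (with the low
   12 bits of the effective limit forced to 1) when G=1.  G=1 with
   limit=0xFFFFF therefore covers the full 32-bit address space. */
static uint32_t effective_limit(uint32_t limit20, int g_bit)
{
    if (g_bit)
        return (limit20 << 12) | 0xFFFu;   /* 4 KiB granularity */
    return limit20;                        /* byte granularity */
}
```

    With G=1 and the limit field at its maximum, the effective limit is
    0xFFFFFFFF - the "couple of 4GiB segments set up at boot" arrangement
    that flat-model OSes relied on.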

    The bigger problem for Itanium was that, in order to really
    perform well, they needed super-smart compilers that could do
    the instruction scheduling needed to take advantage of its VLIW
    architecture. Those never came, and so the realized performance
    just wasn't there relative to the promises Intel had made for
    the architecture. Meanwhile, a bunch of ex-DEC people went to
    AMD and did the AMD64 extensions for x86, which a) performed
    pretty decently (at a much lower price-point than Itanium), and
    b) was directly backwards compatible with 32-bit x86. Within,
    what, a year or so, Intel had no choice but to copy the design
    with their own offering, and the market ran with it.

    This was all in the late 90s/early 2000s, but by 2003 or thereabouts
    it was clear that Itanium was never going to reach its
    promises.

    Speculating about alternate historical timelines is always fun.
    Had IBM chosen the 68k two decades before Itanium, I suspect the
    world would be very different: Moto didn't try to push that
    design beyond the 68060, which was competitive with the Pentium
    at roughly the same time, but didn't have a pipelined FPU;
    perhaps it would have if Moto had had the kind of capital resources
    Intel could bring to bear for Pentium and beyond. I suspect,
    however, that Moto would have dumped the 68k architecture and
    we'd all be using some kind of RISC ISA directly.

    One final note about Itanium: Intel had tried the VLIW thing
    before with the i860, and ran into the same problem: the
    compilers of that era just weren't there to make it competitive
    for general-purpose compute. You'd think they'd have learned
    that lesson for Itanium, and either done the compiler work
    themselves, or funded it externally, _before_ betting so big on
    it.

    - Dan C.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Fri Nov 7 17:43:25 2025
    Paul S Person <psperson@old.netcom.invalid> writes:
    On Fri, 7 Nov 2025 15:50:53 -0000 (UTC), cross@spitfire.i.gajendra.net
    (Dan Cross) wrote:

    Intel had a proposal to do away with 16-bit mode and anything
    other than long mode for 64-bit, but it seems to have died. So
    it seems like we'll be stuck with x86 segmentation --- at least
    for compatibility purposes --- for a while longer still.

    This is all very interesting as a summary of where-we-are. Thanks.

    Dan was referring to https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html


    Didn't Intel, at one time, plan to replace all xxx8x processors with
    one of the new! shiny! RISC processor?

    The EPIC[*] processor family (Monterey, Merced) known now as Itanium
    was intended to be Intel's replacement for the x86 server grade
    processors, replacing the proposed P7[**] design. It was an epic
    failure, primarily due to cost, compiler complexity and lack of
    competitive performance.

    [*] Explicitly Parallel Instruction Computing.

    [**] Which was a RISC-like processor. Unfortunately, there's not much
    information about the original P7 design on the internet, and I
    wasn't allowed to keep my P7 orange books.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D'Oliveiro@3:633/10 to All on Fri Nov 7 19:40:56 2025
    On Fri, 07 Nov 2025 08:22:11 -0800, Paul S Person wrote:

    Didn't Intel, at one time, plan to replace all xxx8x processors with
    one of the new! shiny! RISC processor?

    They tried twice, and failed both times. The first time was the i860 <https://www.youtube.com/watch?v=WTkFGZqVCM8>.

    The second, better-known failure was in conjunction with HP <https://www.youtube.com/watch?v=3oxrybkd7Mo>.

    Only to be defeated when it was pointed out that a whole lot of
    software would have to run on it. Software written for their xxx8x processors, segmentation and all.

    In terms of software compatibility, open-source platforms like Linux
    have shown that that does not need to be a barrier to innovation at
    all.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Dan Cross@3:633/10 to All on Sat Nov 8 00:00:06 2025
    In article <20251106151700.00006730@yahoo.com>,
    Michael S <already5chosen@yahoo.com> wrote:
    On Thu, 6 Nov 2025 13:56:17 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 06/11/2025 12:21, bart wrote:
    On 06/11/2025 07:51, David Brown wrote:
    On 05/11/2025 16:15, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    (And I have never
    seen a Cortex-M device with programmable windows or addresses
    - indeed,
    I believe the Cortex-M core documentation specifies some
    memory ranges
    explicitly.)

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK. I have not, but I haven't used the newer Cortex-M cores as
    yet, so it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced. That
    logic is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.


    Fair enough, now I see what you are getting at. Yes, once you are
    outside the Cortex-M core and key ARM-supplied components (like
    the interrupt controller), you as a SoC designer are free to do
    what you like. And if you have a 32-bit processor that needs
    access to a 64-bit address space, you are going to have to do some
    kind of windowing or segmenting.

    In the SoCs I have used where 64-bit Cortex-A processors are
    combined with a Cortex-M core for security purposes, booting, or
    for better real-time control of peripherals, the Cortex-M device
    does not have direct access to the 64-bit memory space. It has
    access to the peripherals, some dedicated memory, and a
    message-passing interface with the Cortex-A cores.

    But in your work, you probably see more variety and more
    possibilities for these things - I only get to use the chips
    someone else has made!

    I think you were right, if this 'M7' chip doesn't directly have
    registers, instructions or infrastructure to access the more
    complex memory system.

    Unless you are modifying M7 itself, then that 'glue' logic could be
    applied to anything (eg. I've built a Z80 system with 256KB RAM),
    and it is that composite system that a language + compiler can
    target.

    Then it would appear to the user of the language that the target
    machine had those extended features. But if they were to look at
    the generated code, they might see it was accessing external
    registers or whatever.

    So it's cheating.

    You were fine up until the last sentence here. What do you mean by
    "cheating" ? Whose rules is it breaking? The system Scott was
    describing (assuming I understood him correctly) let the 32-bit core
    access blocks of the 64-bit address space. You can choose which part
    of the address space is accessible at any given time (presumably by
    accessing segment or window registers like any other memory-mapped
    peripheral registers). But you can't call it "cheating" unless you
    have defined some set of rules for what is "allowed" and what is not
    allowed, and everyone else has agreed to play by those rules.

    Doing this sort of trick with the Cortex-M7 is asking for trouble. Its
    data cache is unaware of the tricks you play with windows, so the
    programmer has to flush/invalidate cache lines manually. Sooner or
    later the programmer will make a mistake.

    As Scott already said, he's not in the same cache coherency
    domain as the A-profile cores, so it doesn't really matter and
    the memory map defines the cache attributes of these aperture
    regions appropriately. However, I want to point out that on
    _any_ relaxed memory architecture CPU, the programmer already
    has to be aware of these issues and deal with them accordingly.
    E.g., consider implementing a context switch or mutex or
    something.

    A mistake of the sort that is very hard to debug.

    Welcome to 2025.

    I'd say, if you (SOC designer) absolutely have to play these games, just
    use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Sat Nov 8 08:45:04 2025
    On 11/7/25 08:50, Dan Cross wrote:
    In article <10eda8d$3pd45$1@dont-email.me>,
    Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today? Most
    disguise segmentation as a flat address space (e.g. IBM System/370 et seq.)

    x86_64 is still nominally segmented; what "code segment" the
    processor is running in matters, even in long mode. But most of
    the segment data is ignored by hardware (e.g., base and limits)
    in 64-bit mode.

    Of course, it retains a notion of segmentation for a) 16- and
    32-bit code compatibility, and b) startup, where the processor
    (still!!) comes out of reset in 16-bit real mode.

    Intel had a proposal to do away with 16-bit mode and anything
    other than long mode for 64-bit, but it seems to have died. So
    it seems like we'll be stuck with x86 segmentation --- at least
    for compatibility purposes --- for a while longer still.

    - Dan C.


    Probably at least until the 128-bit systems arrive ;-)

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Sat Nov 8 08:47:11 2025
    On 11/7/25 09:46, Dan Cross wrote:
    In article <10edcbg$lrh1$1@dont-email.me>,
    geodandw <geodandw@gmail.com> wrote:
    On 11/4/25 12:12, Richard Heathfield wrote:
    On 04/11/2025 15:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.

    I can still hear them down the hall.

    ST!
    .......................................................Amiga!
    ST!
    .......................................................Amiga!

    The 68000 was a very nice processor for its time. It's too bad IBM
    didn't use it in the PC.

    They wanted to. IBM had a close relationship with Motorola, and
    they even had engineering samples in Westchester. The problem
    was that 68k was a skunkworks project inside of Moto, which was
    pushing the 6809 as the Next Big Thing. So when IBM was talking
    to Moto sales about using 68k for the PC, Moto was pushing them
    (not so gently) towards the 6809 and telling them 68k was just a
    research project with no future.

    IBM was smart enough to know that the 6809 was going to be a
    non-starter (a firmly 8-bit micro when 16-bit CPUs were becoming
    mainstream), and the 8088 met their specs for the 5150, so they
    went with Intel instead. By the time it was clear that the 68k
    was going to be Moto's flagship CPU going forward, it was too
    late for inclusion in the PC.

    And here we are.

    - Dan C.


    I think they used the 680x0 in one of their small computers. Maybe the "Laboratory Computer"?

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From John Levine@3:633/10 to All on Sat Nov 8 21:17:05 2025
    According to Peter Flass <Peter@Iron-Spring.com>:
    I think they used the 680x0 in one of their small computers. Maybe the "Laboratory Computer"?

    That was IBM Instruments, a small company that IBM bought after they'd
    already developed the product and just rebadged it.


    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Sun Nov 9 11:15:01 2025
    On Fri, 7 Nov 2025 17:22:51 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:

    the architecture. Meanwhile, a bunch of ex-DEC people went to
    AMD and did the AMD64 extensions for x86, which a) performed

    Do you have a proof that it was done by Ex-DEC people?
    My impression is that Ex-DEC people, esp. Jim Keller, were very
    important as micro-architects of K7 and K8, but I don't remember ever
    reading that they played major role in the stage of architectural
    definitions of AMD64.



    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Sun Nov 9 11:46:00 2025
    On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:


    I'd say, if you (SOC designer) absolutely have to play these games,
    just use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    Somehow I suspect that [at the same clock frequency] the M4 could access
    uncached memory faster than the M7. Maybe even significantly faster.

    Unfortunately, info about M7 instruction timing does not appear to be
    public.

    If one needs something like DP floating point, or when uncached
    accesses are only a small part of the job and the rest of the load is
    compute-intensive, then I can see how the M7 could look attractive vs
    the M4. But personally in such a case I'd start to look for a
    non-Cortex-M solution. Maybe the R4, although I don't like it. Maybe
    the A5. In huge SoCs of the sort Scott is working on - an A34 or even
    a 510. Plus another M4 to handle more typical MCU tasks.





    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Sun Nov 9 12:29:32 2025
    On 09/11/2025 10:46, Michael S wrote:
    On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:


    I'd say, if you (SOC designer) absolutely have to play these games,
    just use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    Somehow I suspect that [at the same clock frequency] M4 could access
    uncached memory faster that M7. May be, even significantly faster.


    I suspect you would be wrong. The M7 can do more per clock than the M4,
    has wider buses, and has support for direct data and instruction
    memories with their own dedicated buses. I can appreciate the gut
    feeling that because there is the option of caching accesses, that extra functionality may slow down accesses when the cache is not used, but I
    don't believe that happens on the M7. And everything other than the
    accesses themselves (the loads, stores, address increments, looping,
    etc.) can be quite a lot faster at the same clock speed.

    But as you say, public data on timings is limited - and even when the
    data on the core is available, timings can be very dependent on details
    of the implementation and connections outside the core.

    We could always appeal to authority - Scott's company knows what they
    are doing, have access to far more detailed information and technical assistance from ARM than we do, and have picked an M7 rather than an M4.
    But speculation is more fun :-)

    Unfortunately, info about M7 instructions timing does not appear to be public.

    If one needs something like DP floating or when uncached accesses are
    only small part of the job and the rest of the load is compute
    -intensive then I can see how M7 could look attractive vs M4.
    But personally in such case I'd start to look for non-Cortex-M solution.
    May be R4, although I don't like it. May be A5. In huge SoCs of sort
    Scott is working on - A34 or even 510. Plus, another M4 to handle more typical MCU tasks.






    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Sun Nov 9 14:40:31 2025
    On Sun, 9 Nov 2025 12:29:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 09/11/2025 10:46, Michael S wrote:
    On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:


    I'd say, if you (SOC designer) absolutely have to play these
    games, just use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    Somehow I suspect that [at the same clock frequency] M4 could access uncached memory faster that M7. May be, even significantly faster.


    I suspect you would be wrong. The M7 can do more per clock than the
    M4, has wider buses, and has support for direct data and instruction memories with their own dedicated buses.

    If I am not mistaken, with exception of caches, M4 and M7 have
    3 identical "fast" 32-bit busses - I+D+AHB. Plus some slower auxiliary
    stuff.

    I can appreciate the gut
    feeling that because there is the option of caching accesses, that
    extra functionality may slow down accesses when the cache is not
    used, but I don't believe that happens on the M7. And everything
    other than the accesses themselves (the loads, stores, address
    increments, looping, etc.) can be quite a lot faster at the same
    clock speed.

    Except that every branch mispredict is more than twice as slow. I'd
    guess that the latency of a cache/TCM *hit* is also 1 clock slower
    than the latency of an internal SRAM access on the M4, but the
    absence of docs prevents me from proving it.
    As to a cache miss, I am pretty sure that it completely stalls the M7
    pipeline. In the case of the M4, I think that after an external Load
    the pipeline makes one more step before it stalls. And, of course,
    the stall itself is less expensive.
    Once again, I can't prove it because of the absence of docs.


    But as you say, public data on timings is limited -

    In the case of the M7, public data is not "limited", it is absent.
    AFAIK, that's not the case for the other Cortex-M cores. Back when the
    M7 was new, Arm claimed that the data was not made available because
    the core is more complicated than the rest of the Cortex-M line. As
    silly as it sounds, they could continue to claim it with a sort of
    straight face for as long as the other Cortex-M cores were, indeed,
    simpler. That has not been the case since 2022, because the Cortex-M85
    is no less complicated than the M7, and arguably even a little more
    so. Despite that, there exists an M85 Software Optimization Guide that
    contains instruction tables with latency and throughput data. Yes, it
    has a few omissions, but it proves that there is nothing impossible in
    documenting cores of this level of complexity, even if you are as lazy
    as the Cortex-M documentation team appears to be (relative, for
    example, to the Cortex-A/Neoverse side of the company).

    and even when the
    data on the core is available, timings can be very dependent on
    details of the implementation and connections outside the core.

    We could always appeal to authority - Scott's company knows what they
    are doing, have access to far more detailed information and technical assistance from ARM than we do, and have picked an M7 rather than an
    M4. But speculation is more fun :-)

    Unfortunately, info about M7 instructions timing does not appear to
    be public.

    If one needs something like DP floating or when uncached accesses
    are only small part of the job and the rest of the load is compute -intensive then I can see how M7 could look attractive vs M4.
    But personally in such case I'd start to look for non-Cortex-M
    solution. May be R4, although I don't like it. May be A5. In huge
    SoCs of sort Scott is working on - A34 or even 510. Plus, another
    M4 to handle more typical MCU tasks.








    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Sun Nov 9 15:54:20 2025
    On 09/11/2025 13:40, Michael S wrote:
    On Sun, 9 Nov 2025 12:29:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 09/11/2025 10:46, Michael S wrote:
    On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:


    I'd say, if you (SOC designer) absolutely have to play these
    games, just use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    Somehow I suspect that [at the same clock frequency] M4 could access
    uncached memory faster that M7. May be, even significantly faster.


    I suspect you would be wrong. The M7 can do more per clock than the
    M4, has wider buses, and has support for direct data and instruction
    memories with their own dedicated buses.

    If I am not mistaken, with exception of caches, M4 and M7 have
    3 identical "fast" 32-bit busses - I+D+AHB. Plus some slower auxiliary
    stuff.


    I believe you are mistaken (which is not something I have seen often).

    <https://www.arm.com/-/media/Arm%20Developer%20Community/PDF/Processor%20Datasheets/Arm-Cortex-M7-Processor-Datasheet.pdf>

    """
    The interfaces that the processor supports include:
    64-bit AXI4 interface
    32-bit AHB master interface
    32-bit AHB slave interface
    64-bit instruction TCM interface
    2x32-bit data TCM interfaces
    """

    The M7 is dual issue - for some instruction combinations, it runs two instructions per clock. It needs more, faster and wider buses to feed it.

    I can appreciate the gut
    feeling that because there is the option of caching accesses, that
    extra functionality may slow down accesses when the cache is not
    used, but I don't believe that happens on the M7. And everything
    other than the accesses themselves (the loads, stores, address
    increments, looping, etc.) can be quite a lot faster at the same
    clock speed.

    Except that every branch mispredict is more than twice slower.

    Branch mispredict costs are primarily related to pipeline depth on a
    processor that does not do any kind of speculative execution. I don't remember the depth of the M4 and M7 off-hand, but the M7 is not twice as
    deep as the M4.

    I'd
    guess that the latency of the cache/TCM *hit* is also 1 clock slower
    that latency of internal SRAM access on M4, but absence of docs
    prevents me from proving it.

    The whole point of the TCM - tightly coupled memories - is that they run
    at core speed, and no caches are used. They are as low-latency as can
    be achieved with M4 sram, except that now you have independent buses and memory for instruction and data (rather than independent buses to shared memory if you have code and data in ram on most M4 implementations), and
    that the buses are twice as wide.

    It is possible that there is an extra cycle of latency on accessing main memory, due to the optional path through the cache - I am not sure on
    that. But I suspect that the 64-bit wide AXI4 bus, as well as the significantly faster handling of the rest of the code (which does not
    need to share the same bus bandwidth as the off-core memory accesses)
    more than outweighs that.

    As to cache miss, I am pretty sure that it completely stalls M7
    pipeline.

    Yes. But we are not using the cache in this hypothetical case.

    In case of M4, I think that after external Load pipeline makes
    one more step before it stalls. And, of course, the stall itself is less expensive.
    Once again, I can't prove it because of absence of docs.


    But as you say, public data on timings is limited -

    In case of M7, public data is not "limited", it is absent.
    AFAIK, it's not the case for all other Cortex-M cores. Back when M7 was
    new, Arm claimed that the data is not made available because the core
    is more complicated that the rest of Cortex-M line. As silly as it
    sounds they could continue to claim it with sort of straight face for as
    long as other Cortex-M cores were, indeed, simpler. Which is not the
    case since 2022, because Cortex M85 is no less complicated than M7 and arguably even a little more so. Despite that, there exist M85 Software Optimization Guide that contains instruction tables with latency and throughput data. Yes, it has few omissions, but it proves that there is nothing impossible in documenting cores of this level of complexity,
    even if you as lazy as Cortex M documentation team appears to be
    (relatively, for example, to Cortex-A/Neoverse side of the company).

    and even when the
    data on the core is available, timings can be very dependent on
    details of the implementation and connections outside the core.

    We could always appeal to authority - Scott's company knows what they
    are doing, have access to far more detailed information and technical
    assistance from ARM than we do, and have picked an M7 rather than an
    M4. But speculation is more fun :-)

    Unfortunately, info about M7 instructions timing does not appear to
    be public.

    If one needs something like DP floating or when uncached accesses
    are only small part of the job and the rest of the load is compute
    -intensive then I can see how M7 could look attractive vs M4.
    But personally in such case I'd start to look for non-Cortex-M
    solution. May be R4, although I don't like it. May be A5. In huge
    SoCs of sort Scott is working on - A34 or even 510. Plus, another
    M4 to handle more typical MCU tasks.









    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Sun Nov 9 17:50:51 2025
    On Sun, 9 Nov 2025 15:54:20 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 09/11/2025 13:40, Michael S wrote:
    On Sun, 9 Nov 2025 12:29:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 09/11/2025 10:46, Michael S wrote:
    On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:


    I'd say, if you (SOC designer) absolutely have to play these
    games, just use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    Somehow I suspect that [at the same clock frequency] M4 could
    access uncached memory faster that M7. May be, even significantly
    faster.

    I suspect you would be wrong. The M7 can do more per clock than
    the M4, has wider buses, and has support for direct data and
    instruction memories with their own dedicated buses.

    If I am not mistaken, with exception of caches, M4 and M7 have
    3 identical "fast" 32-bit busses - I+D+AHB. Plus some slower
    auxiliary stuff.


    I believe you are mistaken (which is not something I have seen often).

    <https://www.arm.com/-/media/Arm%20Developer%20Community/PDF/Processor%20Datasheets/Arm-Cortex-M7-Processor-Datasheet.pdf>

    """
    The interfaces that the processor supports include:
    64-bit AXI4 interface
    32-bit AHB master interface
    32-bit AHB slave interface
    64-bit instruction TCM interface
    2x32-bit data TCM interfaces
    """


    Yes, I was mistaken. I overlooked AXIM/AXI4.

    The M7 is dual issue - for some instruction combinations, it runs two instructions per clock. It needs more, faster and wider buses to
    feed it.

    I can appreciate the gut
    feeling that because there is the option of caching accesses, that
    extra functionality may slow down accesses when the cache is not
    used, but I don't believe that happens on the M7. And everything
    other than the accesses themselves (the loads, stores, address
    increments, looping, etc.) can be quite a lot faster at the same
    clock speed.

    Except that every branch mispredict is more than twice slower.

    Branch mispredict costs are primarily related to pipeline depth on a processor that does not do any kind of speculative execution.

    Same as on most of those that do speculative execution. But
    that's O.T.

    I
    don't remember the depth of the M4 and M7 off-hand, but the M7 is not
    twice as deep as the M4.


    It is twice as deep: 6 vs 3. Which means that the typical mispredict penalty differs by a factor of 2.5 (5 vs 2).


    I'd
    guess that the latency of the cache/TCM *hit* is also 1 clock slower
    that latency of internal SRAM access on M4, but absence of docs
    prevents me from proving it.

    The whole point of the TCM - tightly coupled memories - is that they
    run at core speed, and no caches are used. They are as low-latency
    as can be achieved with M4 sram, except that now you have independent
    buses and memory for instruction and data (rather than independent
    buses to shared memory if you have code and data in ram on most M4 implementations), and that the buses are twice as wide.


    Look at the pipelines.
    We have no official pipeline picture for M7, but we can guess with
    good certainty that it is very similar to M85, with main difference
    being that M85 has 3 LS stages and M7 has only 2.
    It is obvious that in the best possible case Load instruction and
    dependent Integer Data Processing (DPU) instruction have to be 2 cycles
    apart, i.e. minimum load to DPU latency = 3. On M3/M4 minimal latency =
    2.

    It is possible that there is an extra cycle of latency on accessing
    main memory, due to the optional path through the cache - I am not
    sure on that. But I suspect that the 64-bit wide AXI4 bus, as well
    as the significantly faster handling of the rest of the code (which
    does not need to share the same bus bandwidth as the off-core memory accesses) more than outweighs that.

    A 64-bit bus certainly helps a lot for cached accesses. Not sure if it
    helps uncached accesses. I'd guess that [for uncached] it does not help
    regular integer Load instructions, but sometimes helps LDM. It also
    likely helps the DP FP load instruction when the core is configured
    with a DP FPU.
    As to sharing the same bus bandwidth, both the M4 and M7 have a
    dedicated I-bus. In a typical MCU it is connected to NOR flash, and
    here the M7 I-cache helps a lot. In a typical big ASIC it is connected
    to fast SRAM and the I-cache makes no difference.


    As to cache miss, I am pretty sure that it completely stalls M7
    pipeline.

    Yes. But we are not using the cache in this hypothetical case.


    As far as the pipeline goes, an uncached access is the same as a
    D-cache miss. Except that after the data finally arrives it does not
    have to be written to the cache, but for load-to-use latency the
    latter is irrelevant.

    Maybe it is one clock better when the M7 is configured without a data
    cache, which is possible and fully supported by ARM, but probably not
    very popular among their clients. Or maybe it's not better.

    On the soft Nios2-f core, which is the M7-class core I am most
    familiar with, an uncached configuration does help, but the internals
    of soft cores are, well... more soft.





    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Sun Nov 9 18:05:15 2025
    On 09/11/2025 16:50, Michael S wrote:
    On Sun, 9 Nov 2025 15:54:20 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 09/11/2025 13:40, Michael S wrote:
    On Sun, 9 Nov 2025 12:29:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 09/11/2025 10:46, Michael S wrote:
    On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:


    I'd say, if you (SOC designer) absolutely have to play these
    games, just use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    Somehow I suspect that [at the same clock frequency] M4 could
    access uncached memory faster that M7. May be, even significantly
    faster.

    I suspect you would be wrong. The M7 can do more per clock than
    the M4, has wider buses, and has support for direct data and
    instruction memories with their own dedicated buses.

    If I am not mistaken, with exception of caches, M4 and M7 have
    3 identical "fast" 32-bit busses - I+D+AHB. Plus some slower
    auxiliary stuff.


    I believe you are mistaken (which is not something I have seen often).

    <https://www.arm.com/-/media/Arm%20Developer%20Community/PDF/Processor%20Datasheets/Arm-Cortex-M7-Processor-Datasheet.pdf>

    """
    The interfaces that the processor supports include:
    64-bit AXI4 interface
    32-bit AHB master interface
    32-bit AHB slave interface
    64-bit instruction TCM interface
    2x32-bit data TCM interfaces
    """


    Yes, I was mistaken. I overlooked AXIM/AXI4.

    The M7 is dual issue - for some instruction combinations, it runs two
    instructions per clock. It needs more, faster and wider buses to
    feed it.

    I can appreciate the gut
    feeling that because there is the option of caching accesses, that
    extra functionality may slow down accesses when the cache is not
    used, but I don't believe that happens on the M7. And everything
    other than the accesses themselves (the loads, stores, address
    increments, looping, etc.) can be quite a lot faster at the same
    clock speed.

    Except that every branch mispredict is more than twice slower.

    Branch mispredict costs are primarily related to pipeline depth on a
    processor that does not do any kind of speculative execution.

    Same as on most of those that do speculative execution. But
    that's O.T.

    To some extent, I agree, though speculative execution can make the costs
    more complicated (such as by using compute units that would otherwise be
    used for useful work, or by speculative memory accesses that use up real bandwidth). But the details are OT - even by the OT standard of this
    thread.


    I
    don't remember the depth of the M4 and M7 off-hand, but the M7 is not
    twice as deep as the M4.


    It is twice as deep. 6 vs 3. Which means that typical mispredict penalty differs by factor of 2.5 (5 vs 2).


    Okay, that's a bigger pipeline difference than I thought. However, it
    is also good to remember that the M7 has much more sophisticated branch prediction than the M4, so mispredicts will be fewer on average. The worst
    case is going to be worse, however.


    I'd
    guess that the latency of a cache/TCM *hit* is also 1 clock slower
    than the latency of an internal SRAM access on the M4, but the absence
    of docs prevents me from proving it.

    The whole point of the TCMs - tightly coupled memories - is that they
    run at core speed, with no caches involved. They are as low-latency
    as the internal SRAM on an M4, except that now you have independent
    buses and memories for instructions and data (rather than independent
    buses to a shared memory, as on most M4 implementations with code and
    data in RAM), and the buses are twice as wide.


    Look at the pipelines.
    We have no official pipeline picture for the M7, but we can guess with
    good certainty that it is very similar to the M85, with the main
    difference being that the M85 has 3 LS stages while the M7 has only 2.
    It is obvious that in the best possible case a Load instruction and a
    dependent integer data-processing (DPU) instruction have to be 2 cycles apart, i.e. the minimum load-to-DPU latency = 3. On the M3/M4 the minimum
    latency = 2.
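One practical consequence of a longer load-to-use latency is that dependent load/add chains stall more, and a common workaround is to keep independent chains in flight. A hedged sketch in plain C (not vendor code; whether the compiler already does this transformation depends on optimization settings):

```c
#include <stddef.h>

/* Naive sum: each add depends on the load just before it, so every
 * iteration pays the full load-to-use latency (2 cycles on M3/M4,
 * 3 on M7 per the pipeline discussion above). */
long sum_naive(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two independent accumulators: while one chain's add waits on its
 * load, the other chain's load can issue, hiding the extra cycle. */
long sum_unrolled(const int *a, size_t n)
{
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)
        s0 += a[i];
    return s0 + s1;
}
```

Both functions compute the same sum; the second just restructures the dependency chains so the deeper pipeline has less to stall on.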

    It is possible that there is an extra cycle of latency on accessing
    main memory, due to the optional path through the cache - I am not
    sure on that. But I suspect that the 64-bit wide AXI4 bus, as well
    as the significantly faster handling of the rest of the code (which
    does not need to share the same bus bandwidth as the off-core memory
    accesses) more than outweighs that.

    64-bit bus certainly helps a lot for cached accesses. Not sure if it
    helps uncached accesses.

    The 64-bit buses work well with the TCMs. And if you are writing fast
    code for the M7, you run the code from the ITCM and keep most of your
    data (and your stack) in the DTCM.

    As for uncached access off core, 64-bit might help for some things, but
    you are right that it might not help as much as when using the cache.

    I'd guess that [for uncached] it does not help
    regular integer load instructions, but it sometimes helps LDM. It also
    likely helps DP FP load instructions when the core is configured with
    a DP FPU.
    As to sharing the same bus bandwidth, both the M4 and M7 have a
    dedicated I-bus. In a typical MCU it is connected to NOR flash, and
    here the M7's I-cache helps a lot. In a typical big ASIC it is
    connected to fast SRAM and the I-cache makes no difference.


    As to a cache miss, I am pretty sure that it completely stalls the M7
    pipeline.

    Yes. But we are not using the cache in this hypothetical case.


    As far as the pipeline goes, an uncached access is the same as a
    D-cache miss, except that after the data finally arrives it does not
    have to be written to the cache; but for load-to-use latency the
    latter is irrelevant.

    There are a few other differences. In particular, cache misses
    typically mean a whole cache line is read in, whether the access is read
    or write. With uncached accesses, that does not happen.


    Maybe it is one clock better when the M7 is configured without a data cache, which is possible and fully supported by ARM, but probably not very
    popular among their clients. Or maybe it's not better.

    On the soft Nios2-f core, which is the M7-class core I am most
    familiar with, an uncached configuration does help, but the internals
    of soft cores are, well ... more soft.


    And they are also perhaps better documented :-)


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Sun Nov 9 21:58:10 2025
    Michael S <already5chosen@yahoo.com> writes:
    On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:


    I'd say, if you (SOC designer) absolutely have to play these games,
    just use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    Somehow I suspect that [at the same clock frequency] the M4 could access
    uncached memory faster than the M7. Maybe even significantly faster.

    I don't see it. In both cases, they'll be dependent upon the
    performance of the system interconnect, plus the m7 has multiple
    load units, while the m4 has only one.


    Unfortunately, info about M7 instruction timing does not appear to be public.

    As noted above, it's interconnect dependent.


    If one needs something like DP floating point, or when uncached
    accesses are only a small part of the job and the rest of the load is
    compute-intensive, then I can see how the M7 could look attractive vs
    the M4.

    I think you need to provide more data supporting this vis-a-vis
    M4 and M7 performance characteristics.

    But personally, in such a case I'd start to look for a non-Cortex-M solution.
    Maybe R4, although I don't like it. Maybe A5. In huge SoCs of the sort
    Scott is working on - A34 or even 510. Plus another M4 to handle more typical MCU tasks.

    If you think that the system designers don't take into account the
    capabilities of the cores that they select based on the workload
    assigned to those cores, you would be incorrect.

    There are significant performance advantages to using the m7 vs. the m4, particularly in load/store performance (given the M7 has two load units,
    vs only one in the M4) and branch performance.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Dan Cross@3:633/10 to All on Mon Nov 10 09:08:58 2025
    In article <20251109111501.00000fcd@yahoo.com>,
    Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 7 Nov 2025 17:22:51 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:

    the architecture. Meanwhile, a bunch of ex-DEC people went to
    AMD and did the AMD64 extensions for x86, which a) performed

    Do you have proof that it was done by ex-DEC people?
    My impression is that ex-DEC people, esp. Jim Keller, were very
    important as micro-architects of the K7 and K8, but I don't remember
    ever reading that they played a major role at the stage of
    architectural definition of AMD64.

    I'm afraid I do not. I may be incorrect on that part,
    or misattributing IP transfer from the cross-licensing
    agreement that came out of the DEC/Intel lawsuit to
    specific engineers.

    - Dan C.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)