• Re: 16:32 far pointers in OpenWatcom C/C++

    From Peter Flass@3:633/10 to All on Sun Nov 2 12:57:51 2025
    On 3/25/10 03:02, Nick Keighley wrote:
    On 24 Mar, 23:40, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
    wrote:
    Dann Corbit <dcor...@connx.com> writes:
    In article <1e27d5ee-a1b1-45d9-9188-
    63ab37398...@d37g2000yqn.googlegroups.com>,
    nick_keighley_nos...@hotmail.com says...

    On 23 Mar, 20:56, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
    In alt.sys.pdp10 Richard Bos <ralt...@xs4all.nl> wrote:
    (snip)

    That crossposting was, for once, not asinine. It served as a nice
    example why, even now, Leenux weenies are not correct when they insist
    that C has a flat memory model and all pointers are just numbers.

    Well, you could also read the C standard to learn that.

    but if you say that you get accused of language lawyering.
    "Since IBM stopped making 360s no C program ever needs to run on such
    a platform"

    We have customers who are running their business on hardware from the
    mid-1980s. It may sound ludicrous, but if it solves all of their
    business needs, and runs solid 24x365, why should they upgrade?

    Because they could run an equivalently computationally powerful
    solution with various levels of redundancy and fail-over protection,
    with a power budget sensibly measured in mere Watts?

    does it have a Coral compiler?

    There's a market for someone.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Sun Nov 2 13:08:25 2025
    On 3/23/10 13:56, glen herrmannsfeldt wrote:
    In alt.sys.pdp10 Richard Bos <raltbos@xs4all.nl> wrote:
    (snip)

    That crossposting was, for once, not asinine. It served as a nice
    example why, even now, Leenux weenies are not correct when they insist
    that C has a flat memory model and all pointers are just numbers.


    This is true often enough to be dangerous when it turns out not to be.

    Well, you could also read the C standard to learn that.

    There are additional complications for C on the PDP-10.

    -- glen


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Sun Nov 2 13:20:14 2025
    What happened here?? I just noticed that a lot of these posts are from
    2010. Did some news server just barf?

    On 3/23/10 14:42, Peter Flass wrote:
    Branimir Maksimovic wrote:
    On Tue, 23 Mar 2010 06:51:18 -0400
    Peter Flass <Peter_Flass@Yahoo.com> wrote:

    Jonathan de Boyne Pollard wrote:
    Returning to what we were talking about before the silly diversion,
    I should point out that 32-bit applications programming where the
    target is extended DOS or 32-bit Win16 (with OpenWatcom's extender)
    will also occasionally employ 16:32 far pointers of course. But as
    I said before, regular 32-bit OS/2 or Win32 applications
    programming generally does not, since those both use the Tiny
    memory model,
    Flat memory model.

    Problem with standard C and C++ is that they assume flat memory
    model.

    I'm not a C expert, perhaps you're a denizen of comp.lang.c, but as far
    as I know there's nothing in the C standard that assumes anything about
    pointers, except that they have to be the same size as int, so for
    16:32 pointers I guess you'd need 64-bit ints.

    As far as implementations are concerned, both Watcom and IBM VA C++
    support segmented memory models. These are the ones I'm aware of;
    there are probably more.
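    [Editor's note: for what it's worth, the standard makes no such size
    guarantee. int and pointer types may differ in size, and object and
    function pointers may even differ from each other, as they did in the
    mixed 16/32-bit memory models discussed in this thread. A minimal
    portable check:]

    ```c
    #include <assert.h>
    #include <stdio.h>

    int main(void)
    {
        /* The C standard imposes no size relationship between int and
           pointer types; on typical 64-bit targets int stays 4 bytes
           while object pointers are 8.  Function and object pointers
           are also permitted to differ in size from each other. */
        printf("sizeof(int)           = %zu\n", sizeof(int));
        printf("sizeof(void *)        = %zu\n", sizeof(void *));
        printf("sizeof(int (*)(void)) = %zu\n", sizeof(int (*)(void)));
        assert(sizeof(void *) >= 2); /* holds on any hosted platform here */
        return 0;
    }
    ```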


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lynn McGuire@3:633/10 to All on Mon Nov 3 14:24:27 2025
    On 11/2/2025 2:20 PM, Peter Flass wrote:
    What happened here?? I just noticed that a lot of these posts are from
    2010. Did some news server just barf?

    On 3/23/10 14:42, Peter Flass wrote:
    Branimir Maksimovic wrote:
    On Tue, 23 Mar 2010 06:51:18 -0400
    Peter Flass <Peter_Flass@Yahoo.com> wrote:

    Jonathan de Boyne Pollard wrote:
    Returning to what we were talking about before the silly diversion,
    I should point out that 32-bit applications programming where the
    target is extended DOS or 32-bit Win16 (with OpenWatcom's extender)
    will also occasionally employ 16:32 far pointers of course. But as
    I said before, regular 32-bit OS/2 or Win32 applications
    programming generally does not, since those both use the Tiny
    memory model,
    Flat memory model.

    Problem with standard C and C++ is that they assume flat memory
    model.

    I'm not a C expert, perhaps you're a denizen of comp.lang.c, but as
    far as I know there's nothing in the C standard that assumes anything
    about pointers, except that they have to be the same size as int, so
    for 16:32 pointers I guess you'd need 64-bit ints.

    As far as implementations are concerned, both Watcom and IBM VA C++
    support segmented memory models. These are the ones I'm aware of,
    there are probably more.

    I asked Ray Banana of E-S about the openwatcom.* groups and he
    resurrected them with all of their very old postings.

    Lynn


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Mon Nov 3 16:25:06 2025
    On 11/3/25 13:24, Lynn McGuire wrote:
    On 11/2/2025 2:20 PM, Peter Flass wrote:
    What happened here?? I just noticed that a lot of these posts are from
    2010. Did some news server just barf?

    On 3/23/10 14:42, Peter Flass wrote:
    Branimir Maksimovic wrote:
    On Tue, 23 Mar 2010 06:51:18 -0400
    Peter Flass <Peter_Flass@Yahoo.com> wrote:

    Jonathan de Boyne Pollard wrote:
    Returning to what we were talking about before the silly diversion,
    I should point out that 32-bit applications programming where the
    target is extended DOS or 32-bit Win16 (with OpenWatcom's extender)
    will also occasionally employ 16:32 far pointers of course. But as
    I said before, regular 32-bit OS/2 or Win32 applications
    programming generally does not, since those both use the Tiny
    memory model,
    Flat memory model.

    Problem with standard C and C++ is that they assume flat memory
    model.

    I'm not a C expert, perhaps you're a denizen of comp.lang.c, but as
    far as I know there's nothing in the C standard that assumes anything
    about pointers, except that they have to be the same size as int, so
    for 16:32 pointers I guess you'd need 64-bit ints.

    As far as implementations are concerned, both Watcom and IBM VA C++
    support segmented memory models. These are the ones I'm aware of,
    there are probably more.

    I asked Ray Banana of E-S about the openwatcom.* groups and he
    resurrected them with all of their very old postings.

    Lynn


    Oh, OK. Also everything that was X-posted. No worries.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Kaz Kylheku@3:633/10 to All on Tue Nov 4 00:26:40 2025
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    Amazing ...

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Tue Nov 4 15:20:41 2025
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Paul S Person@3:633/10 to All on Tue Nov 4 08:29:21 2025
    On Tue, 4 Nov 2025 00:26:40 -0000 (UTC), Kaz Kylheku
    <643-408-1753@kylheku.com> wrote:

    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    Amazing ...

    I'm not sure about today, but that late there were still people
    programming on older hardware for various specialized purposes. Or
    rather, I suppose, maintaining the code. (I would say "devices" but
    that now implies "something that runs Apps" and these were much much
    older).

    One of the advantages of Watcom (and so OpenWatcom) has always been
    support for 16-bit programming. Of course, whether that is true today
    is hard to say.
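    [Editor's note: the 16-bit support mentioned above revolves around
    16:16 far pointers. Below is a portable sketch of the arithmetic; the
    struct and helper names are hypothetical stand-ins so it can compile
    anywhere. Under real 16-bit OpenWatcom you would instead write
    `char __far *p = MK_FP(seg, off);` using the `__far` extension and the
    MK_FP/FP_SEG/FP_OFF macros from <i86.h>.]

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Portable model of a real-mode 16:16 far pointer (hypothetical
       names; not an OpenWatcom API). */
    struct farptr { uint16_t seg, off; };

    static struct farptr mk_fp(uint16_t seg, uint16_t off)
    {
        struct farptr p = { seg, off };
        return p;
    }

    /* Real-mode linear address: segment scaled by 16 plus offset,
       giving a 20-bit result. */
    static uint32_t to_linear(struct farptr p)
    {
        return ((uint32_t)p.seg << 4) + p.off;
    }

    int main(void)
    {
        /* 0xB800:0000, the classic colour text-mode frame buffer. */
        struct farptr video = mk_fp(0xB800, 0x0000);
        assert(to_linear(video) == 0xB8000);

        /* Distinct seg:off pairs can alias one linear address. */
        assert(to_linear(mk_fp(0x1234, 0x0005)) ==
               to_linear(mk_fp(0x1000, 0x2345)));
        printf("0xB800:0000 -> 0x%05lX\n", (unsigned long)to_linear(video));
        return 0;
    }
    ```

    [The 16:32 pointers in the subject line differ in that the 16-bit part
    is a protected-mode selector indexing a descriptor table rather than a
    value scaled by 16.]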
    --
    "Here lies the Tuscan poet Aretino,
    Who evil spoke of everyone but God,
    Giving as his excuse, 'I never knew him.'"

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Paul S Person@3:633/10 to All on Tue Nov 4 08:32:47 2025
    On Mon, 3 Nov 2025 14:24:27 -0600, Lynn McGuire
    <lynnmcguire5@gmail.com> wrote:

    On 11/2/2025 2:20 PM, Peter Flass wrote:
    What happened here?? I just noticed that a lot of these posts are from

    2010. Did some news server just barf?

    <snippo>

    I asked Ray Banana of E-S about the openwatcom.* groups and he
    resurrected them with all of their very old postings.

    I tested the OpenWatcom Usenet server, and Agent reported no response.

    So the groups still exist (and, I suspect, not just on E-S), but not
    at the source.
    --
    "Here lies the Tuscan poet Aretino,
    Who evil spoke of everyone but God,
    Giving as his excuse, 'I never knew him.'"

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Tue Nov 4 09:39:41 2025
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today? Most
    disguise segmentation as a flat address space (e.g. IBM System/370
    et seq.)

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Richard Heathfield@3:633/10 to All on Tue Nov 4 17:12:46 2025
    On 04/11/2025 15:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.

    I can still hear them down the hall.

    ST!
    .......................................................Amiga!
    ST!
    .......................................................Amiga!

    --
    Richard Heathfield
    Email: rjh at cpax dot org dot uk
    "Usenet is a strange place" - dmr 29 July 1999
    Sig line 4 vacant - apply within

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Tue Nov 4 17:14:01 2025
    Peter Flass <Peter@Iron-Spring.com> writes:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today?

    Only in emulation (see Unisys Clearpath, for example).

    Most
    disguise segmentation as a flat address space (e.g. IBM System/370 et.seq.)

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From geodandw@3:633/10 to All on Tue Nov 4 12:15:27 2025
    On 11/4/25 12:12, Richard Heathfield wrote:
    On 04/11/2025 15:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.

    I can still hear them down the hall.

    ST!
    .......................................................Amiga!
    ST!
    .......................................................Amiga!

    The 68000 was a very nice processor for its time. It's too bad IBM
    didn't use it in the PC.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Kaz Kylheku@3:633/10 to All on Tue Nov 4 17:21:31 2025
    On 2025-11-04, geodandw <geodandw@gmail.com> wrote:
    The 68000 was a very nice processor for its time. It's too bad IBM
    didn't use it in the PC.

    That would have been so much better even if it still had had a shitty
    CP/M-like OS with drive letter names and whatnot.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Tue Nov 4 17:32:24 2025
    scott@slp53.sl.home (Scott Lurndal) writes:
    Peter Flass <Peter@Iron-Spring.com> writes:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today?

    Only in emulation (see Unisys Clearpath, for example).

    Although it's worth pointing out that Harvard architectures
    still exist (e.g. CEVA DSPs) and the low-power ARM
    M-series core 32-bit physical address space is
    divided into 28-bit regions, some of which may
    provide programmable windows into alternate address spaces
    in a fashion very similar to segmentation.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Kaz Kylheku@3:633/10 to All on Tue Nov 4 17:38:28 2025
    On 2025-11-04, Scott Lurndal <scott@slp53.sl.home> wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:
    Peter Flass <Peter@Iron-Spring.com> writes:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today?

    Only in emulation (see Unisys Clearpath, for example).

    Although it's worth pointing out that harvard architectures
    still exist (e.g. CEVA DSPs) and the low-power ARM

    Ah, that. I worked with the TeakLite III.

    In addition to the Harvard thing, I remember its smallest addressable
    unit was 16 bits. From the host processor (ARM) in that SoC, it
    appeared to have "funny endian": 32-bit words written by the TeakLite
    appeared in 2143 order or something, not 1234 or 4321.
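    [Editor's note: "2143 order" describes a middle-endian layout. Since
    Kaz states the exact ordering is uncertain ("or something"), the
    decoder below is purely illustrative: it assumes 16-bit halves kept in
    big-endian order with the bytes inside each half swapped.]

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Decode one possible "funny endian" layout: relative to big-endian
       byte positions 1-2-3-4, the bytes arrive as 2-1-4-3. */
    static uint32_t decode_2143(const uint8_t b[4])
    {
        return ((uint32_t)b[1] << 24) | ((uint32_t)b[0] << 16)
             | ((uint32_t)b[3] << 8)  |  (uint32_t)b[2];
    }

    int main(void)
    {
        /* 0xAABBCCDD stored in 2143 order: BB AA DD CC. */
        const uint8_t buf[4] = { 0xBB, 0xAA, 0xDD, 0xCC };
        assert(decode_2143(buf) == 0xAABBCCDDu);
        return 0;
    }
    ```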

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Tue Nov 4 21:23:44 2025
    On 04/11/2025 18:32, Scott Lurndal wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:
    Peter Flass <Peter@Iron-Spring.com> writes:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today?

    Only in emulation (see Unisys Clearpath, for example).

    Although it's worth pointing out that harvard architectures
    still exist (e.g. CEVA DSPs)

    Yes, but Harvard architectures are a very different matter from
    segmented architectures. "Real" Harvard architecture processors have
    different instructions for accessing different memory spaces - such as
    on the AVR microcontrollers, the instructions for reading ram and
    reading program flash are totally different, and you cannot execute
    code from ram.

    Segmented architecture just means that the actual address is formed by
    a scaled segment register (or value) combined with an offset or
    pointer register (or value).

    There are plenty of segmented architectures in the world of small
    microcontrollers, where the "pointer" might be 8-bit, 16-bit, or a
    pair of 8-bit registers, and it is combined with a bank or segment
    register so that the device can use more than 64KB memory. These
    devices may or may not be Harvard. Fortunately, most of these are
    considered legacy devices.
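    [Editor's note: the bank-register scheme described above can be
    modelled in a few lines. The 8-bit bank register and 64 KiB window
    here are illustrative, not any particular vendor's part.]

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical banked microcontroller: a bank register widens a
       16-bit pointer so the part can address more than 64 KiB. */
    static uint32_t banked_addr(uint8_t bank, uint16_t ptr16)
    {
        return ((uint32_t)bank << 16) | ptr16;
    }

    int main(void)
    {
        /* Same 16-bit pointer value, different banks: different cells. */
        assert(banked_addr(0, 0x8000) == 0x008000u);
        assert(banked_addr(3, 0x8000) == 0x038000u);
        /* 256 banks x 64 KiB = 16 MiB reachable via a 16-bit pointer. */
        assert(banked_addr(0xFF, 0xFFFF) == 0x00FFFFFFu);
        return 0;
    }
    ```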

    and the low-power ARM
    M-series core 32-bit physical address space is
    divided into 28-bit regions some of which may
    provide programmable windows into alternate address spaces
    in a fashion very similar to segmentation.


    All the ARM Cortex-M cores have 32-bit linear memory spaces. There is
    no segmentation. Different parts of the memory space are used for
    different purposes (ram, flash, peripherals, off-chip memory, etc.),
    and there can be lots of different memory-mapped devices placed at
    different points in the memory spaces. But all access is via 32-bit
    addresses in 32-bit registers, without any segmentation registers.
    (And I have never seen a Cortex-M device with programmable windows or
    addresses - indeed, I believe the Cortex-M core documentation
    specifies some memory ranges explicitly. Memory protection units can
    be programmed to give different access, write, and cacheability
    attributes to different regions, but that's another matter.)




    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Tue Nov 4 22:04:43 2025
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    . (And I have never
    seen a Cortex-M device with programmable windows or addresses - indeed,
    I believe the Cortex-M core documentation specifies some memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D'Oliveiro@3:633/10 to All on Tue Nov 4 22:17:44 2025
    On Tue, 4 Nov 2025 09:39:41 -0700, Peter Flass wrote:

    I was thinking, are there any segmented architectures today?

    Two different meanings of segmentation. It is possible to use
    segmentation in a flat address space, as a memory-management
    technique. Think paging, but with variable-length pages. (E.g.
    Burroughs machines did this. Also think of how program code on the
    old 680x0-based Macintosh machines could be divided up into
    individually-swappable 'CODE' segments.)

    The trouble was, such a scheme was prone to fragmentation, where the
    total free memory might be larger than the segment you want to load,
    but it's broken up into discontiguous pieces that are too small to
    use. This is why paging was preferred instead.

    But now, with 64-bit architectures commonplace, you have multi-level
    page tables. Think of these as a form of segmentation, where each
    segment is made up of whole pages.
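    [Editor's note: the page-tables-as-segmentation view can be made
    concrete with the x86-64 4-level split, assuming 4 KiB pages: a 48-bit
    virtual address decomposes into four 9-bit table indices plus a page
    offset.]

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* x86-64 4-level paging with 4 KiB pages: each level selects one
       entry in a 512-entry table. */
    #define PT_IDX(va, shift) ((unsigned)(((va) >> (shift)) & 0x1FFu))

    int main(void)
    {
        uint64_t va = 0x00007F1234567ABCull;  /* arbitrary example VA */

        unsigned pml4 = PT_IDX(va, 39);  /* top level: 512 GiB per entry */
        unsigned pdpt = PT_IDX(va, 30);  /* 1 GiB per entry */
        unsigned pd   = PT_IDX(va, 21);  /* 2 MiB per entry */
        unsigned pt   = PT_IDX(va, 12);  /* 4 KiB per entry */
        uint64_t off  = va & 0xFFFu;     /* byte within the page */

        /* The indices and offset reassemble the original address. */
        assert((((uint64_t)pml4 << 39) | ((uint64_t)pdpt << 30) |
                ((uint64_t)pd << 21) | ((uint64_t)pt << 12) | off) == va);
        return 0;
    }
    ```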

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D'Oliveiro@3:633/10 to All on Tue Nov 4 22:19:17 2025
    On Tue, 4 Nov 2025 12:15:27 -0500, geodandw wrote:

    The 68000 was a very nice processor for its time. It's too bad IBM
    didn't use it in the PC.

    Might have been a cost issue (more pins, more cost).

    In any case, the 680x0 family was very popular among Unix workstation
    vendors, until it was completely eclipsed in performance by the
    coming of RISC.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Wed Nov 5 08:50:25 2025
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    . (And I have never
    seen a Cortex-M device with programmable windows or addresses - indeed,
    I believe the Cortex-M core documentation specifies some memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK. I have not, but I haven't used the newer Cortex-M cores as yet,
    so it could well be a new feature. It could also be an option which
    the mainstream microcontroller manufacturers don't provide. Which
    ones have programmable windows? And is this something that will be
    common, or is it just something that a few manufacturers with
    "architect" (if that is the right term) ARM licenses implement on
    their own?

    (I know this is off-topic for c.l.c., but I'm interested in these
    devices.)


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Wed Nov 5 15:15:14 2025
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    . (And I have never
    seen a Cortex-M device with programmable windows or addresses - indeed,
    I believe the Cortex-M core documentation specifies some memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK. I have not, but I haven't used the newer Cortex-M cores as yet, so
    it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced. That logic
    is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.



    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Thu Nov 6 08:51:37 2025
    On 05/11/2025 16:15, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    . (And I have never
    seen a Cortex-M device with programmable windows or addresses - indeed,
    I believe the Cortex-M core documentation specifies some memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK. I have not, but I haven't used the newer Cortex-M cores as yet, so
    it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced. That logic
    is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.


    Fair enough, now I see what you are getting at. Yes, once you are
    outside the Cortex-M core and key ARM-supplied components (like the
    interrupt controller), you as a SoC designer are free to do what you
    like. And if you have a 32-bit processor that needs access to a
    64-bit address space, you are going to have to do some kind of
    windowing or segmenting.

    In the SoCs I have used where 64-bit Cortex-A processors are combined
    with a Cortex-M core for security purposes, booting, or for better
    real-time control of peripherals, the Cortex-M device does not have
    direct access to the 64-bit memory space. It has access to the
    peripherals, some dedicated memory, and a message-passing interface with
    the Cortex-A cores.

    But in your work, you probably see more variety and more possibilities
    for these things - I only get to use the chips someone else has made!


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From bart@3:633/10 to All on Thu Nov 6 11:21:06 2025
    On 06/11/2025 07:51, David Brown wrote:
    On 05/11/2025 16:15, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    . (And I have never
    seen a Cortex-M device with programmable windows or addresses - indeed,
    I believe the Cortex-M core documentation specifies some memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK. I have not, but I haven't used the newer Cortex-M cores as yet, so
    it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced. That logic
    is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.


    Fair enough, now I see what you are getting at. Yes, once you are
    outside the Cortex-M core and key ARM-supplied components (like the
    interrupt controller), you as a SoC designer are free to do what you
    like. And if you have a 32-bit processor that needs access to a
    64-bit address space, you are going to have to do some kind of
    windowing or segmenting.

    In the SoCs I have used where 64-bit Cortex-A processors are combined
    with a Cortex-M core for security purposes, booting, or for better
    real-time control of peripherals, the Cortex-M device does not have
    direct access to the 64-bit memory space. It has access to the
    peripherals, some dedicated memory, and a message-passing interface
    with the Cortex-A cores.

    But in your work, you probably see more variety and more possibilities
    for these things - I only get to use the chips someone else has made!


    I think you were right, if this 'M7' chip doesn't directly have
    registers, instructions or infrastructure to access the more complex
    memory system.

    Unless you are modifying the M7 itself, that 'glue' logic could be
    applied to anything (eg. I've built a Z80 system with 256KB RAM), and
    it is that composite system that a language + compiler can target.

    Then it would appear to the user of the language that the target
    machine had those extended features. But if they were to look at the
    generated code, they might see it was accessing external registers or
    whatever.

    So it's cheating.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Thu Nov 6 13:56:17 2025
    On 06/11/2025 12:21, bart wrote:
    On 06/11/2025 07:51, David Brown wrote:
    On 05/11/2025 16:15, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    . (And I have never
    seen a Cortex-M device with programmable windows or addresses -
    indeed,
    I believe the Cortex-M core documentation specifies some memory
    ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK. I have not, but I haven't used the newer Cortex-M cores as yet, so
    it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced. That logic
    is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.


    Fair enough, now I see what you are getting at. Yes, once you are
    outside the Cortex-M core and key ARM-supplied components (like the
    interrupt controller), you as a SoC designer are free to do what you
    like. And if you have a 32-bit processor that needs access to a
    64-bit address space, you are going to have to do some kind of
    windowing or segmenting.

    In the SoC's I have used where 64-bit Cortex-A processors are combined
    with a Cortex-M core for security purposes, booting, or for better
    real-time control of peripherals, the Cortex-M device does not have
    direct access to the 64-bit memory space.  It has access to the
    peripherals, some dedicated memory, and a message-passing interface
    with the Cortex-A cores.

    But in your work, you probably see more variety and more possibilities
    for these things - I only get to use the chips someone else has made!


    I think you were right, if this 'M7' chip doesn't directly have
    registers, instructions or infrastructure to access the more complex
    memory system.

    Unless you are modifying M7 itself, then that 'glue' logic could be
    applied to anything (eg. I've built a Z80 system with 256KB RAM), and it
    is that composite system that a language + compiler can target.

    Then it would appear to the user of the language that the target machine
    had those extended features. But if they were to look at the generated
    code, they might see it was accessing external registers or whatever.

    So it's cheating.

    You were fine up until the last sentence here. What do you mean by
    "cheating"?  Whose rules is it breaking? The system Scott was
    describing (assuming I understood him correctly) lets the 32-bit core
    access blocks of the 64-bit address space. You can choose which part of
    the address space is accessible at any given time (presumably by
    accessing segment or window registers like any other memory-mapped
    peripheral registers). But you can't call it "cheating" unless you have
    defined some set of rules for what is "allowed" and what is not allowed,
    and everyone else has agreed to play by those rules.
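
    The window-register scheme being discussed can be sketched in a few
    lines of C. This is purely illustrative - the window size, its
    location in the 32-bit map, and the idea of a single select register
    are invented for the sketch, not taken from any real SoC:

```c
#include <stdint.h>

/* Hypothetical setup: a 16 MiB window at 0x60000000 in the core's
   32-bit address space, with a memory-mapped select register choosing
   which 16 MiB-aligned chunk of the 64-bit system space it exposes.
   All names and sizes here are invented for illustration. */
#define WINDOW_BITS   24u            /* log2 of the 16 MiB window size */
#define WINDOW_VADDR  0x60000000u    /* where the window appears to the core */

/* Value to program into the (hypothetical) window-select register. */
static uint32_t window_select(uint64_t sys_addr)
{
    return (uint32_t)(sys_addr >> WINDOW_BITS);
}

/* 32-bit pointer the core uses once the window is programmed. */
static uint32_t window_pointer(uint64_t sys_addr)
{
    return WINDOW_VADDR + (uint32_t)(sys_addr & ((1u << WINDOW_BITS) - 1u));
}
```

    A compiler "targeting the composite system" would emit a write to the
    select register followed by an ordinary 32-bit access - which is
    exactly why a look at the generated code gives the game away.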



    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Thu Nov 6 15:17:00 2025
    On Thu, 6 Nov 2025 13:56:17 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 06/11/2025 12:21, bart wrote:
    On 06/11/2025 07:51, David Brown wrote:
    On 05/11/2025 16:15, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    (And I have never
    seen a Cortex-M device with programmable windows or addresses
    - indeed,
    I believe the Cortex-M core documentation specifies some
    memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK.  I have not, but I haven't used the newer Cortex-M cores as
    yet, so it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced.  That
    logic is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.


    Fair enough, now I see what you are getting at.  Yes, once you are
    outside the Cortex-M core and key ARM-supplied components (like
    the interrupt controller), you as a SoC designer are free to do
    what you like.  And if you have a 32-bit processor that needs
    access to a 64-bit address space, you are going to have to do some
    kind of windowing or segmenting.

    In the SoC's I have used where 64-bit Cortex-A processors are
    combined with a Cortex-M core for security purposes, booting, or
    for better real-time control of peripherals, the Cortex-M device
    does not have direct access to the 64-bit memory space.  It has
    access to the peripherals, some dedicated memory, and a
    message-passing interface with the Cortex-A cores.

    But in your work, you probably see more variety and more
    possibilities for these things - I only get to use the chips
    someone else has made!

    I think you were right, if this 'M7' chip doesn't directly have
    registers, instructions or infrastructure to access the more
    complex memory system.

    Unless you are modifying M7 itself, then that 'glue' logic could be
    applied to anything (eg. I've built a Z80 system with 256KB RAM),
    and it is that composite system that a language + compiler can
    target.

    Then it would appear to the user of the language that the target
    machine had those extended features. But if they were to look at
    the generated code, they might see it was accessing external
    registers or whatever.

    So it's cheating.

    You were fine up until the last sentence here. What do you mean by
    "cheating"?  Whose rules is it breaking? The system Scott was
    describing (assuming I understood him correctly) lets the 32-bit core
    access blocks of the 64-bit address space. You can choose which part
    of the address space is accessible at any given time (presumably by
    accessing segment or window registers like any other memory-mapped
    peripheral registers). But you can't call it "cheating" unless you
    have defined some set of rules for what is "allowed" and what is not
    allowed, and everyone else has agreed to play by those rules.



    Doing this sort of trick with the Cortex-M7 is asking for trouble. Its
    data cache is unaware of the tricks you play with windows, so the
    programmer has to flush/invalidate cache lines manually. Sooner or
    later the programmer will make a mistake. A mistake of the sort that is
    very hard to debug.
    I'd say, if you (the SoC designer) absolutely have to play these games,
    just use a Cortex-M4.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Thu Nov 6 15:56:12 2025
    On 06/11/2025 14:17, Michael S wrote:
    On Thu, 6 Nov 2025 13:56:17 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 06/11/2025 12:21, bart wrote:
    On 06/11/2025 07:51, David Brown wrote:
    On 05/11/2025 16:15, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    (And I have never
    seen a Cortex-M device with programmable windows or addresses
    - indeed,
    I believe the Cortex-M core documentation specifies some
    memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK.  I have not, but I haven't used the newer Cortex-M cores as
    yet, so it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced.  That
    logic is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.


    Fair enough, now I see what you are getting at.  Yes, once you are
    outside the Cortex-M core and key ARM-supplied components (like
    the interrupt controller), you as a SoC designer are free to do
    what you like.  And if you have a 32-bit processor that needs
    access to a 64-bit address space, you are going to have to do some
    kind of windowing or segmenting.

    In the SoC's I have used where 64-bit Cortex-A processors are
    combined with a Cortex-M core for security purposes, booting, or
    for better real-time control of peripherals, the Cortex-M device
    does not have direct access to the 64-bit memory space.  It has
    access to the peripherals, some dedicated memory, and a
    message-passing interface with the Cortex-A cores.

    But in your work, you probably see more variety and more
    possibilities for these things - I only get to use the chips
    someone else has made!

    I think you were right, if this 'M7' chip doesn't directly have
    registers, instructions or infrastructure to access the more
    complex memory system.

    Unless you are modifying M7 itself, then that 'glue' logic could be
    applied to anything (eg. I've built a Z80 system with 256KB RAM),
    and it is that composite system that a language + compiler can
    target.

    Then it would appear to the user of the language that the target
    machine had those extended features. But if they were to look at
    the generated code, they might see it was accessing external
    registers or whatever.

    So it's cheating.

    You were fine up until the last sentence here. What do you mean by
    "cheating" ? Whose rules is it breaking? The system Scott was
    describing (assuming I understood him correctly) let the 32-bit core
    access blocks of the 64-bit address space. You can choose which part
    of the address space is accessible at any given time (presumably by
    accessing segment or window registers like any other memory-mapped
    peripheral registers). But you can't call it "cheating" unless you
    have defined some set of rules for what is "allowed" and what is not
    allowed, and everyone else has agreed to play by those rules.



    Doing this sort of trick with the Cortex-M7 is asking for trouble.

    Scott is talking about specialised use of an M7 within a Cortex-A SoC
    that is itself rather specialised (I'm guessing it is embedded within
    a massive switch chip). The folks that program it are going to be
    within the same company as the folks that made the SoC, and it's
    reasonable to assume they know what they are doing. I agree that there
    can be gotchas here that could cause trouble for the average
    microcontroller programmer.

    Its
    data cache is unaware of the tricks you play with windows, so the
    programmer has to flush/invalidate cache lines manually.

    My guess is that the address range here would be marked uncacheable.

    Sooner or later
    the programmer will make a mistake. A mistake of the sort that is very
    hard to debug.

    I certainly agree that cache issues can be a challenge to debug, and if
    you don't understand what's going on, you can get very strange effects.
    Caches are something that you can often ignore when doing "normal"
    things, but if you are doing something unusual, you have to get the code
    right by design. You can't test your way to correct code, or use
    trial-and-error here!

    I'd say, if you (the SoC designer) absolutely have to play these
    games, just use a Cortex-M4.


    It's easy enough to make the memory area in question uncacheable, and
    then there is no problem.

    (I think it is likely that for the kind of uses such a device would
    have, such as running memory tests before starting the main system,
    caching is not helpful.)



    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Thu Nov 6 15:00:30 2025
    Michael S <already5chosen@yahoo.com> writes:
    On Thu, 6 Nov 2025 13:56:17 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 06/11/2025 12:21, bart wrote:
    On 06/11/2025 07:51, David Brown wrote:
    On 05/11/2025 16:15, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    (And I have never
    seen a Cortex-M device with programmable windows or addresses
    - indeed,
    I believe the Cortex-M core documentation specifies some
    memory ranges
    explicitly.

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK.  I have not, but I haven't used the newer Cortex-M cores as
    yet, so it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced.  That
    logic is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.


    Fair enough, now I see what you are getting at.  Yes, once you are
    outside the Cortex-M core and key ARM-supplied components (like
    the interrupt controller), you as a SoC designer are free to do
    what you like.  And if you have a 32-bit processor that needs
    access to a 64-bit address space, you are going to have to do some
    kind of windowing or segmenting.

    In the SoC's I have used where 64-bit Cortex-A processors are
    combined with a Cortex-M core for security purposes, booting, or
    for better real-time control of peripherals, the Cortex-M device
    does not have direct access to the 64-bit memory space.  It has
    access to the peripherals, some dedicated memory, and a
    message-passing interface with the Cortex-A cores.

    But in your work, you probably see more variety and more
    possibilities for these things - I only get to use the chips
    someone else has made!

    I think you were right, if this 'M7' chip doesn't directly have
    registers, instructions or infrastructure to access the more
    complex memory system.

    Unless you are modifying M7 itself, then that 'glue' logic could be
    applied to anything (eg. I've built a Z80 system with 256KB RAM),
    and it is that composite system that a language + compiler can
    target.

    Then it would appear to the user of the language that the target
    machine had those extended features. But if they were to look at
    the generated code, they might see it was accessing external
    registers or whatever.

    So it's cheating.

    You were fine up until the last sentence here. What do you mean by
    "cheating"?  Whose rules is it breaking? The system Scott was
    describing (assuming I understood him correctly) lets the 32-bit core
    access blocks of the 64-bit address space. You can choose which part
    of the address space is accessible at any given time (presumably by
    accessing segment or window registers like any other memory-mapped
    peripheral registers). But you can't call it "cheating" unless you
    have defined some set of rules for what is "allowed" and what is not
    allowed, and everyone else has agreed to play by those rules.



    Doing this sort of trick with the Cortex-M7 is asking for trouble. Its
    data cache is unaware of the tricks you play with windows, so the
    programmer has to flush/invalidate cache lines manually.

    That is an inaccurate statement. The cache semantics are defined by
    the Cortex-M7 address map (see B.31 in DDI0403E) and use the appropriate
    AXI bus operations as required by the region and memory type
    registers.

    There is no intention or requirement for accesses to SoC DRAM by the
    M7 to be cache-coherent with respect to the application cores on the
    SoC. In any case, any region of the M7 address space can be specified
    as WT and non cached.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Dan Cross@3:633/10 to All on Fri Nov 7 15:50:53 2025
    In article <10eda8d$3pd45$1@dont-email.me>,
    Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today? Most
    disguise segmentation as a flat address space (e.g. IBM System/370
    et seq.)

    x86_64 is still nominally segmented; what "code segment" the
    processor is running in matters, even in long mode. But most of
    the segment data is ignored by hardware (e.g., base and limits)
    in 64-bit mode.

    Of course, it retains a notion of segmentation for a) 16- and
    32-bit code compatibility, and b) startup, where the processor
    (still!!) comes out of reset in 16-bit real mode.

    Intel had a proposal to do away with 16-bit mode and anything
    other than long mode for 64-bit, but it seems to have died. So
    it seems like we'll be stuck with x86 segmentation --- at least
    for compatibility purposes --- for a while longer still.

    - Dan C.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Fri Nov 7 16:08:54 2025
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <10eda8d$3pd45$1@dont-email.me>,
    Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today? Most
    disguise segmentation as a flat address space (e.g. IBM System/370 et.seq.)

    x86_64 is still nominally segmented; what "code segment" the
    processor is running in matters, even in long mode. But most of
    the segment data is ignored by hardware (e.g., base and limits)
    in 64-bit mode.

    Minor correction, an update to AMD64 was done back in
    the oughts to support some segment limit registers for 64-bit XEN
    (and probably for vmware as well).

    See the LMSLE bit in the EFER register for more details.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Paul S Person@3:633/10 to All on Fri Nov 7 08:22:11 2025
    On Fri, 7 Nov 2025 15:50:53 -0000 (UTC), cross@spitfire.i.gajendra.net
    (Dan Cross) wrote:

    In article <10eda8d$3pd45$1@dont-email.me>,
    Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32
    bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today? Most
    disguise segmentation as a flat address space (e.g. IBM System/370
    et.seq.)

    x86_64 is still nominally segmented; what "code segment" the
    processor is running in matters, even in long mode. But most of
    the segment data is ignored by hardware (e.g., base and limits)
    in 64-bit mode.

    Of course, it retains a notion of segmentation for a) 16- and
    32-bit code compatibility, and b) startup, where the processor
    (still!!) comes out of reset in 16-bit real mode.

    Intel had a proposal to do away with 16-bit mode and anything
    other than long mode for 64-bit, but it seems to have died. So
    it seems like we'll be stuck with x86 segmentation --- at least
    for compatibility purposes --- for a while longer still.

    This is all very interesting as a summary of where-we-are. Thanks.

    Didn't Intel, at one time, plan to replace all xxx8x processors with
    one of the new! shiny! RISC processor?

    Only to be defeated when it was pointed out that a whole lot of
    software would have to run on it. Software written for their xxx8x
    processors, segmentation and all.
    --
    "Here lies the Tuscan poet Aretino,
    Who evil spoke of everyone but God,
    Giving as his excuse, 'I never knew him.'"

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Dan Cross@3:633/10 to All on Fri Nov 7 16:46:56 2025
    In article <10edcbg$lrh1$1@dont-email.me>,
    geodandw <geodandw@gmail.com> wrote:
    On 11/4/25 12:12, Richard Heathfield wrote:
    On 04/11/2025 15:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.

    I can still hear them down the hall.

    ST!
    .......................................................Amiga!
    ST!
    .......................................................Amiga!

    The 68000 was a very nice processor for its time. It's too bad IBM
    didn't use it in the PC.

    They wanted to. IBM had a close relationship with Motorola, and
    they even had engineering samples in Westchester. The problem
    was that 68k was a skunkworks project inside of Moto, which was
    pushing the 6809 as the Next Big Thing. So when IBM was talking
    to Moto sales about using 68k for the PC, Moto was pushing them
    (not so gently) towards the 6809 and telling them 68k was just a
    research project with no future.

    IBM was smart enough to know that the 6809 was going to be a
    non-starter (a firmly 8-bit micro when 16-bit CPUs were becoming
    mainstream), and the 8088 met their specs for the 5150, so they
    went with Intel instead. By the time it was clear that the 68k
    was going to be Moto's flagship CPU going forward, it was too
    late for inclusion in the PC.

    And here we are.

    - Dan C.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Dan Cross@3:633/10 to All on Fri Nov 7 16:54:44 2025
    In article <qQoPQ.1134549$p8E9.400952@fx18.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <10eda8d$3pd45$1@dont-email.me>,
    Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today? Most
    disguise segmentation as a flat address space (e.g. IBM System/370
    et seq.)
    x86_64 is still nominally segmented; what "code segment" the
    processor is running in matters, even in long mode. But most of
    the segment data is ignored by hardware (e.g., base and limits)
    in 64-bit mode.

    Minor correction, an update to AMD64 was done back in
    the oughts to support some segment limit registers for 64-bit XEN
    (and probably for vmware as well).

    See the LMSLE bit in the EFER register for more details.

    Interesting. AMD-only, not Intel.

    This is why we can't have nice things.

    - Dan C.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Dan Cross@3:633/10 to All on Fri Nov 7 17:22:51 2025
    In article <7r6sgktd4p0ae1e3p97hc7h89nloaldbrj@4ax.com>,
    Paul S Person <psperson@old.netcom.invalid> wrote:
    On Fri, 7 Nov 2025 15:50:53 -0000 (UTC), cross@spitfire.i.gajendra.net
    (Dan Cross) wrote:

    In article <10eda8d$3pd45$1@dont-email.me>,
    Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today? Most
    disguise segmentation as a flat address space (e.g. IBM System/370
    et seq.)
    x86_64 is still nominally segmented; what "code segment" the
    processor is running in matters, even in long mode. But most of
    the segment data is ignored by hardware (e.g., base and limits)
    in 64-bit mode.

    Of course, it retains a notion of segmentation for a) 16- and
    32-bit code compatibility, and b) startup, where the processor
    (still!!) comes out of reset in 16-bit real mode.

    Intel had a proposal to do away with 16-bit mode and anything
    other than long mode for 64-bit, but it seems to have died. So
    it seems like we'll be stuck with x86 segmentation --- at least
    for compatibility purposes --- for a while longer still.

    This is all very interesting as a summary of where-we-are. Thanks.

    Didn't Intel, at one time, plan to replace all xxx8x processors with
    one of the new! shiny! RISC processor?

    Well, Itanium was going to sweep all that came before it into
    the dustbin of history.

    Only to be defeated when it was pointed out that a whole lot of
    software would have to run on it. Software written for their xxx8x
    processors, segmentation and all.

    Nah, that wasn't that big of an issue. By then, segmentation
    was already mostly legacy and systems that really relied on it
    had been designed in an era of slow CPUs that could be emulated
    in software if you really needed them for installed base
    compatibility.

    The heyday of x86 segmentation was really over by 1985. The
    80386 was intended to be a processor for the Unix workstation
    market, and supported a paged, flat 32-bit address space. They
    shoehorned that into the segmented model by a) increasing the
    size of the segment limit and b) adding the "granularity" bit in
    segment descriptors that allowed segments to be defined in units
    of 4KiB, rather than single bytes. The upshot was that a
    segment could cover the full 32-bit virtual address space; so
    the intended use case was that OSes would set up a couple 4GiB
    segments at boot, point the segmentation registers at those, and
    then work in terms of the paged virtual address space after
    that; all of the nasty pre-386 segment stuff would be relegated
    to a relatively small part of the system. So most software that
    really used the segmentation stuff had been written for the 286
    or earlier, when CPUs were pretty slow and pokey, making
    emulation a reasonable path for backwards compatibility.
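
    The granularity-bit arithmetic described above can be sketched in C.
    This just decodes the 20-bit limit field the way a 386-class CPU
    would; it is a simplification of the real descriptor format (base,
    type and permission bits are omitted):

```c
#include <stdint.h>

/* Effective limit of an x86 segment descriptor: the 20-bit limit
   field counts single bytes when G=0, or 4 KiB units (with the low
   12 bits of the effective limit forced to 1) when G=1.  G=1 with
   limit=0xFFFFF therefore covers the full 32-bit address space. */
static uint32_t effective_limit(uint32_t limit20, int g_bit)
{
    if (g_bit)
        return (limit20 << 12) | 0xFFFu;   /* 4 KiB granularity */
    return limit20;                        /* byte granularity */
}
```

    With G=1 and the limit field at its maximum, the effective limit is
    0xFFFFFFFF - the "couple of 4GiB segments set up at boot" arrangement
    that flat-model OSes relied on.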

    The bigger problem for Itanium was that, in order to really
    perform well, they needed super-smart compilers that could do
    the instruction scheduling needed to take advantage of its VLIW
    architecture. Those never came, and so the realized performance
    just wasn't there relative to the promises Intel had made for
    the architecture. Meanwhile, a bunch of ex-DEC people went to
    AMD and did the AMD64 extensions for x86, which a) performed
    pretty decently (at a much lower price-point than Itanium), and
    b) was directly backwards compatible with 32-bit x86. Within,
    what, a year or so, Intel had no choice but to copy the design
    with their own offering, and the market ran with it.

    This was all in the late 90s/early 2000s, but by 2003 or thereabouts
    it was clear that Itanium was never going to reach its
    promises.

    Speculating about alternate historical timelines is always fun.
    Had IBM chosen the 68k two decades before Itanium, I suspect the
    world would be very different: Moto didn't try to push that
    design beyond the 68060, which was competitive with the Pentium
    at roughly the same time, but didn't have a pipelined FPU;
    perhaps it would have if Moto had had the kind of capital resources
    Intel could bring to bear for Pentium and beyond. I suspect,
    however, that Moto would have dumped the 68k architecture and
    we'd all be using some kind of RISC ISA directly.

    One final note about Itanium: Intel had tried the VLIW thing
    before with the i860, and ran into the same problem: the
    compilers of that era just weren't there to make it competitive
    for general-purpose compute. You'd think they'd have learned
    that lesson for Itanium, and either done the compiler work
    themselves, or funded it externally, _before_ betting so big on
    it.

    - Dan C.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Fri Nov 7 17:43:25 2025
    Paul S Person <psperson@old.netcom.invalid> writes:
    On Fri, 7 Nov 2025 15:50:53 -0000 (UTC), cross@spitfire.i.gajendra.net
    (Dan Cross) wrote:

    Intel had a proposal to do away with 16-bit mode and anything
    other than long mode for 64-bit, but it seems to have died. So
    it seems like we'll be stuck with x86 segmentation --- at least
    for compatibility purposes --- for a while longer still.

    This is all very interesting as a summary of where-we-are. Thanks.

    Dan was referring to https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html


    Didn't Intel, at one time, plan to replace all xxx8x processors with
    one of the new! shiny! RISC processor?

    The EPIC[*] processor family (Monterey, Merced) known now as Itanium
    was intended to be Intel's replacement for the x86 server grade
    processors, replacing the proposed P7[**] design. It was an epic
    failure, primarily due to cost, compiler complexity and lack of
    competitive performance.

    [*] Explicitly Parallel Instruction Computing.

    [**] Which was a RISC-like processor. Unfortunately, there's not much
    information about the original P7 design on the internet, and I
    wasn't allowed to keep my P7 orange books.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D'Oliveiro@3:633/10 to All on Fri Nov 7 19:40:56 2025
    On Fri, 07 Nov 2025 08:22:11 -0800, Paul S Person wrote:

    Didn't Intel, at one time, plan to replace all xxx8x processors with
    one of the new! shiny! RISC processor?

    They tried twice, and failed both times. The first time was the i860 <https://www.youtube.com/watch?v=WTkFGZqVCM8>.

    The second, better-known failure was in conjunction with HP <https://www.youtube.com/watch?v=3oxrybkd7Mo>.

    Only to be defeated when it was pointed out that a whole lot of
    software would have to run on it. Software written for their xxx8x processors, segmentation and all.

    In terms of software compatibility, open-source platforms like Linux
    have shown that that does not need to be a barrier to innovation at
    all.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Dan Cross@3:633/10 to All on Sat Nov 8 00:00:06 2025
    In article <20251106151700.00006730@yahoo.com>,
    Michael S <already5chosen@yahoo.com> wrote:
    On Thu, 6 Nov 2025 13:56:17 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 06/11/2025 12:21, bart wrote:
    On 06/11/2025 07:51, David Brown wrote:
    On 05/11/2025 16:15, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 23:04, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 04/11/2025 18:32, Scott Lurndal wrote:

    (And I have never
    seen a Cortex-M device with programmable windows or addresses
    - indeed,
    I believe the Cortex-M core documentation specifies some
    memory ranges
    explicitly.)

    I have used Cortex-M devices with programmable windows
    in the physical address space.

    OK. I have not, but I haven't used the newer Cortex-M cores as
    yet, so it could well be a new feature.

    It is not necessarily a feature of the M7 core itself, but rather
    the glue logic around it - particularly the logic that interfaces
    to the "system bus" to which the M7 core is interfaced. That
    logic is under the control of the SoC designer and can easily have
    external registers that are programmed to specify how to route
    accesses from the M7, including to large regions of DRAM;
    consider a maintenance processor on a 64-bit server that needs
    access to the server DRAM space for RAS purposes.


    Fair enough, now I see what you are getting at. Yes, once you are
    outside the Cortex-M core and key ARM-supplied components (like
    the interrupt controller), you as a SoC designer are free to do
    what you like. And if you have a 32-bit processor that needs
    access to a 64-bit address space, you are going to have to do some
    kind of windowing or segmenting.

    In the SoCs I have used where 64-bit Cortex-A processors are
    combined with a Cortex-M core for security purposes, booting, or
    for better real-time control of peripherals, the Cortex-M device
    does not have direct access to the 64-bit memory space. It has
    access to the peripherals, some dedicated memory, and a
    message-passing interface with the Cortex-A cores.

    But in your work, you probably see more variety and more
    possibilities for these things - I only get to use the chips
    someone else has made!

    I think you were right, if this 'M7' chip doesn't directly have
    registers, instructions or infrastructure to access the more
    complex memory system.

    Unless you are modifying M7 itself, then that 'glue' logic could be
    applied to anything (eg. I've built a Z80 system with 256KB RAM),
    and it is that composite system that a language + compiler can
    target.

    Then it would appear to the user of the language that the target
    machine had those extended features. But if they were to look at
    the generated code, they might see it was accessing external
    registers or whatever.

    So it's cheating.

    You were fine up until the last sentence here. What do you mean by
    "cheating" ? Whose rules is it breaking? The system Scott was
    describing (assuming I understood him correctly) let the 32-bit core
    access blocks of the 64-bit address space. You can choose which part
    of the address space is accessible at any given time (presumably by
    accessing segment or window registers like any other memory-mapped
    peripheral registers). But you can't call it "cheating" unless you
    have defined some set of rules for what is "allowed" and what is not
    allowed, and everyone else has agreed to play by those rules.

    Doing this sort of trick with the Cortex-M7 is asking for trouble. Its
    data cache is unaware of the tricks you play with windows, so the
    programmer has to flush/invalidate cache lines manually. Sooner or
    later the programmer will make a mistake.

    As Scott already said, he's not in the same cache coherency
    domain as the A-profile cores, so it doesn't really matter and
    the memory map defines the cache attributes of these aperture
    regions appropriately. However, I want to point out that on
    _any_ relaxed memory architecture CPU, the programmer already
    has to be aware of these issues and deal with them accordingly.
    E.g., consider implementing a context switch or mutex or
    something.

    A mistake of the sort that is very hard to debug.

    Welcome to 2025.

    I'd say, if you (SOC designer) absolutely have to play these games, just
    use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Sat Nov 8 08:45:04 2025
    On 11/7/25 08:50, Dan Cross wrote:
    In article <10eda8d$3pd45$1@dont-email.me>,
    Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/4/25 08:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?

    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.


    I was thinking, are there any segmented architectures today? Most
    disguise segmentation as a flat address space (e.g. IBM System/370 et seq.)

    x86_64 is still nominally segmented; what "code segment" the
    processor is running in matters, even in long mode. But most of
    the segment data is ignored by hardware (e.g., base and limits)
    in 64-bit mode.

    Of course, it retains a notion of segmentation for a) 16- and
    32-bit code compatibility, and b) startup, where the processor
    (still!!) comes out of reset in 16-bit real mode.

    Intel had a proposal to do away with 16-bit mode and anything
    other than long mode for 64-bit, but it seems to have died. So
    it seems like we'll be stuck with x86 segmentation --- at least
    for compatibility purposes --- for a while longer still.

    - Dan C.


    Probably at least until the 128-bit systems arrive ;-)

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Sat Nov 8 08:47:11 2025
    On 11/7/25 09:46, Dan Cross wrote:
    In article <10edcbg$lrh1$1@dont-email.me>,
    geodandw <geodandw@gmail.com> wrote:
    On 11/4/25 12:12, Richard Heathfield wrote:
    On 04/11/2025 15:20, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/3/25 13:24, Lynn McGuire wrote:

    When I saw this subject line, I thought it was some necroposting to
    threads from 1990.

    Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
    There are still people on the internet who swear that the 286 is
    better than sliced bread and refuse to recognize that modern
    architectures are superior.

    I can still hear them down the hall.

    ST!
    .......................................................Amiga!
    ST!
    .......................................................Amiga!

    The 68000 was a very nice processor for its time. It's too bad IBM
    didn't use it in the PC.

    They wanted to. IBM had a close relationship with Motorola, and
    they even had engineering samples in Westchester. The problem
    was that 68k was a skunkworks project inside of Moto, which was
    pushing the 6809 as the Next Big Thing. So when IBM was talking
    to Moto sales about using 68k for the PC, Moto was pushing them
    (not so gently) towards the 6809 and telling them 68k was just a
    research project with no future.

    IBM was smart enough to know that the 6809 was going to be a
    non-starter (a firmly 8-bit micro when 16-bit CPUs were becoming
    mainstream), and the 8088 met their specs for the 5150, so they
    went with Intel instead. By the time it was clear that the 68k
    was going to be Moto's flagship CPU going forward, it was too
    late for inclusion in the PC.

    And here we are.

    - Dan C.


    I think they used the 680x0 in one of their small computers. Maybe the "Laboratory Computer"?

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From John Levine@3:633/10 to All on Sat Nov 8 21:17:05 2025
    According to Peter Flass <Peter@Iron-Spring.com>:
    I think they used the 680x0 in one of their small computers. Maybe the "Laboratory Computer"?

    That was IBM Instruments, a small company that IBM bought after they'd
    already developed the product and just rebadged it.


    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Sun Nov 9 11:15:01 2025
    On Fri, 7 Nov 2025 17:22:51 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:

    the architecture. Meanwhile, a bunch of ex-DEC people went to
    AMD and did the AMD64 extensions for x86, which a) performed

    Do you have a proof that it was done by Ex-DEC people?
    My impression is that Ex-DEC people, esp. Jim Keller, were very
    important as micro-architects of K7 and K8, but I don't remember ever
    reading that they played major role in the stage of architectural
    definitions of AMD64.



    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Sun Nov 9 11:46:00 2025
    On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:


    I'd say, if you (SOC designer) absolutely have to play these games,
    just use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    Somehow I suspect that [at the same clock frequency] the M4 could access
    uncached memory faster than the M7. Maybe even significantly faster.

    Unfortunately, info about M7 instruction timing does not appear to be
    public.

    If one needs something like DP floating point, or when uncached
    accesses are only a small part of the job and the rest of the load is
    compute-intensive, then I can see how the M7 could look attractive vs
    the M4. But personally in such a case I'd start to look for a
    non-Cortex-M solution. Maybe the R4, although I don't like it. Maybe
    the A5. In huge SoCs of the sort Scott is working on - an A34 or even
    a 510. Plus another M4 to handle more typical MCU tasks.





    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Sun Nov 9 12:29:32 2025
    On 09/11/2025 10:46, Michael S wrote:
    On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:


    I'd say, if you (SOC designer) absolutely have to play these games,
    just use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    Somehow I suspect that [at the same clock frequency] M4 could access
    uncached memory faster that M7. May be, even significantly faster.


    I suspect you would be wrong. The M7 can do more per clock than the M4,
    has wider buses, and has support for direct data and instruction
    memories with their own dedicated buses. I can appreciate the gut
    feeling that because there is the option of caching accesses, that extra functionality may slow down accesses when the cache is not used, but I
    don't believe that happens on the M7. And everything other than the
    accesses themselves (the loads, stores, address increments, looping,
    etc.) can be quite a lot faster at the same clock speed.

    But as you say, public data on timings is limited - and even when the
    data on the core is available, timings can be very dependent on details
    of the implementation and connections outside the core.

    We could always appeal to authority - Scott's company knows what they
    are doing, have access to far more detailed information and technical assistance from ARM than we do, and have picked an M7 rather than an M4.
    But speculation is more fun :-)

    Unfortunately, info about M7 instructions timing does not appear to be public.

    If one needs something like DP floating or when uncached accesses are
    only small part of the job and the rest of the load is compute
    -intensive then I can see how M7 could look attractive vs M4.
    But personally in such case I'd start to look for non-Cortex-M solution.
    May be R4, although I don't like it. May be A5. In huge SoCs of sort
    Scott is working on - A34 or even 510. Plus, another M4 to handle more typical MCU tasks.






    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Sun Nov 9 14:40:31 2025
    On Sun, 9 Nov 2025 12:29:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 09/11/2025 10:46, Michael S wrote:
    On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:


    I'd say, if you (SOC designer) absolutely have to play these
    games, just use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    Somehow I suspect that [at the same clock frequency] M4 could access uncached memory faster that M7. May be, even significantly faster.


    I suspect you would be wrong. The M7 can do more per clock than the
    M4, has wider buses, and has support for direct data and instruction memories with their own dedicated buses.

    If I am not mistaken, with exception of caches, M4 and M7 have
    3 identical "fast" 32-bit busses - I+D+AHB. Plus some slower auxiliary
    stuff.

    I can appreciate the gut
    feeling that because there is the option of caching accesses, that
    extra functionality may slow down accesses when the cache is not
    used, but I don't believe that happens on the M7. And everything
    other than the accesses themselves (the loads, stores, address
    increments, looping, etc.) can be quite a lot faster at the same
    clock speed.

    Except that every branch mispredict is more than twice as slow. I'd
    guess that the latency of a cache/TCM *hit* is also 1 clock slower
    than the latency of an internal SRAM access on the M4, but the
    absence of docs prevents me from proving it.
    As to a cache miss, I am pretty sure that it completely stalls the M7
    pipeline. In the case of the M4, I think that after an external Load
    the pipeline makes one more step before it stalls. And, of course,
    the stall itself is less expensive.
    Once again, I can't prove it because of the absence of docs.


    But as you say, public data on timings is limited -

    In the case of the M7, public data is not "limited", it is absent.
    AFAIK, that's not the case for the other Cortex-M cores. Back when the
    M7 was new, Arm claimed that the data was not made available because
    the core is more complicated than the rest of the Cortex-M line. As
    silly as it sounds, they could continue to claim it with a sort of
    straight face for as long as the other Cortex-M cores were, indeed,
    simpler. That has not been the case since 2022, because the Cortex-M85
    is no less complicated than the M7, and arguably even a little more
    so. Despite that, there exists an M85 Software Optimization Guide that
    contains instruction tables with latency and throughput data. Yes, it
    has a few omissions, but it proves that there is nothing impossible in
    documenting cores of this level of complexity, even if you are as lazy
    as the Cortex-M documentation team appears to be (relative, for
    example, to the Cortex-A/Neoverse side of the company).

    and even when the
    data on the core is available, timings can be very dependent on
    details of the implementation and connections outside the core.

    We could always appeal to authority - Scott's company knows what they
    are doing, have access to far more detailed information and technical assistance from ARM than we do, and have picked an M7 rather than an
    M4. But speculation is more fun :-)

    Unfortunately, info about M7 instructions timing does not appear to
    be public.

    If one needs something like DP floating or when uncached accesses
    are only small part of the job and the rest of the load is compute -intensive then I can see how M7 could look attractive vs M4.
    But personally in such case I'd start to look for non-Cortex-M
    solution. May be R4, although I don't like it. May be A5. In huge
    SoCs of sort Scott is working on - A34 or even 510. Plus, another
    M4 to handle more typical MCU tasks.








    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Sun Nov 9 15:54:20 2025
    On 09/11/2025 13:40, Michael S wrote:
    On Sun, 9 Nov 2025 12:29:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 09/11/2025 10:46, Michael S wrote:
    On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:


    I'd say, if you (SOC designer) absolutely have to play these
    games, just use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    Somehow I suspect that [at the same clock frequency] M4 could access
    uncached memory faster that M7. May be, even significantly faster.


    I suspect you would be wrong. The M7 can do more per clock than the
    M4, has wider buses, and has support for direct data and instruction
    memories with their own dedicated buses.

    If I am not mistaken, with exception of caches, M4 and M7 have
    3 identical "fast" 32-bit busses - I+D+AHB. Plus some slower auxiliary
    stuff.


    I believe you are mistaken (which is not something I have seen often).

    <https://www.arm.com/-/media/Arm%20Developer%20Community/PDF/Processor%20Datasheets/Arm-Cortex-M7-Processor-Datasheet.pdf>

    """
    The interfaces that the processor supports include:
    64-bit AXI4 interface
    32-bit AHB master interface
    32-bit AHB slave interface
    64-bit instruction TCM interface
    2x32-bit data TCM interfaces
    """

    The M7 is dual issue - for some instruction combinations, it runs two instructions per clock. It needs more, faster and wider buses to feed it.

    I can appreciate the gut
    feeling that because there is the option of caching accesses, that
    extra functionality may slow down accesses when the cache is not
    used, but I don't believe that happens on the M7. And everything
    other than the accesses themselves (the loads, stores, address
    increments, looping, etc.) can be quite a lot faster at the same
    clock speed.

    Except that every branch mispredict is more than twice slower.

    Branch mispredict costs are primarily related to pipeline depth on a
    processor that does not do any kind of speculative execution. I don't remember the depth of the M4 and M7 off-hand, but the M7 is not twice as
    deep as the M4.

    I'd
    guess that the latency of the cache/TCM *hit* is also 1 clock slower
    that latency of internal SRAM access on M4, but absence of docs
    prevents me from proving it.

    The whole point of the TCM - tightly coupled memories - is that they run
    at core speed, and no caches are used. They are as low-latency as can
    be achieved with M4 sram, except that now you have independent buses and memory for instruction and data (rather than independent buses to shared memory if you have code and data in ram on most M4 implementations), and
    that the buses are twice as wide.

    It is possible that there is an extra cycle of latency on accessing main memory, due to the optional path through the cache - I am not sure on
    that. But I suspect that the 64-bit wide AXI4 bus, as well as the significantly faster handling of the rest of the code (which does not
    need to share the same bus bandwidth as the off-core memory accesses)
    more than outweighs that.

    As to cache miss, I am pretty sure that it completely stalls M7
    pipeline.

    Yes. But we are not using the cache in this hypothetical case.

    In case of M4, I think that after external Load pipeline makes
    one more step before it stalls. And, of course, the stall itself is less expensive.
    Once again, I can't prove it because of absence of docs.


    But as you say, public data on timings is limited -

    In case of M7, public data is not "limited", it is absent.
    AFAIK, it's not the case for all other Cortex-M cores. Back when M7 was
    new, Arm claimed that the data is not made available because the core
    is more complicated that the rest of Cortex-M line. As silly as it
    sounds they could continue to claim it with sort of straight face for as
    long as other Cortex-M cores were, indeed, simpler. Which is not the
    case since 2022, because Cortex M85 is no less complicated than M7 and arguably even a little more so. Despite that, there exist M85 Software Optimization Guide that contains instruction tables with latency and throughput data. Yes, it has few omissions, but it proves that there is nothing impossible in documenting cores of this level of complexity,
    even if you as lazy as Cortex M documentation team appears to be
    (relatively, for example, to Cortex-A/Neoverse side of the company).

    and even when the
    data on the core is available, timings can be very dependent on
    details of the implementation and connections outside the core.

    We could always appeal to authority - Scott's company knows what they
    are doing, have access to far more detailed information and technical
    assistance from ARM than we do, and have picked an M7 rather than an
    M4. But speculation is more fun :-)

    Unfortunately, info about M7 instructions timing does not appear to
    be public.

    If one needs something like DP floating or when uncached accesses
    are only small part of the job and the rest of the load is compute
    -intensive then I can see how M7 could look attractive vs M4.
    But personally in such case I'd start to look for non-Cortex-M
    solution. May be R4, although I don't like it. May be A5. In huge
    SoCs of sort Scott is working on - A34 or even 510. Plus, another
    M4 to handle more typical MCU tasks.









    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Sun Nov 9 17:50:51 2025
    On Sun, 9 Nov 2025 15:54:20 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 09/11/2025 13:40, Michael S wrote:
    On Sun, 9 Nov 2025 12:29:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 09/11/2025 10:46, Michael S wrote:
    On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:


    I'd say, if you (SOC designer) absolutely have to play these
    games, just use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    Somehow I suspect that [at the same clock frequency] M4 could
    access uncached memory faster that M7. May be, even significantly
    faster.

    I suspect you would be wrong. The M7 can do more per clock than
    the M4, has wider buses, and has support for direct data and
    instruction memories with their own dedicated buses.

    If I am not mistaken, with exception of caches, M4 and M7 have
    3 identical "fast" 32-bit busses - I+D+AHB. Plus some slower
    auxiliary stuff.


    I believe you are mistaken (which is not something I have seen often).

    <https://www.arm.com/-/media/Arm%20Developer%20Community/PDF/Processor%20Datasheets/Arm-Cortex-M7-Processor-Datasheet.pdf>

    """
    The interfaces that the processor supports include:
    64-bit AXI4 interface
    32-bit AHB master interface
    32-bit AHB slave interface
    64-bit instruction TCM interface
    2x32-bit data TCM interfaces
    """


    Yes, I was mistaken. I overlooked AXIM/AXI4.

    The M7 is dual issue - for some instruction combinations, it runs two instructions per clock. It needs more, faster and wider buses to
    feed it.

    I can appreciate the gut
    feeling that because there is the option of caching accesses, that
    extra functionality may slow down accesses when the cache is not
    used, but I don't believe that happens on the M7. And everything
    other than the accesses themselves (the loads, stores, address
    increments, looping, etc.) can be quite a lot faster at the same
    clock speed.

    Except that every branch mispredict is more than twice slower.

    Branch mispredict costs are primarily related to pipeline depth on a processor that does not do any kind of speculative execution.

    Same as on most of those that do speculative execution. But
    that's O.T.

    I
    don't remember the depth of the M4 and M7 off-hand, but the M7 is not
    twice as deep as the M4.


    It is twice as deep: 6 vs 3. Which means that the typical mispredict penalty differs by a factor of 2.5 (5 vs 2).


    I'd
    guess that the latency of the cache/TCM *hit* is also 1 clock slower
    that latency of internal SRAM access on M4, but absence of docs
    prevents me from proving it.

    The whole point of the TCM - tightly coupled memories - is that they
    run at core speed, and no caches are used. They are as low-latency
    as can be achieved with M4 sram, except that now you have independent
    buses and memory for instruction and data (rather than independent
    buses to shared memory if you have code and data in ram on most M4 implementations), and that the buses are twice as wide.


    Look at the pipelines.
    We have no official pipeline picture for M7, but we can guess with
    good certainty that it is very similar to M85, with main difference
    being that M85 has 3 LS stages and M7 has only 2.
    It is obvious that in the best possible case Load instruction and
    dependent Integer Data Processing (DPU) instruction have to be 2 cycles
    apart, i.e. minimum load to DPU latency = 3. On M3/M4 minimal latency =
    2.

    It is possible that there is an extra cycle of latency on accessing
    main memory, due to the optional path through the cache - I am not
    sure on that. But I suspect that the 64-bit wide AXI4 bus, as well
    as the significantly faster handling of the rest of the code (which
    does not need to share the same bus bandwidth as the off-core memory accesses) more than outweighs that.

    A 64-bit bus certainly helps a lot for cached accesses. Not sure if it
    helps uncached accesses. I'd guess that [for uncached] it does not help
    regular integer Load instructions, but sometimes helps LDM. It also
    likely helps the DP FP load instruction when the core is configured
    with a DP FPU.
    As to sharing the same bus bandwidth, both the M4 and M7 have a
    dedicated I-bus. In a typical MCU it is connected to NOR flash, and
    here the M7 I-cache helps a lot. In a typical big ASIC it is connected
    to fast SRAM and the I-cache makes no difference.


    As to cache miss, I am pretty sure that it completely stalls M7
    pipeline.

    Yes. But we are not using the cache in this hypothetical case.


    As far as the pipeline goes, an uncached access is the same as a
    D-cache miss. Except that after the data finally arrives it does not
    have to be written to the cache, but for load-to-use latency the
    latter is irrelevant.

    Maybe it is one clock better when the M7 is configured without a data
    cache, which is possible and fully supported by ARM, but probably not
    very popular among their clients. Or maybe it's not better.

    On the soft Nios2-f core, which is the M7-class core I am most
    familiar with, an uncached configuration does help, but the internals
    of soft cores are, well... more soft.





    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Brown@3:633/10 to All on Sun Nov 9 18:05:15 2025
    On 09/11/2025 16:50, Michael S wrote:
    On Sun, 9 Nov 2025 15:54:20 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 09/11/2025 13:40, Michael S wrote:
    On Sun, 9 Nov 2025 12:29:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 09/11/2025 10:46, Michael S wrote:
    On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:


    I'd say, if you (SOC designer) absolutely have to play these
    games, just use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    Somehow I suspect that [at the same clock frequency] M4 could
    access uncached memory faster that M7. May be, even significantly
    faster.

    I suspect you would be wrong. The M7 can do more per clock than
    the M4, has wider buses, and has support for direct data and
    instruction memories with their own dedicated buses.

    If I am not mistaken, with exception of caches, M4 and M7 have
    3 identical "fast" 32-bit busses - I+D+AHB. Plus some slower
    auxiliary stuff.


    I believe you are mistaken (which is not something I have seen often).

    <https://www.arm.com/-/media/Arm%20Developer%20Community/PDF/Processor%20Datasheets/Arm-Cortex-M7-Processor-Datasheet.pdf>

    """
    The interfaces that the processor supports include:
    64-bit AXI4 interface
    32-bit AHB master interface
    32-bit AHB slave interface
    64-bit instruction TCM interface
    2x32-bit data TCM interfaces
    """


    Yes, I was mistaken. I overlooked AXIM/AXI4.

    The M7 is dual issue - for some instruction combinations, it runs two
    instructions per clock. It needs more, faster and wider buses to
    feed it.

    I can appreciate the gut
    feeling that because there is the option of caching accesses, that
    extra functionality may slow down accesses when the cache is not
    used, but I don't believe that happens on the M7. And everything
    other than the accesses themselves (the loads, stores, address
    increments, looping, etc.) can be quite a lot faster at the same
    clock speed.

    Except that every branch mispredict is more than twice slower.

    Branch mispredict costs are primarily related to pipeline depth on a
    processor that does not do any kind of speculative execution.

    Same as on most of those that do speculative execution. But
    that's O.T.

    To some extent, I agree, though speculative execution can make the costs
    more complicated (such as by using compute units that would otherwise be
    used for useful work, or by speculative memory accesses that use up real bandwidth). But the details are OT - even by the OT standard of this
    thread.


    I
    don't remember the depth of the M4 and M7 off-hand, but the M7 is not
    twice as deep as the M4.


    It is twice as deep. 6 vs 3. Which means that typical mispredict penalty differs by factor of 2.5 (5 vs 2).


    Okay, that's a bigger pipeline difference than I thought. However, it
    is also good to remember that the M7 has much more sophisticated branch prediction than the M4, so mispredicts will be fewer on average. The worst
    case is going to be worse, however.


    I'd
    guess that the latency of a cache/TCM *hit* is also 1 clock slower
    than the latency of an internal SRAM access on the M4, but the absence
    of docs prevents me from proving it.

    The whole point of the TCMs - tightly coupled memories - is that they
    run at core speed, with no caches involved. They are as low-latency
    as the internal SRAM on an M4, except that now you have independent
    buses and memories for instructions and data (rather than independent
    buses to a shared memory, as on most M4 implementations with code and
    data in RAM), and the buses are twice as wide.


    Look at the pipelines.
    We have no official pipeline picture for the M7, but we can guess with
    good certainty that it is very similar to the M85, with the main
    difference being that the M85 has 3 LS stages while the M7 has only 2.
    It is obvious that in the best possible case a Load instruction and a
    dependent integer data-processing (DPU) instruction have to be 2 cycles apart, i.e. the minimum load-to-DPU latency = 3. On the M3/M4 the minimum
    latency = 2.
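One practical consequence of a longer load-to-use latency is that dependent load/add chains stall more, and a common workaround is to keep independent chains in flight. A hedged sketch in plain C (not vendor code; whether the compiler already does this transformation depends on optimization settings):

```c
#include <stddef.h>

/* Naive sum: each add depends on the load just before it, so every
 * iteration pays the full load-to-use latency (2 cycles on M3/M4,
 * 3 on M7 per the pipeline discussion above). */
long sum_naive(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two independent accumulators: while one chain's add waits on its
 * load, the other chain's load can issue, hiding the extra cycle. */
long sum_unrolled(const int *a, size_t n)
{
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)
        s0 += a[i];
    return s0 + s1;
}
```

Both functions compute the same sum; the second just restructures the dependency chains so the deeper pipeline has less to stall on.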

    It is possible that there is an extra cycle of latency on accessing
    main memory, due to the optional path through the cache - I am not
    sure on that. But I suspect that the 64-bit wide AXI4 bus, as well
    as the significantly faster handling of the rest of the code (which
    does not need to share the same bus bandwidth as the off-core memory
    accesses) more than outweighs that.

    64-bit bus certainly helps a lot for cached accesses. Not sure if it
    helps uncached accesses.

    The 64-bit buses work well with the TCMs. And if you are writing fast
    code for the M7, you run the code from the ITCM and keep most of your
    data (and your stack) in the DTCM.

    As for uncached access off core, 64-bit might help for some things, but
    you are right that it might not help as much as when using the cache.

    I'd guess that [for uncached] it does not help
    regular integer load instructions, but it sometimes helps LDM. It also
    likely helps DP FP load instructions when the core is configured with
    a DP FPU.
    As to sharing the same bus bandwidth, both the M4 and M7 have a
    dedicated I-bus. In a typical MCU it is connected to NOR flash, and
    here the M7's I-cache helps a lot. In a typical big ASIC it is
    connected to fast SRAM and the I-cache makes no difference.


    As to a cache miss, I am pretty sure that it completely stalls the M7
    pipeline.

    Yes. But we are not using the cache in this hypothetical case.


    As far as the pipeline goes, an uncached access is the same as a
    D-cache miss, except that after the data finally arrives it does not
    have to be written to the cache; but for load-to-use latency the
    latter is irrelevant.

    There are a few other differences. In particular, cache misses
    typically mean a whole cache line is read in, whether the access is read
    or write. With uncached accesses, that does not happen.


    Maybe it is one clock better when the M7 is configured without a data cache, which is possible and fully supported by ARM, but probably not very
    popular among their clients. Or maybe it's not better.

    On the soft Nios2-f core, which is the M7-class core I am most
    familiar with, an uncached configuration does help, but the internals
    of soft cores are, well ... more soft.


    And they are also perhaps better documented :-)


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Sun Nov 9 21:58:10 2025
    Michael S <already5chosen@yahoo.com> writes:
    On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:


    I'd say, if you (SOC designer) absolutely have to play these games,
    just use Cortex-M4.

    Sometimes you really do need an M7 class part.

    - Dan C.


    Somehow I suspect that [at the same clock frequency] the M4 could access
    uncached memory faster than the M7. Maybe even significantly faster.

    I don't see it. In both cases, they'll be dependent upon the
    performance of the system interconnect, plus the m7 has multiple
    load units, while the m4 has only one.


    Unfortunately, info about M7 instruction timing does not appear to be public.

    As noted above, it's interconnect dependent.


    If one needs something like DP floating point, or when uncached
    accesses are only a small part of the job and the rest of the load is
    compute-intensive, then I can see how the M7 could look attractive vs
    the M4.

    I think you need to provide more data supporting this vis-a-vis
    M4 and M7 performance characteristics.

    But personally, in such a case I'd start to look for a non-Cortex-M solution.
    Maybe R4, although I don't like it. Maybe A5. In huge SoCs of the sort
    Scott is working on - A34 or even 510. Plus another M4 to handle more typical MCU tasks.

    If you think that the system designers don't take into account the
    capabilities of the cores that they select based on the workload
    assigned to those cores, you would be incorrect.

    There are significant performance advantages to using the m7 vs. the m4, particularly in load/store performance (given the M7 has two load units,
    vs only one in the M4) and branch performance.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Dan Cross@3:633/10 to All on Mon Nov 10 09:08:58 2025
    In article <20251109111501.00000fcd@yahoo.com>,
    Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 7 Nov 2025 17:22:51 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:

    the architecture. Meanwhile, a bunch of ex-DEC people went to
    AMD and did the AMD64 extensions for x86, which a) performed

    Do you have proof that it was done by ex-DEC people?
    My impression is that ex-DEC people, esp. Jim Keller, were very
    important as micro-architects of the K7 and K8, but I don't remember
    ever reading that they played a major role at the stage of
    architectural definition of AMD64.

    I'm afraid I do not. I may be incorrect on that part,
    or misattributing IP transfer from the cross-licensing
    agreement that came out of the DEC/Intel lawsuit to
    specific engineers.

    - Dan C.


    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)