On 24 Mar, 23:40, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
wrote:
Dann Corbit <dcor...@connx.com> writes:
In article <1e27d5ee-a1b1-45d9-9188-
63ab37398...@d37g2000yqn.googlegroups.com>,
nick_keighley_nos...@hotmail.com says...
On 23 Mar, 20:56, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
In alt.sys.pdp10 Richard Bos <ralt...@xs4all.nl> wrote:
(snip)
That crossposting was, for once, not asinine. It served as a nice
example why, even now, Leenux weenies are not correct when they insist
that C has a flat memory model and all pointers are just numbers.
Well, you could also read the C standard to learn that.
but if you say that you get accused of language lawyering.
"Since IBM stopped making 360s no C program ever needs to run on such
a platform"
We have customers who are running their business on hardware from the mid
1980s. It may sound ludicrous, but if it solves all of their business
needs, and runs solid 24x365, why should they upgrade?
Because they could run an equivalently computationally powerful
solution with various levels of redundancy and fail-over protection,
with a power budget sensibly measured in mere Watts?
does it have a Coral compiler?
In alt.sys.pdp10 Richard Bos <raltbos@xs4all.nl> wrote:
(snip)
That crossposting was, for once, not asinine. It served as a nice
example why, even now, Leenux weenies are not correct when they insist
that C has a flat memory model and all pointers are just numbers.
Well, you could also read the C standard to learn that.
There are additional complications for C on the PDP-10.
-- glen
Branimir Maksimovic wrote:
On Tue, 23 Mar 2010 06:51:18 -0400
Peter Flass <Peter_Flass@Yahoo.com> wrote:
Jonathan de Boyne Pollard wrote:
Returning to what we were talking about before the silly diversion,
I should point out that 32-bit applications programming where the
target is extended DOS or 32-bit Win16 (with OpenWatcom's extender)
will also occasionally employ 16:32 far pointers of course. But as
I said before, regular 32-bit OS/2 or Win32 applications
programming generally does not, since those both use the Tiny
memory model,
Problem with standard C and C++ is that they assume flat memory
model.
I'm not a C expert, perhaps you're a denizen of comp.lang.c, but as far
as I know there's nothing in the C standard that assumes anything about
pointers, except that they have to be the same size as int, so for 16:32
pointers I guess you'd need 64-bit ints.
As far as implementations are concerned, both Watcom and IBM VA C++
support segmented memory models. These are the ones I'm aware of, there
are probably more.
What happened here?? I just noticed that a lot of these posts are from
2010. Did some news server just barf?
On 11/2/2025 2:20 PM, Peter Flass wrote:
What happened here?? I just noticed that a lot of these posts are from
2010. Did some news server just barf?
I asked Ray Banana of E-S about the openwatcom.* groups and he
resurrected them with all of their very old postings.
Lynn
On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
On 11/3/25 13:24, Lynn McGuire wrote:
When I saw this subject line, I thought it was some necroposting to
threads from 1990.
Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
Amazing ...
Kaz Kylheku <643-408-1753@kylheku.com> writes:
On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
On 11/3/25 13:24, Lynn McGuire wrote:
When I saw this subject line, I thought it was some necroposting to
threads from 1990.
Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
There are still people on the internet who swear that the 286 is
better than sliced bread and refuse to recognize that modern
architectures are superior.
On 11/4/25 08:20, Scott Lurndal wrote:
There are still people on the internet who swear that the 286 is
better than sliced bread and refuse to recognize that modern
architectures are superior.
I was thinking, are there any segmented architectures today? Most
disguise segmentation as a flat address space (e.g. IBM System/370 et seq.)
On 04/11/2025 15:20, Scott Lurndal wrote:
There are still people on the internet who swear that the 286 is
better than sliced bread and refuse to recognize that modern
architectures are superior.
I can still hear them down the hall.
ST!
.......................................................Amiga!
ST!
.......................................................Amiga!
Peter Flass <Peter@Iron-Spring.com> writes:
I was thinking, are there any segmented architectures today?
Only in emulation (see Unisys Clearpath, for example).
scott@slp53.sl.home (Scott Lurndal) writes:
I was thinking, are there any segmented architectures today?
Only in emulation (see Unisys Clearpath, for example).
Although it's worth pointing out that Harvard architectures
still exist (e.g. CEVA DSPs), and the low-power ARM
M-series core 32-bit physical address space is
divided into 28-bit regions, some of which may
provide programmable windows into alternate address spaces
in a fashion very similar to segmentation.
On 04/11/2025 18:32, Scott Lurndal wrote:
(And I have never seen a Cortex-M device with programmable windows or
addresses - indeed, I believe the Cortex-M core documentation specifies
some memory ranges explicitly.)
David Brown <david.brown@hesbynett.no> writes:
On 04/11/2025 18:32, Scott Lurndal wrote:
(And I have never seen a Cortex-M device with programmable windows or
addresses - indeed, I believe the Cortex-M core documentation specifies
some memory ranges explicitly.)
I have used Cortex-M devices with programmable windows
in the physical address space.
On 04/11/2025 23:04, Scott Lurndal wrote:
I have used Cortex-M devices with programmable windows
in the physical address space.
OK. I have not, but I haven't used the newer Cortex-M cores as yet, so
it could well be a new feature.
David Brown <david.brown@hesbynett.no> writes:
OK. I have not, but I haven't used the newer Cortex-M cores as yet, so
it could well be a new feature.
It is not necessarily a feature of the M7 core itself, but rather
the glue logic around it - particularly the logic that interfaces
to the "system bus" to which the M7 core is interfaced. That logic
is under the control of the SoC designer and can easily have
external registers that are programmed to specify how to route
accesses from the M7, including to large regions of DRAM;
consider a maintenance processor on a 64-bit server that needs
access to the server DRAM space for RAS purposes.
On 05/11/2025 16:15, Scott Lurndal wrote:
It is not necessarily a feature of the M7 core itself, but rather
the glue logic around it - particularly the logic that interfaces
to the "system bus" to which the M7 core is interfaced. That logic
is under the control of the SoC designer and can easily have
external registers that are programmed to specify how to route
accesses from the M7, including to large regions of DRAM;
consider a maintenance processor on a 64-bit server that needs
access to the server DRAM space for RAS purposes.
Fair enough, now I see what you are getting at. Yes, once you are
outside the Cortex-M core and key ARM-supplied components (like the
interrupt controller), you as a SoC designer are free to do what you
like. And if you have a 32-bit processor that needs access to a 64-bit
address space, you are going to have to do some kind of windowing or
segmenting.
In the SoCs I have used where 64-bit Cortex-A processors are combined
with a Cortex-M core for security purposes, booting, or for better
real-time control of peripherals, the Cortex-M device does not have
direct access to the 64-bit memory space. It has access to the
peripherals, some dedicated memory, and a message-passing interface with
the Cortex-A cores.
But in your work, you probably see more variety and more possibilities
for these things - I only get to use the chips someone else has made!
On 06/11/2025 07:51, David Brown wrote:
Fair enough, now I see what you are getting at. Yes, once you are
outside the Cortex-M core and key ARM-supplied components (like
the interrupt controller), you as a SoC designer are free to do
what you like. And if you have a 32-bit processor that needs
access to a 64-bit address space, you are going to have to do some
kind of windowing or segmenting.
I think you were right, if this 'M7' chip doesn't directly have
registers, instructions or infrastructure to access the more complex
memory system.
Unless you are modifying the M7 itself, that 'glue' logic could be
applied to anything (e.g. I've built a Z80 system with 256KB RAM), and it
is that composite system that a language + compiler can target.
Then it would appear to the user of the language that the target machine
had those extended features. But if they were to look at the generated
code, they might see it was accessing external registers or whatever.
So it's cheating.
On 06/11/2025 12:21, bart wrote:
On 06/11/2025 07:51, David Brown wrote:
So it's cheating.
You were fine up until the last sentence here. What do you mean by
"cheating"? Whose rules is it breaking? The system Scott was
describing (assuming I understood him correctly) lets the 32-bit core
access blocks of the 64-bit address space. You can choose which part
of the address space is accessible at any given time (presumably by
accessing segment or window registers like any other memory-mapped
peripheral registers). But you can't call it "cheating" unless you
have defined some set of rules for what is "allowed" and what is not
allowed, and everyone else has agreed to play by those rules.
On Thu, 6 Nov 2025 13:56:17 +0100
David Brown <david.brown@hesbynett.no> wrote:
Doing this sort of trick with the Cortex-M7 is asking for trouble. Its
data cache is unaware of the tricks you play with windows, so the
programmer has to flush/invalidate cache lines manually. Sooner or later
the programmer will make a mistake - a mistake of the sort that is very
hard to debug.
I'd say, if you (the SoC designer) absolutely have to play these games,
just use a Cortex-M4.
In article <10eda8d$3pd45$1@dont-email.me>,
Peter Flass <Peter@Iron-Spring.com> wrote:
I was thinking, are there any segmented architectures today? Most
disguise segmentation as a flat address space (e.g. IBM System/370 et seq.)
x86_64 is still nominally segmented; what "code segment" the
processor is running in matters, even in long mode. But most of
the segment data is ignored by hardware (e.g., base and limits)
in 64-bit mode.
Of course, it retains a notion of segmentation for a) 16- and
32-bit code compatibility, and b) startup, where the processor
(still!!) comes out of reset in 16-bit real mode.
Intel had a proposal to do away with 16-bit mode and anything
other than long mode for 64-bit, but it seems to have died. So
it seems like we'll be stuck with x86 segmentation --- at least
for compatibility purposes --- for a while longer still.
On 11/4/25 12:12, Richard Heathfield wrote:
I can still hear them down the hall.
ST!
.......................................................Amiga!
ST!
.......................................................Amiga!
The 68000 was a very nice processor for its time. It's too bad IBM
didn't use it in the PC.
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Minor correction, an update to AMD64 was done back in
the oughts to support some segment limit registers for 64-bit XEN
(and probably for vmware as well).
See the LMSLE bit in the EFER register for more details.
On Fri, 7 Nov 2025 15:50:53 -0000 (UTC), cross@spitfire.i.gajendra.net
(Dan Cross) wrote:
In article <10eda8d$3pd45$1@dont-email.me>,
Peter Flass <Peter@Iron-Spring.com> wrote:
On 11/4/25 08:20, Scott Lurndal wrote:
Kaz Kylheku <643-408-1753@kylheku.com> writes:
On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
On 11/3/25 13:24, Lynn McGuire wrote:
When I saw this subject line, I thought it was some necroposting to
threads from 1990.
Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
There are still people on the internet who swear that the 286 is
better than sliced bread and refuse to recognize that modern
architectures are superior.
I was thinking, are there any segmented architectures today? Most
disguise segmentation as a flat address space (e.g. IBM System/370 et seq.)
x86_64 is still nominally segmented; what "code segment" the
processor is running in matters, even in long mode. But most of
the segment data is ignored by hardware (e.g., base and limits)
in 64-bit mode.
Of course, it retains a notion of segmentation for a) 16- and
32-bit code compatibility, and b) startup, where the processor
(still!!) comes out of reset in 16-bit real mode.
Intel had a proposal to do away with 16-bit mode and anything
other than long mode for 64-bit, but it seems to have died. So
it seems like we'll be stuck with x86 segmentation --- at least
for compatibility purposes --- for a while longer still.
This is all very interesting as a summary of where-we-are. Thanks.
Didn't Intel, at one time, plan to replace all xxx8x processors with
one of the new! shiny! RISC processor?
Didn't Intel, at one time, plan to replace all xxx8x processors with
one of the new! shiny! RISC processor?
Only to be defeated when it was pointed out that a whole lot of
software would have to run on it. Software written for their xxx8x processors, segmentation and all.
On Thu, 6 Nov 2025 13:56:17 +0100
David Brown <david.brown@hesbynett.no> wrote:
On 06/11/2025 12:21, bart wrote:
On 06/11/2025 07:51, David Brown wrote:
On 05/11/2025 16:15, Scott Lurndal wrote:
David Brown <david.brown@hesbynett.no> writes:
On 04/11/2025 23:04, Scott Lurndal wrote:
David Brown <david.brown@hesbynett.no> writes:
On 04/11/2025 18:32, Scott Lurndal wrote:
(And I have never
seen a Cortex-M device with programmable windows or addresses
- indeed, I believe the Cortex-M core documentation specifies some
memory ranges explicitly.)
I have used Cortex-M devices with programmable windows
in the physical address space.
OK. I have not, but I haven't used the newer Cortex-M cores as
yet, so it could well be a new feature.
It is not necessarily a feature of the M7 core itself, but rather
the glue logic around it - particularly the logic that interfaces
to the "system bus" to which the M7 core is interfaced. That
logic is under the control of the SoC designer and can easily have
external registers that are programmed to specify how to route
accesses from the M7, including to large regions of DRAM;
consider a maintenance processor on a 64-bit server that needs
access to the server DRAM space for RAS purposes.
Fair enough, now I see what you are getting at. Yes, once you are
outside the Cortex-M core and key ARM-supplied components (like
the interrupt controller), you as a SoC designer are free to do
what you like. And if you have a 32-bit processor that needs
access to a 64-bit address space, you are going to have to do some
kind of windowing or segmenting.
In the SoC's I have used where 64-bit Cortex-A processors are
combined with a Cortex-M core for security purposes, booting, or
for better real-time control of peripherals, the Cortex-M device
does not have direct access to the 64-bit memory space. It has
access to the peripherals, some dedicated memory, and a
message-passing interface with the Cortex-A cores.
But in your work, you probably see more variety and more
possibilities for these things - I only get to use the chips
someone else has made!
I think you were right, if this 'M7' chip doesn't directly have
registers, instructions or infrastructure to access the more
complex memory system.
Unless you are modifying M7 itself, then that 'glue' logic could be
applied to anything (eg. I've built a Z80 system with 256KB RAM),
and it is that composite system that a language + compiler can
target.
Then it would appear to the user of the language that the target
machine had those extended features. But if they were to look at
the generated code, they might see it was accessing external
registers or whatever.
So it's cheating.
You were fine up until the last sentence here. What do you mean by
"cheating" ? Whose rules is it breaking? The system Scott was
describing (assuming I understood him correctly) let the 32-bit core
access blocks of the 64-bit address space. You can choose which part
of the address space is accessible at any given time (presumably by
accessing segment or window registers like any other memory-mapped
peripheral registers). But you can't call it "cheating" unless you
have defined some set of rules for what is "allowed" and what is not
allowed, and everyone else has agreed to play by those rules.
Doing this sort of trick with the Cortex-M7 is asking for trouble. Its
data cache is unaware of the tricks you play with windows, so the
programmer has to flush/invalidate cache lines manually. Sooner or
later the programmer will make a mistake.
A mistake of the sort that is very hard to debug.
I'd say, if you (SOC designer) absolutely have to play these games, just
use Cortex-M4.
In article <10eda8d$3pd45$1@dont-email.me>,
Peter Flass <Peter@Iron-Spring.com> wrote:
On 11/4/25 08:20, Scott Lurndal wrote:
Kaz Kylheku <643-408-1753@kylheku.com> writes:
On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
On 11/3/25 13:24, Lynn McGuire wrote:
When I saw this subject line, I thought it was some necroposting to
threads from 1990.
Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
There are still people on the internet who swear that the 286 is
better than sliced bread and refuse to recognize that modern
architectures are superior.
I was thinking, are there any segmented architectures today? Most
disguise segmentation as a flat address space (e.g. IBM System/370 et seq.)
x86_64 is still nominally segmented; what "code segment" the
processor is running in matters, even in long mode. But most of
the segment data is ignored by hardware (e.g., base and limits)
in 64-bit mode.
Of course, it retains a notion of segmentation for a) 16- and
32-bit code compatibility, and b) startup, where the processor
(still!!) comes out of reset in 16-bit real mode.
Intel had a proposal to do away with 16-bit mode and anything
other than long mode for 64-bit, but it seems to have died. So
it seems like we'll be stuck with x86 segmentation --- at least
for compatibility purposes --- for a while longer still.
- Dan C.
In article <10edcbg$lrh1$1@dont-email.me>,
geodandw <geodandw@gmail.com> wrote:
On 11/4/25 12:12, Richard Heathfield wrote:
On 04/11/2025 15:20, Scott Lurndal wrote:
Kaz Kylheku <643-408-1753@kylheku.com> writes:
On 2025-11-03, Peter Flass <Peter@Iron-Spring.com> wrote:
On 11/3/25 13:24, Lynn McGuire wrote:
When I saw this subject line, I thought it was some necroposting to
threads from 1990.
Someone still cared about segmented x86 shit in 2010 (even if 32 bit)?
There are still people on the internet who swear that the 286 is
better than sliced bread and refuse to recognize that modern
architectures are superior.
I can still hear them down the hall.
ST!
.......................................................Amiga!
ST!
.......................................................Amiga!
The 68000 was a very nice processor for its time. It's too bad IBM
didn't use it in the PC.
They wanted to. IBM had a close relationship with Motorola, and
they even had engineering samples in Westchester. The problem
was that 68k was a skunkworks project inside of Moto, which was
pushing the 6809 as the Next Big Thing. So when IBM was talking
to Moto sales about using 68k for the PC, Moto was pushing them
(not so gently) towards the 6809 and telling them 68k was just a
research project with no future.
IBM was smart enough to know that the 6809 was going to be a
non-starter (a firmly 8-bit micro when 16-bit CPUs were becoming
mainstream), and the 8088 met their specs for the 5150, so they
went with Intel instead. By the time it was clear that the 68k
was going to be Moto's flagship CPU going forward, it was too
late for inclusion in the PC.
And here we are.
- Dan C.
I think they used the 680x0 in one of their small computers. Maybe the
"Laboratory Computer"?
the architecture. Meanwhile, a bunch of ex-DEC people went to
AMD and did the AMD64 extensions for x86, which a) performed
I'd say, if you (SOC designer) absolutely have to play these games,
just use Cortex-M4.
Sometimes you really do need an M7 class part.
- Dan C.
On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) wrote:
I'd say, if you (SOC designer) absolutely have to play these games,
just use Cortex-M4.
Sometimes you really do need an M7 class part.
- Dan C.
Somehow I suspect that [at the same clock frequency] M4 could access
uncached memory faster than M7. Maybe even significantly faster.
Unfortunately, info about M7 instruction timings does not appear to be public.
If one needs something like DP floating point, or when uncached accesses are
only a small part of the job and the rest of the load is compute-intensive,
then I can see how M7 could look attractive vs M4.
But personally in such a case I'd start to look for a non-Cortex-M solution.
Maybe R4, although I don't like it. Maybe A5. In huge SoCs of the sort
Scott is working on - A34 or even 510. Plus another M4 to handle more
typical MCU tasks.
On 09/11/2025 10:46, Michael S wrote:
On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) wrote:
I'd say, if you (SOC designer) absolutely have to play these
games, just use Cortex-M4.
Sometimes you really do need an M7 class part.
- Dan C.
Somehow I suspect that [at the same clock frequency] M4 could access uncached memory faster than M7. Maybe even significantly faster.
I suspect you would be wrong. The M7 can do more per clock than the
M4, has wider buses, and has support for direct data and instruction memories with their own dedicated buses.
I can appreciate the gut
feeling that because there is the option of caching accesses, that
extra functionality may slow down accesses when the cache is not
used, but I don't believe that happens on the M7. And everything
other than the accesses themselves (the loads, stores, address
increments, looping, etc.) can be quite a lot faster at the same
clock speed.
But as you say, public data on timings is limited -
and even when the
data on the core is available, timings can be very dependent on
details of the implementation and connections outside the core.
We could always appeal to authority - Scott's company knows what they
are doing, have access to far more detailed information and technical assistance from ARM than we do, and have picked an M7 rather than an
M4. But speculation is more fun :-)
Unfortunately, info about M7 instructions timing does not appear to
be public.
If one needs something like DP floating or when uncached accesses
are only small part of the job and the rest of the load is compute -intensive then I can see how M7 could look attractive vs M4.
But personally in such case I'd start to look for non-Cortex-M
solution. May be R4, although I don't like it. May be A5. In huge
SoCs of sort Scott is working on - A34 or even 510. Plus, another
M4 to handle more typical MCU tasks.
On Sun, 9 Nov 2025 12:29:32 +0100
David Brown <david.brown@hesbynett.no> wrote:
On 09/11/2025 10:46, Michael S wrote:
On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) wrote:
I'd say, if you (SOC designer) absolutely have to play these
games, just use Cortex-M4.
Sometimes you really do need an M7 class part.
- Dan C.
Somehow I suspect that [at the same clock frequency] M4 could access
uncached memory faster than M7. Maybe even significantly faster.
I suspect you would be wrong. The M7 can do more per clock than the
M4, has wider buses, and has support for direct data and instruction
memories with their own dedicated buses.
If I am not mistaken, with exception of caches, M4 and M7 have
3 identical "fast" 32-bit busses - I+D+AHB. Plus some slower auxiliary
stuff.
I can appreciate the gut
feeling that because there is the option of caching accesses, that
extra functionality may slow down accesses when the cache is not
used, but I don't believe that happens on the M7. And everything
other than the accesses themselves (the loads, stores, address
increments, looping, etc.) can be quite a lot faster at the same
clock speed.
Except that every branch mispredict is more than twice as slow.
I'd guess that the latency of a cache/TCM *hit* is also 1 clock slower
than the latency of internal SRAM access on the M4, but the absence of
docs prevents me from proving it.
As to a cache miss, I am pretty sure that it completely stalls the M7
pipeline.
In the case of the M4, I think that after an external load the pipeline
makes one more step before it stalls. And, of course, the stall itself
is less expensive.
Once again, I can't prove it because of the absence of docs.
But as you say, public data on timings is limited -
In the case of the M7, public data is not "limited", it is absent.
AFAIK, that's not the case for any other Cortex-M core. Back when the
M7 was new, Arm claimed that the data was not made available because
the core is more complicated than the rest of the Cortex-M line. As
silly as it sounds, they could continue to claim that with a sort of
straight face for as long as the other Cortex-M cores were, indeed,
simpler. That has not been the case since 2022, because the Cortex-M85
is no less complicated than the M7, and arguably even a little more so.
Despite that, there exists an M85 Software Optimization Guide that
contains instruction tables with latency and throughput data. Yes, it
has a few omissions, but it proves that there is nothing impossible
about documenting cores of this level of complexity, even if you are as
lazy as the Cortex-M documentation team appears to be (relative, for
example, to the Cortex-A/Neoverse side of the company).
and even when the
data on the core is available, timings can be very dependent on
details of the implementation and connections outside the core.
We could always appeal to authority - Scott's company knows what they
are doing, have access to far more detailed information and technical
assistance from ARM than we do, and have picked an M7 rather than an
M4. But speculation is more fun :-)
Unfortunately, info about M7 instructions timing does not appear to
be public.
If one needs something like DP floating or when uncached accesses
are only small part of the job and the rest of the load is compute
-intensive then I can see how M7 could look attractive vs M4.
But personally in such case I'd start to look for non-Cortex-M
solution. May be R4, although I don't like it. May be A5. In huge
SoCs of sort Scott is working on - A34 or even 510. Plus, another
M4 to handle more typical MCU tasks.
On 09/11/2025 13:40, Michael S wrote:
On Sun, 9 Nov 2025 12:29:32 +0100
David Brown <david.brown@hesbynett.no> wrote:
On 09/11/2025 10:46, Michael S wrote:
On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) wrote:
I'd say, if you (SOC designer) absolutely have to play these
games, just use Cortex-M4.
Sometimes you really do need an M7 class part.
- Dan C.
Somehow I suspect that [at the same clock frequency] M4 could
access uncached memory faster than M7. Maybe even significantly
faster.
I suspect you would be wrong. The M7 can do more per clock than
the M4, has wider buses, and has support for direct data and
instruction memories with their own dedicated buses.
If I am not mistaken, with exception of caches, M4 and M7 have
3 identical "fast" 32-bit busses - I+D+AHB. Plus some slower
auxiliary stuff.
I believe you are mistaken (which is not something I have seen often).
<https://www.arm.com/-/media/Arm%20Developer%20Community/PDF/Processor%20Datasheets/Arm-Cortex-M7-Processor-Datasheet.pdf>
"""
The interfaces that the processor supports include:
64-bit AXI4 interface
32-bit AHB master interface
32-bit AHB slave interface
64-bit instruction TCM interface
2x32-bit data TCM interfaces
"""
The M7 is dual issue - for some instruction combinations, it runs two instructions per clock. It needs more, faster and wider buses to
feed it.
I can appreciate the gut
feeling that because there is the option of caching accesses, that
extra functionality may slow down accesses when the cache is not
used, but I don't believe that happens on the M7. And everything
other than the accesses themselves (the loads, stores, address
increments, looping, etc.) can be quite a lot faster at the same
clock speed.
Except that every branch mispredict is more than twice slower.
Branch mispredict costs are primarily related to pipeline depth on a processor that does not do any kind of speculative execution.
I
don't remember the depth of the M4 and M7 off-hand, but the M7 is not
twice as deep as the M4.
I'd
guess that the latency of the cache/TCM *hit* is also 1 clock slower
that latency of internal SRAM access on M4, but absence of docs
prevents me from proving it.
The whole point of the TCM - tightly coupled memories - is that they
run at core speed, and no caches are used. They are as low-latency
as can be achieved with SRAM on the M4, except that now you have
independent buses and memory for instruction and data (rather than
independent buses to shared memory, as on most M4 implementations with
code and data in RAM), and the buses are twice as wide.
It is possible that there is an extra cycle of latency on accessing
main memory, due to the optional path through the cache - I am not
sure on that. But I suspect that the 64-bit wide AXI4 bus, as well
as the significantly faster handling of the rest of the code (which
does not need to share the same bus bandwidth as the off-core memory accesses) more than outweighs that.
As to cache miss, I am pretty sure that it completely stalls M7
pipeline.
Yes. But we are not using the cache in this hypothetical case.
On Sun, 9 Nov 2025 15:54:20 +0100
David Brown <david.brown@hesbynett.no> wrote:
On 09/11/2025 13:40, Michael S wrote:
On Sun, 9 Nov 2025 12:29:32 +0100
David Brown <david.brown@hesbynett.no> wrote:
On 09/11/2025 10:46, Michael S wrote:
On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) wrote:
I'd say, if you (SOC designer) absolutely have to play these
games, just use Cortex-M4.
Sometimes you really do need an M7 class part.
- Dan C.
Somehow I suspect that [at the same clock frequency] M4 could
access uncached memory faster that M7. May be, even significantly
faster.
I suspect you would be wrong. The M7 can do more per clock than
the M4, has wider buses, and has support for direct data and
instruction memories with their own dedicated buses.
If I am not mistaken, with exception of caches, M4 and M7 have
3 identical "fast" 32-bit busses - I+D+AHB. Plus some slower
auxiliary stuff.
I believe you are mistaken (which is not something I have seen often).
<https://www.arm.com/-/media/Arm%20Developer%20Community/PDF/Processor%20Datasheets/Arm-Cortex-M7-Processor-Datasheet.pdf>
"""
The interfaces that the processor supports include:
64-bit AXI4 interface
32-bit AHB master interface
32-bit AHB slave interface
64-bit instruction TCM interface
2x32-bit data TCM interfaces
"""
Yes, I was mistaken. I overlooked AXIM/AXI4.
The M7 is dual issue - for some instruction combinations, it runs two
instructions per clock. It needs more, faster and wider buses to
feed it.
I can appreciate the gut
feeling that because there is the option of caching accesses, that
extra functionality may slow down accesses when the cache is not
used, but I don't believe that happens on the M7. And everything
other than the accesses themselves (the loads, stores, address
increments, looping, etc.) can be quite a lot faster at the same
clock speed.
Except that every branch mispredict is more than twice slower.
Branch mispredict costs are primarily related to pipeline depth on a
processor that does not do any kind of speculative execution.
Same as on most of those that do speculative execution. But
that's O.T.
I
don't remember the depth of the M4 and M7 off-hand, but the M7 is not
twice as deep as the M4.
It is twice as deep: 6 stages vs 3. Which means that the typical mispredict penalty differs by a factor of 2.5 (5 cycles vs 2).
I'd
guess that the latency of the cache/TCM *hit* is also 1 clock slower
that latency of internal SRAM access on M4, but absence of docs
prevents me from proving it.
The whole point of the TCM - tightly coupled memories - is that they
run at core speed, and no caches are used. They are as low-latency
as can be achieved with M4 sram, except that now you have independent
buses and memory for instruction and data (rather than independent
buses to shared memory if you have code and data in ram on most M4
implementations), and that the buses are twice as wide.
Look at the pipelines.
We have no official pipeline picture for the M7, but we can guess with
good certainty that it is very similar to the M85, with the main
difference being that the M85 has 3 LS stages and the M7 has only 2.
It is obvious that in the best possible case a load instruction and a
dependent integer data-processing (DPU) instruction have to be 2 cycles
apart, i.e. the minimum load-to-DPU latency = 3. On the M3/M4 the
minimal latency = 2.
It is possible that there is an extra cycle of latency on accessing
main memory, due to the optional path through the cache - I am not
sure on that. But I suspect that the 64-bit wide AXI4 bus, as well
as the significantly faster handling of the rest of the code (which
does not need to share the same bus bandwidth as the off-core memory
accesses) more than outweighs that.
64-bit bus certainly helps a lot for cached accesses. Not sure if it
helps uncached accesses.
I'd guess that [for uncached] it does not help
regular integer load instructions, but sometimes helps LDM. It also
likely helps DP FP load instructions when the core is configured with
a DP FPU.
As to sharing the same bus bandwidth, both M4 and M7 have a dedicated
I-bus. In a typical MCU it is connected to NOR flash, and there the M7
I-cache helps a lot. In a typical big ASIC it is connected to fast SRAM
and the I-cache makes no difference.
As to cache miss, I am pretty sure that it completely stalls M7
pipeline.
Yes. But we are not using the cache in this hypothetical case.
As far as the pipeline goes, an uncached access is the same as a
D-cache miss, except that after the data finally arrives it does not
have to be written to the cache; but for load-to-use latency the latter
is irrelevant.
Maybe it is one clock better when the M7 is configured without a data
cache, which is possible and fully supported by ARM, but probably not
very popular among their clients. Or maybe it's not better.
On the soft Nios2-f core, the M7-class core I am most familiar with, an
uncached configuration does help, but the internals of soft cores are,
well ... more soft.
On Fri, 7 Nov 2025 17:22:51 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) wrote:
the architecture. Meanwhile, a bunch of ex-DEC people went to
AMD and did the AMD64 extensions for x86, which a) performed
Do you have proof that it was done by ex-DEC people?
My impression is that ex-DEC people, esp. Jim Keller, were very
important as micro-architects of the K7 and K8, but I don't remember
ever reading that they played a major role in the stage of
architectural definition of AMD64.