Forum: d0p3 BBS

Just How Bad Was The Intel IAPX432?

From Peter Flass@3:633/10 to All on Mon May 25 07:44:43 2026

https://hackaday.com/2026/05/25/just-how-bad-was-the-intel-iapx432/

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lynn Wheeler@3:633/10 to All on Mon May 25 07:54:10 2026

Peter Flass <Peter@Iron-Spring.com> writes:

https://hackaday.com/2026/05/25/just-how-bad-was-the-intel-iapx432/

The 432 group gave a talk at an Asilomar ACM SIGOPS meeting ... the major problem I
remember them talking about was that sophisticated operating system functions had
been put in silicon, so fixes/enhancements required new/replacement chips.

I had recently done something similar for an entry-level IBM 370 ... but it was
microcode ... scheduling/dispatching for five-CPU SMP, I/O drivers, etc. ... so I
could sympathize.

--
virtualization experience starting Jan1968, online at home since Mar1970

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D'Oliveiro@3:633/10 to All on Mon May 25 21:04:40 2026

The ever-dependable RetroBytes channel did a post-mortem a few years
ago <https://www.youtube.com/watch?v=4o4MXV-d-jQ>.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Peter Flass@3:633/10 to All on Mon May 25 15:39:58 2026

On 5/25/26 14:04, Lawrence D'Oliveiro wrote:

The ever-dependable RetroBytes channel did a post-mortem a few years
ago <https://www.youtube.com/watch?v=4o4MXV-d-jQ>.

My favorite misfeature was using bit-addressing instead of byte
addressing. In one swell foop the segments could have been eight times
bigger, at the cost of a few bytes. I also assume it was harder to
decode the instructions, or at least used more logic that could have
been put to better use elsewhere.

Too bad the software for it doesn't exist, or didn't originally.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D'Oliveiro@3:633/10 to All on Tue May 26 00:38:09 2026

On Mon, 25 May 2026 15:39:58 -0700, Peter Flass wrote:

My favorite misfeature was using bit-addressing instead of byte
addressing.

Now that 64-bit architectures are commonplace, I wonder why we can't
have bit addressing instead of byte addressing. It would only cost
3 bits at the bottom of the address, and we have plenty to spare.

For performance, not every instruction would need to support
bit-aligned memory accesses -- regular loads/stores could either be
defined to demand those bits be zero, or to ignore them. You would
need special bit-aligned load/store instructions to take advantage of
them.
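
A minimal sketch of what such a scheme might compute, for illustration only. It assumes a flat byte array standing in for memory and little-endian bit numbering within each byte; the names `BitAddr`, `load_bits`, and `load_byte` are hypothetical and do not come from any real instruction set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit address: the low 3 bits select a bit within a byte,
// the remaining bits form an ordinary byte address.
using BitAddr = std::uint64_t;

// "Special" bit-aligned load: fetch `width` bits (1..64) starting at `addr`.
// Assumption: bit 0 of a byte is its least significant bit.
inline std::uint64_t load_bits(const std::uint8_t* mem, BitAddr addr, unsigned width)
{
    assert(width >= 1 && width <= 64);
    std::uint64_t result = 0;
    for (unsigned i = 0; i < width; ++i) {
        BitAddr a = addr + i;
        // Byte index is a / 8, bit index within that byte is a % 8.
        std::uint64_t bit = (mem[a >> 3] >> (a & 7)) & 1u;
        result |= bit << i;
    }
    return result;
}

// "Regular" byte load that demands the low three address bits be zero.
inline std::uint8_t load_byte(const std::uint8_t* mem, BitAddr addr)
{
    assert((addr & 7) == 0);
    return mem[addr >> 3];
}
```

The other option mentioned above, ignoring the low bits instead of demanding they be zero, would simply drop the assertion in `load_byte`.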

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Peter Flass@3:633/10 to All on Mon May 25 20:41:12 2026

On 5/25/26 17:38, Lawrence D'Oliveiro wrote:

On Mon, 25 May 2026 15:39:58 -0700, Peter Flass wrote:

My favorite misfeature was using bit-addressing instead of byte
addressing.

Now that 64-bit architectures are commonplace, I wonder why we can't
have bit addressing instead of byte addressing. It would only cost
3 bits at the bottom of the address, and we have plenty to spare.

For performance, not every instruction would need to support
bit-aligned memory accesses -- regular loads/stores could either be
defined to demand those bits be zero, or to ignore them. You would
need special bit-aligned load/store instructions to take advantage of
them.

Like the Sigma 7. Load Byte, Load Halfword, and Load Word used byte,
halfword, and word addressing respectively (IIRC).

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Rich Alderson@3:633/10 to All on Tue May 26 16:16:37 2026

Lawrence D'Oliveiro <ldo@nz.invalid> writes:

On Mon, 25 May 2026 15:39:58 -0700, Peter Flass wrote:

My favorite misfeature was using bit-addressing instead of byte addressing.

Now that 64-bit architectures are commonplace, I wonder why we can't have bit addressing instead of byte addressing. It would only cost 3 bits at the bottom of the address, and we have plenty to spare.

For performance, not every instruction would need to support bit-aligned memory accesses -- regular loads/stores could either be defined to demand those bits be zero, or to ignore them. You would need special bit-aligned load/store instructions to take advantage of them.

Congratulations.

You have just re-invented PDP-6 byte pointers.

From 1964.
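
For readers who have not met them, here is a rough, simplified model of the idea. A PDP-6/PDP-10 byte pointer names a word plus a position (bits to the right of the byte) and a size in bits. This sketch leaves out the index and indirect fields and keeps the fields in a struct instead of packing them into one 36-bit word; the type and function names are illustrative assumptions, not DEC's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr unsigned kWordBits = 36;  // PDP-6/PDP-10 word size

// Simplified byte pointer: position, size, and word address only.
struct BytePointer {
    unsigned pos;      // P: number of bits to the right of the byte
    unsigned size;     // S: byte size in bits
    std::size_t word;  // Y: word address (an array index in this model)
};

// LDB-like: fetch the byte the pointer designates.
// Each 36-bit word is held in the low bits of a uint64_t.
inline std::uint64_t load_byte(const std::uint64_t* mem, BytePointer bp)
{
    assert(bp.size <= kWordBits && bp.pos + bp.size <= kWordBits);
    std::uint64_t mask = (std::uint64_t(1) << bp.size) - 1;
    return (mem[bp.word] >> bp.pos) & mask;
}

// IBP-like: step to the next byte, moving on to the next word
// when no further whole byte fits in the current one.
inline BytePointer increment(BytePointer bp)
{
    if (bp.pos >= bp.size) {
        bp.pos -= bp.size;
    } else {
        bp.word += 1;
        bp.pos = kWordBits - bp.size;
    }
    return bp;
}
```

Starting from `pos = 36, size = 7`, repeated increment-then-load walks five 7-bit characters per word and skips the spare low bit, which is how packed ASCII text was commonly scanned.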

--
Rich Alderson news@alderson.users.panix.com
Audendum est, et veritas investiganda; quam etiamsi non assequamur,
omnino tamen proprius, quam nunc sumus, ad eam perveniemus.
--Galen

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Wed May 27 18:59:52 2026

Rich Alderson <news@alderson.users.panix.com> writes:

Lawrence D'Oliveiro <ldo@nz.invalid> writes:

On Mon, 25 May 2026 15:39:58 -0700, Peter Flass wrote:

My favorite misfeature was using bit-addressing instead of byte addressing.

Now that 64-bit architectures are commonplace, I wonder why we can't have bit
addressing instead of byte addressing. It would only cost 3 bits at the
bottom of the address, and we have plenty to spare.

Plenty to spare? Not really. CXL and other technologies have made
even a 64-bit address space limiting.

Wasting three bits of the address for bit addressing, which outside
of specialized applications is not useful, would be silly.

For performance, not every instruction would need to support bit-aligned
memory accesses -- regular loads/stores could either be defined to demand
those bits be zero, or to ignore them. You would need special bit-aligned
load/store instructions to take advantage of them.

Congratulations.

You have just re-invented PDP-6 byte pointers.

From 1964.

Which proved to be an evolutionary dead-end.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From John Ames@3:633/10 to All on Wed May 27 12:11:25 2026

On Wed, 27 May 2026 18:59:52 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:

Plenty to spare? Not really. CXL and other technologies have made
even a 64-bit address space limiting.

...I'm mildly curious in which applications an address space of 16 EB
would be considered "limiting" o_O

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Peter Flass@3:633/10 to All on Wed May 27 13:06:31 2026

On 5/27/26 11:59, Scott Lurndal wrote:

Rich Alderson <news@alderson.users.panix.com> writes:

Lawrence D'Oliveiro <ldo@nz.invalid> writes:

On Mon, 25 May 2026 15:39:58 -0700, Peter Flass wrote:

My favorite misfeature was using bit-addressing instead of byte addressing.

Now that 64-bit architectures are commonplace, I wonder why we can't have bit
addressing instead of byte addressing. It would only cost 3 bits at the
bottom of the address, and we have plenty to spare.

Plenty to spare? Not really. CXL and other technologies have made
even a 64-bit address space limiting.

Wasting three bits of the address for bit addressing, which outside
of specialized applications is not useful, would be silly.

I'd be happy to see instructions that used bit pointers. In most cases
RISC is fine, but working with unaligned bit strings, for example
BITBLT, is just horrible. There's so much shifting and masking that
would be much more efficient at the hardware level.
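
To give a feel for the shifting and masking involved, here is a deliberately naive sketch that copies an unaligned bit string one bit at a time. The function name `copy_bits` and the bit ordering (bit 0 is the least significant bit of each byte) are assumptions for illustration. A real BITBLT works a word at a time and needs extra masking at both edges, which is where most of the pain comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy `nbits` bits from bit offset `src_bit` in `src` to bit offset
// `dst_bit` in `dst`. Buffers must not overlap. Bits of `dst` outside
// the target range are preserved.
inline void copy_bits(std::uint8_t* dst, std::size_t dst_bit,
                      const std::uint8_t* src, std::size_t src_bit,
                      std::size_t nbits)
{
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < nbits; ++i) {
        std::size_t s = src_bit + i;
        std::size_t d = dst_bit + i;
        // Read: shift the source byte down and mask off one bit.
        unsigned bit = (src[s >> 3] >> (s & 7)) & 1u;
        // Write: clear the target bit, then merge in the new value.
        std::uint8_t mask = static_cast<std::uint8_t>(1u << (d & 7));
        dst[d >> 3] = static_cast<std::uint8_t>(
            (dst[d >> 3] & ~mask) | (bit ? mask : 0));
    }
}
```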

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D'Oliveiro@3:633/10 to All on Wed May 27 22:34:45 2026

On Wed, 27 May 2026 13:06:31 -0700, Peter Flass wrote:

I'd be happy to see instructions that used bit pointers. In most
cases RISC is fine, but working with unaligned bit strings, for
example BITBLT, is just horrible. There's so much shifting and
masking that would be much more efficient at the hardware level.

I think there's a feedback effect here: the C language (in which most
system software is written) makes it difficult to use unaligned
bitfields, particularly dynamic ones, so compilers have few
opportunities to generate those instructions. And CPU architecture
designers see that these instructions are not used much, and conclude
that they're not very important.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Wed May 27 22:44:29 2026

John Ames <commodorejohn@gmail.com> writes:

On Wed, 27 May 2026 18:59:52 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:

Plenty to spare? Not really. CXL and other technologies have made
even a 64-bit address space limiting.

...I'm mildly curious in which applications an address space of 16 EB
would be considered "limiting" o_O

Question: Why are IPV6 addresses 128 bits?

Answer: A sparse address space is useful.

The address space addresses more than just DRAM (e.g. PCI devices),
and there are often alignment issues to be considered (e.g.
a hypervisor may align things on 1GB (30-bit) boundaries to reduce
TLB pressure for virtualization).

4TB uses 42 bits. A CXL system with 1024 4TB nodes uses 10 bits
for node Id.

That's only 12 left-over bits, and both memory and cluster size
can easily expand to consume those in the very near future.
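
The bit budget above can be checked mechanically. This is only a sketch of the arithmetic, assuming "4TB" means 2^42 bytes; the helper `bits_for` and the constant names are made up for the example.

```cpp
#include <cstdint>

// Address bits needed to distinguish `count` items, i.e. ceil(log2(count)).
constexpr unsigned bits_for(std::uint64_t count)
{
    unsigned bits = 0;
    while (bits < 64 && (std::uint64_t(1) << bits) < count) ++bits;
    return bits;
}

constexpr std::uint64_t kTiB = std::uint64_t(1) << 40;

constexpr unsigned node_bits  = bits_for(4 * kTiB);          // memory per node
constexpr unsigned id_bits    = bits_for(1024);              // node Id
constexpr unsigned spare_bits = 64 - node_bits - id_bits;    // what is left

// If three low bits were spent on bit addressing, this is what would remain.
constexpr unsigned spare_after_bit_addressing = spare_bits - 3;

static_assert(node_bits == 42, "4TB needs 42 bits");
static_assert(id_bits == 10, "1024 nodes need 10 bits");
static_assert(spare_bits == 12, "12 bits left over, as stated above");
```

With these figures, bit addressing would cut the headroom from 12 bits to 9.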

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Wed May 27 22:47:59 2026

Peter Flass <Peter@Iron-Spring.com> writes:

On 5/27/26 11:59, Scott Lurndal wrote:

Rich Alderson <news@alderson.users.panix.com> writes:

Lawrence D'Oliveiro <ldo@nz.invalid> writes:

On Mon, 25 May 2026 15:39:58 -0700, Peter Flass wrote:

My favorite misfeature was using bit-addressing instead of byte addressing.

Now that 64-bit architectures are commonplace, I wonder why we can't have bit
addressing instead of byte addressing. It would only cost 3 bits at the
bottom of the address, and we have plenty to spare.

Plenty to spare? Not really. CXL and other technologies have made
even a 64-bit address space limiting.

Wasting three bits of the address for bit addressing, which outside
of specialized applications is not useful, would be silly.

I'd be happy to see instructions that used bit pointers. In most cases
RISC is fine, but working with unaligned bit strings, for example
BITBLT, is just horrible. There's so much shifting and masking that
would be much more efficient at the hardware level.

That's a clear corner case. And not worth dealing with the PDP-6
style byte accesses.

For the most part, programmers abstract the operations:

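// maskT<T>(n) is a helper not shown here; presumably it yields a T with the low n bits set.
// Returns bits stop_bit..start_bit (inclusive) of input, right-justified.
template<class T> static inline T extract(T input, size_t stop_bit, size_t start_bit)
{
    input >>= start_bit;
    input &= maskT<T>(stop_bit - start_bit + 1);
    return input;
}

// Usage (the template evidently lives in a namespace named bit):
uint64_t bits16_5 = bit::extract(value, 16, 5);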

Pretty clear, and it lets the compiler generate the appropriate
masking (or, on many architectures, bit-extract) instructions.
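
For anyone who wants to try the snippet above, here is a self-contained variant. The `maskT` helper is not shown in the post, so the definition below is an assumption about what it does (a value with the low `width` bits set), and the `bit` namespace is inferred from the call site.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

namespace bit {

// Assumed helper: a T with the low `width` bits set.
// Handles width == full width of T without an out-of-range shift.
template<class T> static inline T maskT(std::size_t width)
{
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    const std::size_t digits = std::numeric_limits<T>::digits;
    if (width >= digits) return static_cast<T>(~T(0));
    return static_cast<T>((T(1) << width) - 1);
}

// Bits stop_bit..start_bit (inclusive) of input, right-justified.
template<class T> static inline T extract(T input, std::size_t stop_bit, std::size_t start_bit)
{
    const std::size_t digits = std::numeric_limits<T>::digits;
    assert(start_bit <= stop_bit && stop_bit < digits);
    input >>= start_bit;
    input &= maskT<T>(stop_bit - start_bit + 1);
    return input;
}

}  // namespace bit
```

With `std::uint64_t value = 0x1FFE0;`, the call `bit::extract(value, 16, 5)` yields `0xFFF`, since bits 5 through 16 of that value are all set.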

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lev@3:633/10 to All on Sat May 30 07:13:16 2026

Peter Flass <Peter@Iron-Spring.com> wrote:

https://hackaday.com/2026/05/25/just-how-bad-was-the-intel-iapx432/

The benchmark result is the interesting part. The 432 beat an 8086
at the same clock speed doing the same algorithm in hand-written
code. That's not what you'd expect from a chip everyone agrees
was a disaster.

Mark's speculation that the problem was compiler optimization rather
than hardware design is worth taking seriously. The 432 had over 200
operators, built-in object-oriented programming, capability-based
addressing - all of which are nightmares for a compiler writer in
1981. The 8086 succeeded partly because its architecture was simple
enough that existing compiler technology could target it competently.

The pattern repeats with Itanium: a chip designed around the idea
that compilers could do instruction scheduling better than hardware,
which turned out to be true in theory and catastrophically wrong in
practice, because writing those compilers was harder than anyone
anticipated.

Both cases suggest that processor design has a social component.
It's not enough for hardware to be capable in principle. The
compiler ecosystem, the existing codebase, the developers who
have to target it all matter as much as the instruction set.
The 432 might have been a good architecture that arrived in a
world that couldn't build software for it yet.

Rich Alderson's point about PDP-6 byte pointers is apt too.
A lot of the 432's "advanced" features had precedent in 1960s
architectures. What was new was cramming all of them into one
chip at once.

Lev

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Peter Flass@3:633/10 to All on Sat May 30 08:07:34 2026

On 5/30/26 00:13, Lev wrote:

Peter Flass <Peter@Iron-Spring.com> wrote:

https://hackaday.com/2026/05/25/just-how-bad-was-the-intel-iapx432/

The benchmark result is the interesting part. The 432 beat an 8086
at the same clock speed doing the same algorithm in hand-written
code. That's not what you'd expect from a chip everyone agrees
was a disaster.

Mark's speculation that the problem was compiler optimization rather
than hardware design is worth taking seriously. The 432 had over 200 operators, built-in object-oriented programming, capability-based
addressing - all of which are nightmares for a compiler writer in
1981. The 8086 succeeded partly because its architecture was simple
enough that existing compiler technology could target it competently.

This is the general consensus. [I think I have this right, but it's at
least approximately right] The Ada compiler originally put every
subroutine (whatever they're called in Ada, procedure, function?) into a separate segment, so there was a context switch on every call. Intel was working on it, and improved the performance a lot, but by that time the
damage was done.

The Multics people had a similar problem with the original Digitek
compiler. They had to throw it out and write a new one to get it working acceptably.

The pattern repeats with Itanium: a chip designed around the idea
that compilers could do instruction scheduling better than hardware,
which turned out to be true in theory and catastrophically wrong in
practice, because writing those compilers was harder than anyone
anticipated.

Both cases suggest that processor design has a social component.
It's not enough for hardware to be capable in principle. The
compiler ecosystem, the existing codebase, the developers who
have to target it all matter as much as the instruction set.
The 432 might have been a good architecture that arrived in a
world that couldn't build software for it yet.

This is an excellent point.

Rich Alderson's point about PDP-6 byte pointers is apt too.
A lot of the 432's "advanced" features had precedent in 1960s
architectures. What was new was cramming all of them into one
chip at once.

Lev

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From John Levine@3:633/10 to All on Sat May 30 19:24:12 2026

According to Peter Flass <Peter@Iron-Spring.com>:

addressing - all of which are nightmares for a compiler writer in
1981. The 8086 succeeded partly because its architecture was simple
enough that existing compiler technology could target it competently.

This is the general consensus. [I think I have this right, but it's at
least approximately right] ...

I worked on a lot of PC software in the 1980s and I agree. We had C compilers that generated pretty good code. We basically punted on the segment stuff via medium model code. The whole program shared the same data segment. Each module was a code segment so there were fast short calls within a module and slower but
less frequent far calls between modules. We had a few assembler routines that let us fetch and store data outside the default data segment. The 8086 had only
a 1MB address space so there were bank switching hacks ("expanded memory")
to address data beyond that.
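
As a rough illustration of why medium model was a workable compromise: an 8086 real-mode address is the 16-bit segment shifted left four bits plus a 16-bit offset, wrapping at 1MB. Data pointers can stay 16 bits against one shared data segment while cross-module calls carry a full segment:offset pair. The struct and function names below are illustrative, not any compiler's actual ABI.

```cpp
#include <cstdint>

// Real-mode 8086 address formation: (segment << 4) + offset,
// truncated to the 20-bit address bus (wraps at 1MB).
constexpr std::uint32_t physical_address(std::uint16_t segment, std::uint16_t offset)
{
    return ((static_cast<std::uint32_t>(segment) << 4) + offset) & 0xFFFFFu;
}

// Illustrative pointer shapes for medium model (hypothetical names).
struct NearPointer { std::uint16_t offset; };                         // data: shared DS implied
struct FarPointer  { std::uint16_t segment; std::uint16_t offset; };  // code in another module

constexpr std::uint32_t resolve(FarPointer p)
{
    return physical_address(p.segment, p.offset);
}

constexpr std::uint32_t resolve(NearPointer p, std::uint16_t ds)
{
    return physical_address(ds, p.offset);
}
```

Note that many segment:offset pairs alias the same physical byte, for example 1234:0010 and 1235:0000.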

Both cases suggest that processor design has a social component.
It's not enough for hardware to be capable in principle. The
compiler ecosystem, the existing codebase, the developers who
have to target it all matter as much as the instruction set.
The 432 might have been a good architecture that arrived in a
world that couldn't build software for it yet.

That was the lesson of the IBM 801. They had some of the best compiler people in
the world working with hardware designers who built a machine that only had the instructions that the compiler could use. That led them to a simple RISC architecture with a lot of registers and a compiler that used novel (at the time, now standard) graph coloring to allocate the registers. When they retargeted their PL.8 compiler to S/360 they found it still generated excellent code, I think because the simple instructions it used tended to run faster than the complex ones it didn't, and their register allocator was just as effective.

Rich Alderson's point about PDP-6 byte pointers is apt too.
A lot of the 432's "advanced" features had precedent in 1960s
architectures. What was new was cramming all of them into one
chip at once.

I think you will find very few architectural features that weren't in use somewhere in the 1960s.
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lev@3:633/10 to All on Sun May 31 07:03:55 2026

John Levine wrote:

I worked on a lot of PC software in the 1980s and I agree. We had C
compilers that generated pretty good code. We basically punted on the
segment stuff via medium model code.

This is the part that interests me most. The 8086 won partly because
you could ignore its worst features. Medium model let you pretend
segments weren't there for most purposes. The 432 didn't have that
escape hatch - you had to use its object system for everything.

That was the lesson of the IBM 801. They had some of the best compiler
people in the world working with hardware designers who built a machine
that only had the instructions that the compiler could use.

The 801 story is a good counterexample to the 432 in both directions.
Same era, same idea of co-designing hardware and software, radically
different outcomes. The 801 team simplified toward what compilers
could actually do. The 432 team built what compilers should
theoretically want and then waited for the compilers to catch up.

The PL.8 retargeting result is striking - the fact that the compiler
designed for 801's simple instructions also produced good S/360 code
suggests the problem wasn't that CISC was bad, but that CISC
instructions compilers couldn't easily select were dead weight.
Nobody was emitting the fancy string instructions or decimal
arithmetic unless they were hand-coding.

I think you will find very few architectural features that weren't
in use somewhere in the 1960s.

Fair point. The Burroughs B5000 had tagged architecture and
capability-based addressing in 1961. The 432 was less innovative
than Intel's marketing suggested. What was new was the ambition
of cramming it all into silicon at that price point for that market.

Lev

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Peter Flass@3:633/10 to All on Sun May 31 07:57:23 2026

On 5/31/26 00:03, Lev wrote:

John Levine wrote:

The PL.8 retargeting result is striking - the fact that the compiler
designed for 801's simple instructions also produced good S/360 code
suggests the problem wasn't that CISC was bad, but that CISC
instructions compilers couldn't easily select were dead weight.
Nobody was emitting the fancy string instructions or decimal
arithmetic unless they were hand-coding.

This is 100% wrong. Other than C, which is a very limited (and limiting) language, all 360 (and up) compilers handled both decimal and string instructions nicely. COBOL, PL/I, and I suppose, RPG all used them. Even
in assembler I used them quite extensively.

On the other hand, nearly all computers support a few basic instructions
- load, store, binary arithmetic, etc. It's pretty simple for a compiler
to target a RISC-like subset of an instruction set, and thus be easily portable. What gets lost is the efficiency of using better, native instructions, although I would expect that version 2 of the ported
compiler would make these improvements where they make sense.

Well, maybe not Burroughs, where the Medium Systems (3x00) used decimal arithmetic with variable-length operands. I think even the instruction
counter was decimal.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Sun May 31 17:02:37 2026

Peter Flass <Peter@Iron-Spring.com> writes:

On 5/31/26 00:03, Lev wrote:

John Levine wrote:

The PL.8 retargeting result is striking - the fact that the compiler
designed for 801's simple instructions also produced good S/360 code
suggests the problem wasn't that CISC was bad, but that CISC
instructions compilers couldn't easily select were dead weight.
Nobody was emitting the fancy string instructions or decimal
arithmetic unless they were hand-coding.

This is 100% wrong. Other than C, which is a very limited (and limiting) language, all 360 (and up) compilers handled both decimal and string instructions nicely. COBOL, PL/I, and I suppose, RPG all used them. Even
in assembler I used them quite extensively.

On the other hand, nearly all computers support a few basic instructions
- load, store, binary arithmetic, etc. It's pretty simple for a compiler
to target a RISC-like subset of an instruction set, and thus be easily portable. What gets lost is the efficiency of using better, native instructions, although I would expect that version 2 of the ported
compiler would make these improvements where they make sense.

Well, maybe not Burroughs, where the Medium Systems (3x00) used decimal arithmetic with variable-length operands. I think even the instruction counter was decimal.

Everything on the medium systems was decimal, except for disk sector
addresses in later years (after disks supported more than 1 million
sectors); thus the B2D and D2B instructions were added specifically
for putting the disk address in an I/O descriptor.

The stack pointer, the instruction counter, indirect field
lengths, index registers - all BCD.

Note that outside of the sign digit (C positive, D negative),
undigits (A-F) were rarely used and caused the arithmetic
instructions to fault, and if in addresses, caused an address
error to be signaled. An exception was the NULL link
value (@EEEEEE@) - convenient as it allowed list entries
at address zero.
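
A toy model of the undigit rule, for illustration only. It assumes 4-bit digits packed into an integer and six-digit addresses (inferred from the six-digit NULL link value); the function and type names are hypothetical and this does not reproduce the real machine's fault handling.

```cpp
#include <cassert>
#include <cstdint>

// True if each of the low `ndigits` 4-bit digits of `value` is 0-9,
// i.e. the field contains no undigits (A-F).
inline bool all_decimal(std::uint32_t value, unsigned ndigits)
{
    assert(ndigits <= 8);
    for (unsigned i = 0; i < ndigits; ++i) {
        if (((value >> (4 * i)) & 0xF) > 9) return false;
    }
    return true;
}

// The NULL link value mentioned above: six E digits.
constexpr std::uint32_t kNullLink = 0xEEEEEE;

enum class AddrCheck { Ok, NullLink, AddressError };

// Classify a six-digit BCD address field (assumed width).
inline AddrCheck check_address(std::uint32_t addr6)
{
    if (addr6 == kNullLink) return AddrCheck::NullLink;
    return all_decimal(addr6, 6) ? AddrCheck::Ok : AddrCheck::AddressError;
}
```

Because the list terminator is made of undigits, it can never collide with a valid decimal address, which is why address zero stays usable for list entries.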

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bob Eager@3:633/10 to All on Sun May 31 17:29:19 2026

On Sun, 31 May 2026 07:57:23 -0700, Peter Flass wrote:

On 5/31/26 00:03, Lev wrote:

John Levine wrote:

The PL.8 retargeting result is striking - the fact that the compiler
designed for 801's simple instructions also produced good S/360 code
suggests the problem wasn't that CISC was bad, but that CISC
instructions compilers couldn't easily select were dead weight. Nobody
was emitting the fancy string instructions or decimal arithmetic unless
they were hand-coding.

This is 100% wrong. Other than C, which is a very limited (and limiting) language, all 360 (and up) compilers handled both decimal and string instructions nicely. COBOL, PL/I, and I suppose, RPG all used them. Even
in assembler I used them quite extensively.

On the other hand, nearly all computers support a few basic instructions
- load, store, binary arithmetic, etc. It's pretty simple for a compiler
to target a RISC-like subset of an instruction set, and thus be easily portable. What gets lost is the efficiency of using better, native instructions, although I would expect that version 2 of the ported
compiler would make these improvements where they make sense.

Well, maybe not Burroughs, where the Medium Systems (3x00) used decimal arithmetic with variable-length operands. I think even the instruction counter was decimal.

Also see the Singer System Ten.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lynn Wheeler@3:633/10 to All on Sun May 31 13:52:34 2026

John Levine <johnl@taugh.com> writes:

That was the lesson of the IBM 801. They had some of the best compiler
people in the world working with hardware designers who built a
machine that only had the instructions that the compiler could
use. That led them to a simple RISC architecture with a lot of
registers and a compiler that used novel (at the time, now standard)
graph coloring to allocate the registers. When they retargeted their
PL.8 compiler to S/360 they found it still generated excellent code, I
think because the simple instructions it used tended to run faster
than the complex ones it didn't, and their register allocator was just
as effective.

Early last decade, I got asked to track down the decision to add virtual
memory to all 370s. Basically (OS/360) MVT storage management was so bad
that REGION sizes frequently had to be specified four times larger than
used. As a result, a typical 1mbyte 370/165 would only run four concurrent
regions, throughput insufficient to keep the system busy and
justified. Going to 16mbyte virtual address space could increase number
of concurrent regions by factor of four (capped at 15 because of 4bit
storage protect key) with little or no paging (similar to running MVT in
CP67 16mbyte virtual machine). I had dropped by Ludlow doing the initial implementation, using 360/67 (pending 370 engineering system with
virtual memory). He was doing little bit of code to create virtual
memory tables and some simple paging. Biggest issue was EXCP/SVC0 was
now being passed channel programs with virtual addresses and channels
required real addresses (similar to CP67 running virtual machines), and
he borrows CP67 CCWTRANS integrated into

One of my hobbies after joining IBM was enhanced production operating
systems for internal datacenters (HONE, online branch office
sales&marketing support, was one of the 1st and long time
customers). The decision to add virtual memory to all 370s also
included doing VM370. In the transition of CP67->VM370, lots of stuff was
simplified or dropped (including SMP support). I then start adding a lot
of stuff back into VM370R2-base, including the kernel reorg needed for SMP
support (but not full SMP). Then with VM370R3-base, I put a lot more stuff
back in, including SMP support, originally for HONE so they could
upgrade their 158 & 168 systems to 2-CPU (getting twice throughput of
single CPU systems).

I then get sucked into helping with an effort to do 16-CPU 370 SMP
(shared memory multiprocessor) and we con the 3033 processor engineers
into helping in their spare time (a lot more interesting than remapping
370/168 logic to 20% faster chips). Everybody thought it was great until somebody tells head of POK (DSD, high-end systems), that it could be
decades before the POK favorite son operating system ("MVS") has
effective 16-CPU support (MVS docs were that 2-CPU systems were only
getting 1.2-1.5 times throughput of 1-CPU; POK doesn't ship 16-CPU
system until after turn of century).

1976, there is an "advanced technology" conference in POK where both
801/RISC and 16-processor is presented. One of the 801/RISC people gives
me a bad time claiming he had looked at the VM370 product code which had
no SMP support. I've observed that it was the last adtech conference
until sometime in the 80s (because so many adtech groups were being
thrown into the 370 development breach). I had joked that John came up
with 801/RISC to be the opposite of the complexity of "Future System".

Overlapping transition of 370 to virtual memory the 1st half of the 70s
was the "Future System" project, completely different than 370 and was
supposed to completely replace 370 (I continued to work on 360&370 all
during FS and would periodically ridicule what they were doing). Internal
politics was working on shutting down 370 activities, and the lack of
new 370 products during FS is credited with giving the clone 370 system
makers (including Amdahl) their market foothold. When FS finally implodes,
there is mad rush getting new stuff into 370 product pipelines,
including kicking off quick&dirty 3033&3081 efforts in parallel.

Head of POK invites some of us to never visit POK again and directs the
3033 processor engineers, "heads down and no distractions".

Part of 801 presentation was PL.8 would only generate correct code and
the CP.r operating system would only execute correct PL.8 code. As a
result, 801 RISC didn't need hardware protection domains (things like
changing address spaces could be done with inline application code). 801
ROMP chip was originally for OPD Displaywriter follow-on. When
Displaywriter follow-on was canceled, they decided to pivot to the UNIX workstation market and hired the company that had done PC/IX (for
IBM/PC) to do AIX for the PC/RT workstation (but needed ROMP to support
UNIX paradigm hardware protection).

FS had a lot of object-like characteristics, however one of the last
nails in the FS coffin was analysis by IBM Houston Scientific Center
that 370/195 apps redone for a FS machine made with the fastest
technology available, would have throughput of 370/145 (about 30 times
slowdown). FS disaster:
http://www.jfsowa.com/computer/memo125.htm
https://en.wikipedia.org/wiki/IBM_Future_Systems_project
https://people.computing.clemson.edu/~mark/fs.html

... from "Computer Wars: The Post-IBM World"
https://www.amazon.com/Computer-Wars-The-Post-IBM-World/dp/1587981394/

... and perhaps most damaging, the old culture under Watson Snr and Jr
of free and vigorous debate was replaced with *SYCOPHANCY* and *MAKE NO
WAVES* under Opel and Akers. It's claimed that thereafter, IBM lived in
the shadow of defeat ... But because of the heavy investment of face by
the top management, F/S took years to kill, although its wrong
headedness was obvious from the very outset. "For the first time, during
F/S, outspoken criticism became politically dangerous," recalls a former
top executive

... snip ...

A decade after the 16-CPU 370 effort, I get a project to do HA/6000, originally
for NYTimes to move their newspaper system (ATEX) off DEC VAXCluster to
RS/6000. I rename it HA/CMP
https://en.wikipedia.org/wiki/IBM_High_Availability_Cluster_Multiprocessing
when I start doing technical/scientific cluster scale-up with national
labs (LANL, LLNL, NCAR, etc) and commercial cluster scale-up with RDBMS
vendors (Oracle, Sybase, Ingres, Informix) that have VAXCluster support in
the same source base as UNIX.

IBM S/88 (relogo'ed Stratus) Product Administrator started taking us
around to their customers and also had me write a section for the
corporate continuous availability document (it gets pulled when both AS400/Rochester and mainframe/POK complain they couldn't meet
requirements). Had coined "disaster survivability" and "geographic survivability" (as counter to disaster/recovery) when out marketing
HA/CMP. One of the visits to 1-800 bellcore development showed that S/88
would use a century of downtime in one software upgrade, while HA/CMP
had a couple extra "nines" (compared to S/88).

One of the first HA/CMP customer installs was a new Indian Reservation
casino in Connecticut, which was supposed to have a week of testing before
opening ... but after 24hrs, they decided to open the doors (based on
projected revenue; at the time it was the largest in the US, still one of the
largest in the country)
https://en.wikipedia.org/wiki/Foxwoods_Resort_Casino#Debt_default

Early Jan92, there was an HA/CMP meeting with the Oracle CEO, and IBM/AWD
executive Hester tells Ellison that we would have 16-system clusters by
mid92 and 128-system clusters by ye92. Mid-Jan92, I update FSD on HA/CMP
work with national labs and FSD decides to go with HA/CMP for federal
supercomputers. By end of Jan, we are told that cluster scale-up is
being transferred to Kingston for announce as IBM Supercomputer
(technical/scientific *ONLY*) and we aren't allowed to work with
anything that has more than four systems (we leave IBM a few months
later). A couple weeks later, 17feb1992, Computerworld news ... IBM
establishes laboratory to develop parallel systems (pg8)
https://archive.org/details/sim_computerworld_1992-02-17_26_7

Some speculation that HA/CMP would have eaten the mainframe in the
commercial market. 1993 industry benchmarks (number of program
iterations compared to the industry MIPS/BIPS reference platform):

ES/9000-982               : 8-CPU: 408MIPS (51MIPS/CPU)
RS6000/990 (RIOS chipset) : 1-CPU: 126MIPS, 16-systems: 2BIPS, 128-systems: 16BIPS
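
The figures above are consistent with simple multiplication, which the
following sketch reproduces. It assumes perfectly linear cluster scaling (an
assumption made here only to show where 2BIPS and 16BIPS come from); the
constant and function names are hypothetical.

```python
# Reproduce the per-CPU and aggregate figures from the 1993 benchmark numbers.
# ASSUMPTION: cluster throughput is taken as systems * single-system MIPS,
# i.e. perfectly linear scaling, purely to show the arithmetic.

ES9000_982_TOTAL_MIPS = 408
ES9000_982_CPUS = 8
RS6000_990_MIPS = 126


def per_cpu_mips(total_mips: float, cpus: int) -> float:
    """Average MIPS per processor of a multiprocessor."""
    return total_mips / cpus


def cluster_mips(per_system_mips: float, systems: int) -> float:
    """Aggregate MIPS of a cluster under the linear-scaling assumption."""
    return per_system_mips * systems


if __name__ == "__main__":
    print("ES/9000-982 per CPU:", per_cpu_mips(ES9000_982_TOTAL_MIPS, ES9000_982_CPUS))
    for n in (1, 16, 128):
        total = cluster_mips(RS6000_990_MIPS, n)
        print(f"RS6000/990 x{n}: {total} MIPS ({total / 1000:.3f} BIPS)")
```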

The executive we had reported to goes over to head up Somerset/AIM (Apple,
IBM, Motorola) to do single chip 801/RISC (Power/PC) and uses the Motorola
88k bus/cache, enabling SMP implementations.

--
virtualization experience starting Jan1968, online at home since Mar1970

From Lynn Wheeler@3:633/10 to All on Sun May 31 14:41:49 2026

... trivia: after FS implodes, the head of POK was convincing corporate to
kill the VM370 product, shut down the development group and transfer all
the people to POK for (370/XA) MVS/XA ... possibly because of how bad it
made POK's favorite son operating system, MVS, look; ... which 16-CPU
SMP would have just made MVS look worse.

Endicott (370 mid-range) eventually manages to acquire the VM370 product
mission ... but has to recreate a development group from scratch.

--
virtualization experience starting Jan1968, online at home since Mar1970

From John Levine@3:633/10 to All on Mon Jun 1 01:17:31 2026

According to Peter Flass <Peter@Iron-Spring.com>:

On 5/31/26 00:03, Lev wrote:

John Levine wrote:

The PL.8 retargeting result is striking - the fact that the compiler
designed for 801's simple instructions also produced good S/360 code
suggests the problem wasn't that CISC was bad, but that CISC
instructions compilers couldn't easily select were dead weight.
Nobody was emitting the fancy string instructions or decimal
arithmetic unless they were hand-coding.

This is 100% wrong. Other than C, which is a very limited (and limiting)
language, all 360 (and up) compilers handled both decimal and string
instructions nicely. COBOL, PL/I, and I suppose, RPG all used them. Even
in assembler I used them quite extensively. ...

Take a look at this paper from 25 years ago, the part on page 52 about
System/370. Even though the PL.8 compiler didn't use all the
instructions, its code ran much faster than the regular PL/I compiler
due to the better register management and using a fast subset of the
instruction set.

https://acg.cis.upenn.edu/milom/cis501-Fall11/papers/cocke-RISC.pdf

The paper also suggests that as pipelines got longer and caches bigger,
the advantage may be smaller. Also, compilers now all use the graph
coloring register allocator that PL.8 introduced.
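
For readers who haven't seen graph coloring register allocation, here is a
minimal sketch of the general idea: variables that are live at the same time
"interfere" and must get different registers. This is a toy simplify/select
loop with optimistic spilling, not the PL.8 compiler's actual algorithm, and
the function name and example graph are made up for illustration.

```python
# Toy graph-coloring register allocator (simplify/select with optimistic
# spilling). ASSUMPTION: this is a teaching sketch, not the PL.8 algorithm;
# the interference graph below is invented for the example.
from typing import Dict, List, Optional, Set


def color_interference_graph(
    graph: Dict[str, Set[str]], k: int
) -> Dict[str, Optional[int]]:
    """Assign each variable a register in 0..k-1, or None if it must spill."""
    # Work on a copy so the caller's graph is left untouched.
    work = {v: set(ns) for v, ns in graph.items()}
    stack: List[str] = []

    # Simplify: repeatedly remove a node and remember the removal order.
    while work:
        easy = [v for v in work if len(work[v]) < k]
        if easy:
            node = min(easy)  # fewer than k neighbours: always colorable
        else:
            # No easy node: push the highest-degree one and hope (may spill).
            node = max(work, key=lambda v: (len(work[v]), v))
        stack.append(node)
        for neighbour in work[node]:
            work[neighbour].discard(node)
        del work[node]

    # Select: pop nodes back and give each the lowest register not used
    # by an already-colored neighbour in the original graph.
    assignment: Dict[str, Optional[int]] = {}
    while stack:
        node = stack.pop()
        used = {
            assignment[n]
            for n in graph[node]
            if assignment.get(n) is not None
        }
        free = [r for r in range(k) if r not in used]
        assignment[node] = free[0] if free else None
    return assignment


if __name__ == "__main__":
    # a, b, c are mutually live (a triangle); d only overlaps with a.
    example = {
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a"},
    }
    print(color_interference_graph(example, 3))
    print(color_interference_graph(example, 2))  # triangle forces a spill
```

With three registers every variable in the example gets one; with two, the
triangle cannot be colored and one variable is marked for spilling to memory.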

There have certainly been places where the CISC stuff makes sense. If
you were running RPG on an 8K machine, code size was really important
and it wasn't hard to keep up with a card reader and a printer.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

From Lev@3:633/10 to All on Mon Jun 1 07:03:59 2026

Peter Flass wrote:

This is 100% wrong. Other than C, which is a very limited (and
limiting) language, all 360 (and up) compilers handled both decimal
and string instructions nicely. COBOL, PL/I, and I suppose, RPG
all used them.

You're right, I overstated it badly. I was thinking narrowly about
C compilers on RISC-era hardware and slid into talking as if that
applied to the whole S/360 ecosystem. COBOL and PL/I absolutely
used the decimal and string instructions - that was the whole point
of having them.

The better claim, which is what the paper Levine linked actually shows,
is narrower: a compiler using register-heavy simple instructions
with good register allocation could outperform a compiler using the
"right" complex instructions with poor register allocation. The
win was in the register allocator, not in avoiding CISC per se.

Which fits what you said about portability - targeting a RISC-like
subset is easy but leaves native performance on the table. PL.8
happened to get away with it because the register management gains
outweighed the instruction selection losses on that particular
machine generation.

Scott: the Burroughs Medium Systems with BCD everything is wild.
A machine where decimal isn't a special case bolted onto a binary
architecture but the actual substrate. Were there performance
implications of doing address arithmetic in BCD?

From Lynn Wheeler@3:633/10 to All on Mon Jun 1 05:08:06 2026

25oct2006 comp.arch/a.f.c post with archived 08aug81 email pascal
"benchmark" including pascal w/pl.8 backend

6m 30   secs  PERQ (with PERQ's Pascal compiler, of course)
4m 55   secs  68000 with PASCAL/PL.8 compiler at OPT 2
0m 21.5 secs  3033 PASCAL/VS with Optimization
0m 10.5 secs  3033 with PASCAL/PL.8 at OPT 0
0m  5.9 secs  3033 with PASCAL/PL.8 at OPT 3
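
To make the comparison easier to read, here is a small sketch that converts
the timings above to seconds and prints each one relative to the 3033
PASCAL/VS run. The choice of PASCAL/VS as the baseline and the helper names
are assumptions for illustration; the timings themselves are the ones quoted.

```python
# Convert the quoted timings to seconds and compare against a baseline.
# ASSUMPTION: using the 3033 PASCAL/VS run as the baseline is an arbitrary
# choice made here; the helper names are hypothetical.


def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a minutes/seconds pair to seconds."""
    return minutes * 60 + seconds


TIMINGS = [
    ("PERQ, PERQ Pascal compiler", to_seconds(6, 30)),
    ("68000, PASCAL/PL.8 OPT 2", to_seconds(4, 55)),
    ("3033, PASCAL/VS optimized", to_seconds(0, 21.5)),
    ("3033, PASCAL/PL.8 OPT 0", to_seconds(0, 10.5)),
    ("3033, PASCAL/PL.8 OPT 3", to_seconds(0, 5.9)),
]


def speedup(baseline_seconds: float, other_seconds: float) -> float:
    """How many times faster `other` is than the baseline (>1 means faster)."""
    return baseline_seconds / other_seconds


if __name__ == "__main__":
    baseline = dict(TIMINGS)["3033, PASCAL/VS optimized"]
    for label, secs in TIMINGS:
        print(f"{label:30s} {secs:7.1f} s  {speedup(baseline, secs):6.2f}x vs PASCAL/VS")
```

On the same 3033, the PL.8 backend at OPT 3 comes out a bit over 3.6 times
faster than PASCAL/VS, and even at OPT 0 it is roughly twice as fast.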

--
virtualization experience starting Jan1968, online at home since Mar1970

