• Just How Bad Was The Intel IAPX432?

    From Peter Flass@3:633/10 to All on Mon May 25 07:44:43 2026
    https://hackaday.com/2026/05/25/just-how-bad-was-the-intel-iapx432/

    --- PyGate Linux v1.5.15
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lynn Wheeler@3:633/10 to All on Mon May 25 07:54:10 2026
    Peter Flass <Peter@Iron-Spring.com> writes:
    https://hackaday.com/2026/05/25/just-how-bad-was-the-intel-iapx432/

    432 group gave a talk at asilomar acm sigops meeting ... major problem I remember they talked about was putting sophisticated operating system
    functions in silicon and problems/enhancements required new/replacement
    chips.

    I had recently done something similar for entry IBM 370 ... but it was microcode ... scheduling/dispatching for five CPU SMP, I/O drivers,
    etc. ... so I could sympathize.

    --
    virtualization experience starting Jan1968, online at home since Mar1970

  • From Lawrence D'Oliveiro@3:633/10 to All on Mon May 25 21:04:40 2026
    The ever-dependable RetroBytes channel did a post-mortem a few years
    ago <https://www.youtube.com/watch?v=4o4MXV-d-jQ>.

  • From Peter Flass@3:633/10 to All on Mon May 25 15:39:58 2026
    On 5/25/26 14:04, Lawrence D'Oliveiro wrote:
    The ever-dependable RetroBytes channel did a post-mortem a few years
    ago <https://www.youtube.com/watch?v=4o4MXV-d-jQ>.

    My favorite misfeature was using bit-addressing instead of byte
    addressing. In one swell foop the segments could have been eight times
    bigger, at the cost of a few bytes. I also assume it was harder to
    decode the instructions, or at least used more logic that could have
    been put to better use elsewhere.

    Too bad the software for it doesn't exist, or didn't originally.

  • From Lawrence D'Oliveiro@3:633/10 to All on Tue May 26 00:38:09 2026
    On Mon, 25 May 2026 15:39:58 -0700, Peter Flass wrote:

    My favorite misfeature was using bit-addressing instead of byte
    addressing.

    Now that 64-bit architectures are commonplace, I wonder why we can't
    have bit addressing instead of byte addressing. It would only cost
    3 bits at the bottom of the address, and we have plenty to spare.

    For performance, not every instruction would need to support
    bit-aligned memory accesses -- regular loads/stores could either be
    defined to demand those bits be zero, or to ignore them. You would
    need special bit-aligned load/store instructions to take advantage of
    them.
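
    A minimal sketch of the scheme described above, assuming a hypothetical
    64-bit "bit address" whose low 3 bits select a bit within a byte. The
    names BitAddress, load_byte and load_bits are illustrative only, and the
    bit numbering (bit 0 is the least significant bit of a byte) is an
    assumption, not something the post specifies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit address: upper 61 bits are a byte index, low 3 bits a bit offset.
struct BitAddress {
    std::uint64_t raw;
    std::uint64_t byte_index() const { return raw >> 3; }
    unsigned bit_offset() const { return static_cast<unsigned>(raw & 7u); }
};

// "Regular" load: defined here to demand that the low 3 bits are zero.
inline std::uint8_t load_byte(const std::uint8_t* mem, BitAddress a)
{
    assert(a.bit_offset() == 0);
    return mem[a.byte_index()];
}

// "Special" bit-aligned load: fetch `width` bits (1..32) starting at any bit.
inline std::uint32_t load_bits(const std::uint8_t* mem, BitAddress a, unsigned width)
{
    assert(width >= 1 && width <= 32);
    // Only touch the bytes the field actually overlaps (1 to 5 of them).
    const unsigned needed = (a.bit_offset() + width + 7) / 8;
    std::uint64_t window = 0;
    for (unsigned i = 0; i < needed; ++i)
        window |= static_cast<std::uint64_t>(mem[a.byte_index() + i]) << (8 * i);
    window >>= a.bit_offset();
    return static_cast<std::uint32_t>(window & ((std::uint64_t{1} << width) - 1));
}
```

    The alternative mentioned in the post, ignoring the low bits on regular
    loads, would simply drop the assertion in load_byte.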

  • From Peter Flass@3:633/10 to All on Mon May 25 20:41:12 2026
    On 5/25/26 17:38, Lawrence D'Oliveiro wrote:
    On Mon, 25 May 2026 15:39:58 -0700, Peter Flass wrote:

    My favorite misfeature was using bit-addressing instead of byte
    addressing.

    Now that 64-bit architectures are commonplace, I wonder why we can't
    have bit addressing instead of byte addressing. It would only cost
    3 bits at the bottom of the address, and we have plenty to spare.

    For performance, not every instruction would need to support
    bit-aligned memory accesses -- regular loads/stores could either be
    defined to demand those bits be zero, or to ignore them. You would
    need special bit-aligned load/store instructions to take advantage of
    them.

    Like the Sigma 7. Load Byte, Load Halfword, and Load Word used Byte,
    halfword, and word addressing respectively (IIRC).

  • From Rich Alderson@3:633/10 to All on Tue May 26 16:16:37 2026
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    On Mon, 25 May 2026 15:39:58 -0700, Peter Flass wrote:

    My favorite misfeature was using bit-addressing instead of byte addressing.

    Now that 64-bit architectures are commonplace, I wonder why we can't have bit addressing instead of byte addressing. It would only cost 3 bits at the bottom of the address, and we have plenty to spare.

    For performance, not every instruction would need to support bit-aligned memory accesses -- regular loads/stores could either be defined to demand those bits be zero, or to ignore them. You would need special bit-aligned load/store instructions to take advantage of them.

    Congratulations.

    You have just re-invented PDP-6 byte pointers.

    From 1964.
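
    For readers who have not met them, a rough model of a byte pointer in
    that style: a word address plus a position and a size, against 36-bit
    words. This is a simplified sketch; the index and indirect fields are
    omitted, and the struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified byte pointer: 36-bit words are held in the low bits of a uint64_t.
struct BytePointer {
    unsigned position;  // P: number of bits to the right of the byte in the word
    unsigned size;      // S: byte width in bits
    std::size_t word;   // Y: word address (an array index in this model)
};

// Load the byte the pointer designates, right-justified.
inline std::uint64_t ldb(const std::uint64_t* mem, BytePointer bp)
{
    assert(bp.size <= 36 && bp.position + bp.size <= 36);
    const std::uint64_t mask = (std::uint64_t{1} << bp.size) - 1;
    return (mem[bp.word] >> bp.position) & mask;
}

// Advance to the next byte, moving to the next word when this one is used up.
inline BytePointer ibp(BytePointer bp)
{
    if (bp.position >= bp.size) {
        bp.position -= bp.size;
    } else {
        ++bp.word;
        bp.position = 36 - bp.size;
    }
    return bp;
}
```

    Starting from position 36 with size 6, six successive ibp/ldb steps walk
    the six 6-bit bytes of one word from left to right.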

    --
    Rich Alderson news@alderson.users.panix.com
    Audendum est, et veritas investiganda; quam etiamsi non assequamur,
    omnino tamen proprius, quam nunc sumus, ad eam perveniemus.
    --Galen

  • From Scott Lurndal@3:633/10 to All on Wed May 27 18:59:52 2026
    Rich Alderson <news@alderson.users.panix.com> writes:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    On Mon, 25 May 2026 15:39:58 -0700, Peter Flass wrote:

    My favorite misfeature was using bit-addressing instead of byte addressing.

    Now that 64-bit architectures are commonplace, I wonder why we can't have bit
    addressing instead of byte addressing. It would only cost 3 bits at the
    bottom of the address, and we have plenty to spare.

    Plenty to spare? Not really. CXL and other technologies have made
    even a 64-bit address space limiting.

    Wasting three bits of the address for bit addressing, which outside
    of specialized applications is not useful, would be silly.


    For performance, not every instruction would need to support bit-aligned
    memory accesses -- regular loads/stores could either be defined to demand
    those bits be zero, or to ignore them. You would need special bit-aligned
    load/store instructions to take advantage of them.

    Congratulations.

    You have just re-invented PDP-6 byte pointers.

    From 1964.


    Which proved to be an evolutionary dead-end.

  • From John Ames@3:633/10 to All on Wed May 27 12:11:25 2026
    On Wed, 27 May 2026 18:59:52 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    Plenty to spare? Not really. CXL and other technologies have made
    even a 64-bit address space limiting.

    ...I'm mildly curious in which applications an address space of 16 EB
    would be considered "limiting" o_O


  • From Peter Flass@3:633/10 to All on Wed May 27 13:06:31 2026
    On 5/27/26 11:59, Scott Lurndal wrote:
    Rich Alderson <news@alderson.users.panix.com> writes:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    On Mon, 25 May 2026 15:39:58 -0700, Peter Flass wrote:

    My favorite misfeature was using bit-addressing instead of byte addressing.

    Now that 64-bit architectures are commonplace, I wonder why we can't have bit
    addressing instead of byte addressing. It would only cost 3 bits at the
    bottom of the address, and we have plenty to spare.

    Plenty to spare? Not really. CXL and other technologies have made
    even a 64-bit address space limiting.

    Wasting three bits of the address for bit addressing, which outside
    of specialized applications is not useful, would be silly.


    I'd be happy to see instructions that used bit pointers. In most cases
    RISC is fine, but working with unaligned bit strings, for example
    BITBLT, is just horrible. There's so much shifting and masking that
    would be much more efficient at the hardware level.
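
    To illustrate the shifting and masking involved, here is a deliberately
    naive bit-at-a-time copy of an unaligned bit string. It is a sketch only:
    real BITBLT code works a word at a time and handles overlap, and the
    helper names and the LSB-first bit numbering are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read one bit; bit 0 is the least significant bit of byte 0 (assumed ordering).
inline bool get_bit(const std::uint8_t* buf, std::size_t bit)
{
    return ((buf[bit >> 3] >> (bit & 7)) & 1u) != 0;
}

// Write one bit with a read-modify-write of the containing byte.
inline void set_bit(std::uint8_t* buf, std::size_t bit, bool value)
{
    const std::uint8_t mask = static_cast<std::uint8_t>(1u << (bit & 7));
    if (value)
        buf[bit >> 3] |= mask;
    else
        buf[bit >> 3] &= static_cast<std::uint8_t>(~mask);
}

// Copy `count` bits from src (starting at src_bit) to dst (starting at dst_bit).
// Source and destination ranges are assumed not to overlap.
inline void copy_bits(std::uint8_t* dst, std::size_t dst_bit,
                      const std::uint8_t* src, std::size_t src_bit,
                      std::size_t count)
{
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < count; ++i)
        set_bit(dst, dst_bit + i, get_bit(src, src_bit + i));
}
```

    Every copied bit costs a shift, a mask and a read-modify-write here,
    which is the overhead a hardware bit-pointer instruction would hide.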


  • From Lawrence D'Oliveiro@3:633/10 to All on Wed May 27 22:34:45 2026
    On Wed, 27 May 2026 13:06:31 -0700, Peter Flass wrote:

    I'd be happy to see instructions that used bit pointers. In most
    cases RISC is fine, but working with unaligned bit strings, for
    example BITBLT, is just horrible. There's so much shifting and
    masking that would be much more efficient at the hardware level.

    I think there's a feedback effect here: the C language (in which most
    system software is written) makes it difficult to use unaligned
    bitfields, particularly dynamic ones, so compilers have few
    opportunities to generate those instructions. And CPU architecture
    designers see that these instructions are not used much, and conclude
    that they're not very important.

  • From Scott Lurndal@3:633/10 to All on Wed May 27 22:44:29 2026
    John Ames <commodorejohn@gmail.com> writes:
    On Wed, 27 May 2026 18:59:52 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    Plenty to spare? Not really. CXL and other technologies have made
    even a 64-bit address space limiting.

    ...I'm mildly curious in which applications an address space of 16 EB
    would be considered "limiting" o_O


    Question: Why are IPV6 addresses 128 bits?

    Answer: A sparse address space is useful.

    The address space addresses more than just DRAM (e.g. PCI devices),
    and there are often alignment issues to be considered (e.g.
    a hypervisor may align things on 1GB (30-bit) boundaries to reduce
    TLB pressure for virtualization).

    4TB uses 42 bits. A CXL system with 1024 4TB nodes uses 10 bits
    for node Id.

    That's only 12 left-over bits, and both memory and cluster size
    can easily expand to consume those in the very near future.
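
    The bit budget above can be checked mechanically. This sketch assumes
    "4TB" means 2^42 bytes, as the 42-bit figure implies; the constant and
    function names are illustrative.

```cpp
#include <cstdint>

// Smallest number of bits that can distinguish n values (ceil(log2(n))).
constexpr unsigned bits_for(std::uint64_t n)
{
    unsigned bits = 0;
    while (bits < 64 && (std::uint64_t{1} << bits) < n)
        ++bits;
    return bits;
}

constexpr std::uint64_t kTiB = std::uint64_t{1} << 40;

constexpr unsigned kNodeMemoryBits = bits_for(4 * kTiB);  // bytes within one node
constexpr unsigned kNodeIdBits = bits_for(1024);          // node identifier
constexpr unsigned kSpareBits = 64 - kNodeMemoryBits - kNodeIdBits;

// Spare bits remaining if 3 more were spent on bit addressing.
constexpr unsigned kSpareWithBitAddressing = kSpareBits - 3;

static_assert(kNodeMemoryBits == 42, "4 TiB needs 42 address bits");
static_assert(kNodeIdBits == 10, "1024 nodes need 10 bits");
static_assert(kSpareBits == 12, "12 bits left over");
```

    Under the same assumptions, spending 3 bits on bit addressing would
    leave 9 spare bits rather than 12.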

  • From Scott Lurndal@3:633/10 to All on Wed May 27 22:47:59 2026
    Peter Flass <Peter@Iron-Spring.com> writes:
    On 5/27/26 11:59, Scott Lurndal wrote:
    Rich Alderson <news@alderson.users.panix.com> writes:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    On Mon, 25 May 2026 15:39:58 -0700, Peter Flass wrote:

    My favorite misfeature was using bit-addressing instead of byte addressing.

    Now that 64-bit architectures are commonplace, I wonder why we can't have bit
    addressing instead of byte addressing. It would only cost 3 bits at the
    bottom of the address, and we have plenty to spare.

    Plenty to spare? Not really. CXL and other technologies have made
    even a 64-bit address space limiting.

    Wasting three bits of the address for bit addressing, which outside
    of specialized applications is not useful, would be silly.


    I'd be happy to see instructions that used bit pointers. In most cases
    RISC is fine, but working with unaligned bit strings, for example
    BITBLT, is just horrible. There's so much shifting and masking that
    would be much more efficient at the hardware level.

    That's a clear corner case. And not worth dealing with the PDP-6
    style byte accesses.

    For the most part, programmers abstract the operations:

    // In namespace bit; maskT<T>(n) is assumed to yield a T with the low n bits set (helper not shown).
    template<class T> static inline T extract(T input, size_t stop_bit, size_t start_bit)
    {
        input >>= start_bit;
        input &= maskT<T>(stop_bit - start_bit + 1);
        return input;
    }

    uint64_t bits16_5 = bit::extract(value, 16, 5);

    Pretty clear and let the compiler generate the appropriate
    masking (or in many architectures bit-extract) instructions.
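
    The extract helper above calls maskT, which the post does not show. A
    self-contained version might look like the following, where maskT is a
    guess at the missing helper (a value with the low `width` bits set) and
    the assertions are additions.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

namespace bit {

// Assumed helper, not from the post: a T with the low `width` bits set.
template<class T> static inline T maskT(std::size_t width)
{
    const std::size_t total = sizeof(T) * CHAR_BIT;
    assert(width <= total);
    // Shifting by the full width would be undefined, so handle it separately.
    return width == total ? static_cast<T>(~T(0))
                          : static_cast<T>((T(1) << width) - 1);
}

// Bits stop_bit..start_bit of input (inclusive), right-justified.
template<class T> static inline T extract(T input, std::size_t stop_bit, std::size_t start_bit)
{
    assert(stop_bit >= start_bit);
    input >>= start_bit;
    input &= maskT<T>(stop_bit - start_bit + 1);
    return input;
}

}  // namespace bit
```

    With this in place, bit::extract<std::uint64_t>(0x12345678, 16, 5)
    yields 0x2B3.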

  • From Lev@3:633/10 to All on Sat May 30 07:13:16 2026
    Peter Flass <Peter@Iron-Spring.com> wrote:
    https://hackaday.com/2026/05/25/just-how-bad-was-the-intel-iapx432/

    The benchmark result is the interesting part. The 432 beat an 8086
    at the same clock speed doing the same algorithm in hand-written
    code. That's not what you'd expect from a chip everyone agrees
    was a disaster.

    Mark's speculation that the problem was compiler optimization rather
    than hardware design is worth taking seriously. The 432 had over 200
    operators, built-in object-oriented programming, capability-based
    addressing - all of which are nightmares for a compiler writer in
    1981. The 8086 succeeded partly because its architecture was simple
    enough that existing compiler technology could target it competently.

    The pattern repeats with Itanium: a chip designed around the idea
    that compilers could do instruction scheduling better than hardware,
    which turned out to be true in theory and catastrophically wrong in
    practice, because writing those compilers was harder than anyone
    anticipated.

    Both cases suggest that processor design has a social component.
    It's not enough for hardware to be capable in principle. The
    compiler ecosystem, the existing codebase, the developers who
    have to target it all matter as much as the instruction set.
    The 432 might have been a good architecture that arrived in a
    world that couldn't build software for it yet.

    Rich Alderson's point about PDP-6 byte pointers is apt too.
    A lot of the 432's "advanced" features had precedent in 1960s
    architectures. What was new was cramming all of them into one
    chip at once.

    Lev

  • From Peter Flass@3:633/10 to All on Sat May 30 08:07:34 2026
    On 5/30/26 00:13, Lev wrote:
    Peter Flass <Peter@Iron-Spring.com> wrote:
    https://hackaday.com/2026/05/25/just-how-bad-was-the-intel-iapx432/

    The benchmark result is the interesting part. The 432 beat an 8086
    at the same clock speed doing the same algorithm in hand-written
    code. That's not what you'd expect from a chip everyone agrees
    was a disaster.

    Mark's speculation that the problem was compiler optimization rather
    than hardware design is worth taking seriously. The 432 had over 200 operators, built-in object-oriented programming, capability-based
    addressing - all of which are nightmares for a compiler writer in
    1981. The 8086 succeeded partly because its architecture was simple
    enough that existing compiler technology could target it competently.

    This is the general consensus. [I think I have this right, but it's at
    least approximately right] The Ada compiler originally put every
    subroutine (whatever they're called in Ada, procedure, function?) into a separate segment, so there was a context switch on every call. Intel was working on it, and improved the performance a lot, but by that time the
    damage was done.

    The Multics people had a similar problem with the original Digitek
    compiler. They had to throw it out and write a new one to get it working acceptably.


    The pattern repeats with Itanium: a chip designed around the idea
    that compilers could do instruction scheduling better than hardware,
    which turned out to be true in theory and catastrophically wrong in
    practice, because writing those compilers was harder than anyone
    anticipated.

    Both cases suggest that processor design has a social component.
    It's not enough for hardware to be capable in principle. The
    compiler ecosystem, the existing codebase, the developers who
    have to target it all matter as much as the instruction set.
    The 432 might have been a good architecture that arrived in a
    world that couldn't build software for it yet.

    This is an excellent point.


    Rich Alderson's point about PDP-6 byte pointers is apt too.
    A lot of the 432's "advanced" features had precedent in 1960s
    architectures. What was new was cramming all of them into one
    chip at once.

    Lev


  • From John Levine@3:633/10 to All on Sat May 30 19:24:12 2026
    According to Peter Flass <Peter@Iron-Spring.com>:
    addressing - all of which are nightmares for a compiler writer in
    1981. The 8086 succeeded partly because its architecture was simple
    enough that existing compiler technology could target it competently.

    This is the general consensus. [I think I have this right, but it's at
    least approximately right] ...

    I worked on a lot of PC software in the 1980s and I agree. We had C
    compilers that generated pretty good code. We basically punted on the
    segment stuff via medium model code. The whole program shared the same
    data segment. Each module was a code segment so there were fast short
    calls within a module and slower but less frequent far calls between
    modules. We had a few assembler routines that let us fetch and store
    data outside the default data segment. The 8086 had only a 1MB address
    space so there were bank switching hacks ("expanded memory") to address
    data beyond that.
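
    As background for the 1MB limit: an 8086 physical address is the 16-bit
    segment shifted left by 4 plus a 16-bit offset, truncated to 20 bits. A
    small sketch, with FarPointer and physical_address as illustrative names:

```cpp
#include <cstdint>

// A "far" pointer names both a segment and an offset; a near pointer (as used
// for data in medium model) is just the offset within the default segment.
struct FarPointer {
    std::uint16_t segment;
    std::uint16_t offset;
};

// 8086 real-mode address formation: segment * 16 + offset, wrapped to 20 bits.
constexpr std::uint32_t physical_address(FarPointer p)
{
    return ((static_cast<std::uint32_t>(p.segment) << 4) + p.offset) & 0xFFFFFu;
}

// Different segment:offset pairs can alias the same physical byte.
static_assert(physical_address(FarPointer{0x1234, 0x0010}) ==
              physical_address(FarPointer{0x1235, 0x0000}),
              "segments overlap every 16 bytes");
```

    Twenty bits give 2^20 bytes, which is the 1MB ceiling mentioned above.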

    Both cases suggest that processor design has a social component.
    It's not enough for hardware to be capable in principle. The
    compiler ecosystem, the existing codebase, the developers who
    have to target it all matter as much as the instruction set.
    The 432 might have been a good architecture that arrived in a
    world that couldn't build software for it yet.

    That was the lesson of the IBM 801. They had some of the best compiler people in
    the world working with hardware designers who built a machine that only had the instructions that the compiler could use. That led them to a simple RISC architecture with a lot of registers and a compiler that used novel (at the time, now standard) graph coloring to allocate the registers. When they retargeted their PL.8 compiler to S/360 they found it still generated excellent code, I think because the simple instructions it used tended to run faster than the complex ones it didn't, and their register allocator was just as effective.
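
    For readers unfamiliar with the idea, register allocation by graph
    coloring treats each value as a node, joins values that are live at the
    same time, and gives joined nodes different registers. The sketch below
    is a plain greedy coloring, not the allocator the 801 group actually
    built; the function name and the -1 "spill" convention are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// interference[v] lists the virtual registers that are live at the same time
// as v. Returns a physical register number per virtual register, or -1 when
// no register is free (the value would have to be spilled to memory).
inline std::vector<int> color_registers(
    const std::vector<std::vector<std::size_t>>& interference, int num_physical)
{
    assert(num_physical > 0);
    std::vector<int> assigned(interference.size(), -1);
    for (std::size_t v = 0; v < interference.size(); ++v) {
        // Mark the registers already used by neighbours that have one.
        std::vector<bool> taken(static_cast<std::size_t>(num_physical), false);
        for (std::size_t n : interference[v]) {
            assert(n < interference.size());
            if (assigned[n] >= 0)
                taken[static_cast<std::size_t>(assigned[n])] = true;
        }
        // Take the lowest-numbered register no neighbour is using.
        for (int r = 0; r < num_physical; ++r) {
            if (!taken[static_cast<std::size_t>(r)]) {
                assigned[v] = r;
                break;
            }
        }
    }
    return assigned;
}
```

    The more registers the machine has, the less often the spill case
    arises, which is one reason a large register file and this style of
    allocator go well together.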

    Rich Alderson's point about PDP-6 byte pointers is apt too.
    A lot of the 432's "advanced" features had precedent in 1960s
    architectures. What was new was cramming all of them into one
    chip at once.

    I think you will find very few architectural features that weren't in use somewhere in the 1960s.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Lev@3:633/10 to All on Sun May 31 07:03:55 2026
    John Levine wrote:
    I worked on a lot of PC software in the 1980s and I agree. We had C
    compilers that generated pretty good code. We basically punted on the
    segment stuff via medium model code.

    This is the part that interests me most. The 8086 won partly because
    you could ignore its worst features. Medium model let you pretend
    segments weren't there for most purposes. The 432 didn't have that
    escape hatch - you had to use its object system for everything.

    That was the lesson of the IBM 801. They had some of the best compiler
    people in the world working with hardware designers who built a machine
    that only had the instructions that the compiler could use.

    The 801 story is a good counterexample to the 432 in both directions.
    Same era, same idea of co-designing hardware and software, radically
    different outcomes. The 801 team simplified toward what compilers
    could actually do. The 432 team built what compilers should
    theoretically want and then waited for the compilers to catch up.

    The PL.8 retargeting result is striking - the fact that the compiler
    designed for 801's simple instructions also produced good S/360 code
    suggests the problem wasn't that CISC was bad, but that CISC
    instructions compilers couldn't easily select were dead weight.
    Nobody was emitting the fancy string instructions or decimal
    arithmetic unless they were hand-coding.

    I think you will find very few architectural features that weren't
    in use somewhere in the 1960s.

    Fair point. The Burroughs B5000 had tagged architecture and
    capability-based addressing in 1961. The 432 was less innovative
    than Intel's marketing suggested. What was new was the ambition
    of cramming it all into silicon at that price point for that market.

    Lev

  • From Peter Flass@3:633/10 to All on Sun May 31 07:57:23 2026
    On 5/31/26 00:03, Lev wrote:
    John Levine wrote:

    The PL.8 retargeting result is striking - the fact that the compiler
    designed for 801's simple instructions also produced good S/360 code
    suggests the problem wasn't that CISC was bad, but that CISC
    instructions compilers couldn't easily select were dead weight.
    Nobody was emitting the fancy string instructions or decimal
    arithmetic unless they were hand-coding.


    This is 100% wrong. Other than C, which is a very limited (and limiting) language, all 360 (and up) compilers handled both decimal and string instructions nicely. COBOL, PL/I, and I suppose, RPG all used them. Even
    in assembler I used them quite extensively.

    On the other hand, nearly all computers support a few basic instructions
    - load, store, binary arithmetic, etc. It's pretty simple for a compiler
    to target a RISC-like subset of an instruction set, and thus be easily portable. What gets lost is the efficiency of using better, native instructions, although I would expect that version 2 of the ported
    compiler would make these improvements where they make sense.

    Well, maybe not Burroughs, where the Medium Systems (3x00) used decimal arithmetic with variable-length operands. I think even the instruction
    counter was decimal.


  • From Scott Lurndal@3:633/10 to All on Sun May 31 17:02:37 2026
    Peter Flass <Peter@Iron-Spring.com> writes:
    On 5/31/26 00:03, Lev wrote:
    John Levine wrote:

    The PL.8 retargeting result is striking - the fact that the compiler
    designed for 801's simple instructions also produced good S/360 code
    suggests the problem wasn't that CISC was bad, but that CISC
    instructions compilers couldn't easily select were dead weight.
    Nobody was emitting the fancy string instructions or decimal
    arithmetic unless they were hand-coding.


    This is 100% wrong. Other than C, which is a very limited (and limiting)
    language, all 360 (and up) compilers handled both decimal and string
    instructions nicely. COBOL, PL/I, and I suppose, RPG all used them. Even
    in assembler I used them quite extensively.

    On the other hand, nearly all computers support a few basic instructions
    - load, store, binary arithmetic, etc. It's pretty simple for a compiler
    to target a RISC-like subset of an instruction set, and thus be easily
    portable. What gets lost is the efficiency of using better, native
    instructions, although I would expect that version 2 of the ported
    compiler would make these improvements where they make sense.

    Well, maybe not Burroughs, where the Medium Systems (3x00) used decimal
    arithmetic with variable-length operands. I think even the instruction
    counter was decimal.

    Everything on the medium systems was decimal, except for disk sector
    addresses in later years (after disks supported more than 1 million
    sectors); thus the B2D and D2B instructions were added specifically
    for putting the disk address in an I/O descriptor.

    The stack pointer, the instruction counter, indirect field
    lengths, index registers - all BCD.

    Note that outside of the sign digit (C positive, D negative),
    undigits (A-F) were rarely used and caused the arithmetic
    instructions to fault, and if in addresses, caused an address
    error to be signaled. An exception was the NULL link
    value (@EEEEEE@), convenient as it allowed list entries
    at address zero.
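
    A toy model of the undigit rule, assuming six-digit BCD addresses packed
    one digit per 4-bit nibble. It is only an illustration of the checks
    described above, not a description of the actual hardware; treating the
    null link as a value software tests for before dereferencing is an
    assumption, as are all the names.

```cpp
#include <cassert>
#include <cstdint>

enum class AddressCheck { Valid, NullLink, AddressError };

// True if any of the low `num_digits` nibbles is an undigit (A-F).
inline bool has_undigit(std::uint32_t field, unsigned num_digits)
{
    assert(num_digits <= 8);
    for (unsigned i = 0; i < num_digits; ++i)
        if (((field >> (4 * i)) & 0xFu) > 9u)
            return true;
    return false;
}

// The all-E pattern is reserved as the list terminator, so address 0 stays usable.
constexpr std::uint32_t kNullLink = 0xEEEEEEu;

inline AddressCheck check_address(std::uint32_t address)
{
    if (address == kNullLink)
        return AddressCheck::NullLink;
    return has_undigit(address, 6) ? AddressCheck::AddressError
                                   : AddressCheck::Valid;
}
```

    Because no valid decimal address contains an E, the null link can never
    collide with a real list entry, including one at address zero.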

  • From Bob Eager@3:633/10 to All on Sun May 31 17:29:19 2026
    On Sun, 31 May 2026 07:57:23 -0700, Peter Flass wrote:

    On 5/31/26 00:03, Lev wrote:
    John Levine wrote:

    The PL.8 retargeting result is striking - the fact that the compiler
    designed for 801's simple instructions also produced good S/360 code
    suggests the problem wasn't that CISC was bad, but that CISC
    instructions compilers couldn't easily select were dead weight. Nobody
    was emitting the fancy string instructions or decimal arithmetic unless
    they were hand-coding.


    This is 100% wrong. Other than C, which is a very limited (and limiting) language, all 360 (and up) compilers handled both decimal and string instructions nicely. COBOL, PL/I, and I suppose, RPG all used them. Even
    in assembler I used them quite extensively.

    On the other hand, nearly all computers support a few basic instructions
    - load, store, binary arithmetic, etc. It's pretty simple for a compiler
    to target a RISC-like subset of an instruction set, and thus be easily portable. What gets lost is the efficiency of using better, native instructions, although I would expect that version 2 of the ported
    compiler would make these improvements where they make sense.

    Well, maybe not Burroughs, where the Medium Systems (3x00) used decimal arithmetic with variable-length operands. I think even the instruction counter was decimal.

    Also see the Singer System Ten.

  • From Lynn Wheeler@3:633/10 to All on Sun May 31 13:52:34 2026
    John Levine <johnl@taugh.com> writes:
    That was the lesson of the IBM 801. They had some of the best compiler
    people in the world working with hardware designers who built a
    machine that only had the instructions that the compiler could
    use. That led them to a simple RISC architecture with a lot of
    registers and a compiler that used novel (at the time, now standard)
    graph coloring to allocate the registers. When they retargeted their
    PL.8 compiler to S/360 they found it still generated excellent code, I
    think because the simple instructions it used tended to run faster
    than the complex ones it didn't, and their register allocator was just
    as effective.


    Early last decade, I got asked to track down decision to add virtual
    memory to all 370. Basically (os/360) MVT storage management was so bad
    that REGION sizes frequently had to be specified four times larger than
    used. As result a typical 1mbyte, 370/165 would only run four concurrent regions, throughput insufficient to keep system busy and
    justified. Going to 16mbyte virtual address space could increase number
    of concurrent regions by factor of four (capped at 15 because of 4bit
    storage protect key) with little or no paging (similar to running MVT in
    CP67 16mbyte virtual machine). I had dropped by Ludlow doing the initial implementation, using 360/67 (pending 370 engineering system with
    virtual memory). He was doing little bit of code to create virtual
    memory tables and some simple paging. Biggest issue was EXCP/SVC0 was
    now being passed channel programs with virtual addresses and channels
    required real addresses (similar to CP67 running virtual machines), and
    he borrows CP67 CCWTRANS integrated into

    One of my hobbies after joining IBM was enhanced production operating
    systems for internal datacenters (HONE, online branch office
    sales&marketing support, was one of the 1st and long time
    customers). With decision to add virtual memory to all 370s, also
    including doing VM370. In transition of CP67->VM370, lots of stuff was simplified or dropped (including SMP support). I then start adding a lot
    of stuff back into VM370R2-base, including kernel reorged needed for SMP support (but not full SMP). Then with VM370R3-base, I put lot more stuff
    back in, including SMP support, originally for HONE so they could
    upgrade their 158 & 168 systems to 2-CPU (getting twice throughput of
    single CPU systems).

    I then get sucked into helping with an effort to do 16-CPU 370 SMP
    (shared memory multiprocessor) and we con the 3033 processor engineers
    into helping in their spare time (a lot more interesting than remapping
    370/168 logic to 20% faster chips). Everybody thought it was great until somebody tells head of POK (DSD, high-end systems), that it could be
    decades before the POK favorite son operating system ("MVS") has
    effective 16-CPU support (MVS docs were that 2-CPU systems were only
    getting 1.2-1.5 times throughput of 1-CPU; POK doesn't ship 16-CPU
    system until after turn of century).

    1976, there is an "advanced technology" conference in POK where both
    801/RISC and 16-processor is presented. One of the 801/RISC people gives
    me a bad time claiming he had looked at the VM370 product code which had
    no SMP support. I've observed that it was the last adtech conference
    until sometime in the 80s (because so many adtech groups were being
    thrown into the 370 development breach). I had joked that John came up
    with 801/RISC to be the opposite of the complexity of "Future System".

    Overlapping transition of 370 to virtual memory the 1st half of the 70s
    was the "Future System" project, completely different than 370 and was
    supposed to completely replace 370 (I continued to work on 360&370 all
    during FS and would periodically ridicule what they were doing). Internal politics was working on shutting down 370 activities and lack of more
    new 370 during FS is credited with giving the clone 370 system makers (including Amdahl), their market foothold. When FS finally implodes,
    there is mad rush getting new stuff into 370 product pipelines,
    including kicking off quick&dirty 3033&3081 efforts in parallel.

    Head of POK invites some of us to never visit POK again and directed the
    3033 processor engineers, "heads down and no distractions"

    Part of 801 presentation was PL.8 would only generate correct code and
    the CP.r operating system would only execute correct PL.8 code. As a
    result, 801 RISC didn't need hardware protection domains (things like
    changing address spaces could be done with inline application code). 801
    ROMP chip was originally for OPD Displaywriter follow-on. When
    Displaywriter follow-on was canceled, they decided to pivot to the UNIX workstation market and hired the company that had done PC/IX (for
    IBM/PC) to do AIX for the PC/RT workstation (but needed ROMP to support
    UNIX paradigm hardware protection).

    FS had a lot of object-like characteristics, however one of the last
    nails in the FS coffin was analysis by IBM Houston Scientific Center
    that 370/195 apps redone for a FS machine made with the fastest
    technology available, would have throughput of 370/145 (about 30 times
    slow down). FS disaster
    http://www.jfsowa.com/computer/memo125.htm
    https://en.wikipedia.org/wiki/IBM_Future_Systems_project
    https://people.computing.clemson.edu/~mark/fs.html

    ... from "Computer Wars: The Post-IBM World" https://www.amazon.com/Computer-Wars-The-Post-IBM-World/dp/1587981394/

    ... and perhaps most damaging, the old culture under Watson Snr and Jr
    of free and vigorous debate was replaced with *SYCOPHANCY* and *MAKE NO
    WAVES* under Opel and Akers. It's claimed that thereafter, IBM lived in
    the shadow of defeat ... But because of the heavy investment of face by
    the top management, F/S took years to kill, although its wrong
    headedness was obvious from the very outset. "For the first time, during
    F/S, outspoken criticism became politically dangerous," recalls a former
    top executive

    ... snip ...

    Decade after 16-CPU 370 effort, get project to do HA/6000, originally
    for NYTimes to move their newspaper system (ATEX) off DEC VAXCluster to
    RS/6000. I rename it HA/CMP
    https://en.wikipedia.org/wiki/IBM_High_Availability_Cluster_Multiprocessing
    when I start doing technical/scientific cluster scale-up with national
    labs (LANL, LLNL, NCAR, etc) and commercial cluster scale-up with RDBMS
    vendors (Oracle, Sybase, Ingres, Informix) with VAXCluster support in
    same source base with UNIX.

    IBM S/88 (relogo'ed Stratus) Product Administrator started taking us
    around to their customers and also had me write a section for the
    corporate continuous availability document (it gets pulled when both
    AS400/Rochester and mainframe/POK complain they couldn't meet
    requirements). Had coined "disaster survivability" and "geographic
    survivability" (as counter to disaster/recovery) when out marketing
    HA/CMP. One of the visits to 1-800 bellcore development showed that S/88
    would use a century of downtime in one software upgrade, while HA/CMP
    had a couple extra "nines" (compared to S/88).
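
    For scale, the "century of downtime" remark can be checked with a little
    arithmetic. The post does not say which availability target was meant;
    the sketch below assumes "five nines" (99.999%), and the function name
    is made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average year, including leap days


def downtime_budget_seconds(nines, years=1):
    """Allowed downtime for an availability of `nines` nines over `years`.

    Example: nines=5 means 99.999% availability (an assumed target,
    not one stated in the post).
    """
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * years * unavailability


if __name__ == "__main__":
    per_year_min = downtime_budget_seconds(5, 1) / 60
    per_century_hr = downtime_budget_seconds(5, 100) / 3600
    print(f"five nines: {per_year_min:.2f} min/year, "
          f"{per_century_hr:.2f} hours/century")
```

    Under that assumption the yearly budget is a little over five minutes,
    so a single upgrade outage lasting most of a working day would consume
    roughly a century's allowance.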

    One of the first HA/CMP customer installs was a new Indian Reservation
    casino in Connecticut. It was supposed to have a week of testing before
    opening ... but after 24hrs, they decided to open the doors (based on
    projected revenue; at the time it was the largest in the US, still one
    of the largest in the country)
    https://en.wikipedia.org/wiki/Foxwoods_Resort_Casino#Debt_default

    Early Jan92, there was HA/CMP meeting with Oracle CEO and IBM/AWD
    executive Hester tells Ellison that we would have 16-system clusters by
    mid92 and 128-system clusters by ye92. Mid-jan92, I update FSD on HA/CMP
    work with national labs and FSD decides to go with HA/CMP for federal
    supercomputers. By end of Jan, we are told that cluster scale-up is
    being transferred to Kingston for announce as IBM Supercomputer
    (technical/scientific *ONLY*) and we aren't allowed to work with
    anything that has more than four systems (we leave IBM a few months
    later). A couple weeks later, 17feb1992, Computerworld news ... IBM
    establishes laboratory to develop parallel systems (pg8)
    https://archive.org/details/sim_computerworld_1992-02-17_26_7

    Some speculation that HA/CMP would have eaten the mainframe in the
    commercial market. 1993 industry benchmarks (number of program
    iterations compared to the industry MIPS/BIPS reference platform):

    ES/9000-982               : 8-CPU: 408MIPS (51MIPS/CPU)
    RS6000/990 (RIOS chipset) : 1-CPU: 126MIPS, 16-systems: 2BIPS,
                                128-systems: 16BIPS
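
    The cluster figures above follow from simple multiplication of the
    single-system number. A minimal sketch, assuming perfectly linear
    scaling (no interconnect or locking overhead, which a real cluster
    would not achieve); the helper names are illustrative only:

```python
# Figures quoted in the post (1993 industry benchmark).
ES9000_982_TOTAL_MIPS = 408
ES9000_982_CPUS = 8
RS6000_990_MIPS = 126


def per_cpu_mips(total_mips, cpus):
    """Average throughput per processor of an SMP machine."""
    return total_mips / cpus


def cluster_mips(per_system_mips, systems):
    """Aggregate throughput, ASSUMING ideal linear scale-up."""
    return per_system_mips * systems


if __name__ == "__main__":
    print("ES/9000-982 per CPU:",
          per_cpu_mips(ES9000_982_TOTAL_MIPS, ES9000_982_CPUS), "MIPS")
    for n in (1, 16, 128):
        total = cluster_mips(RS6000_990_MIPS, n)
        print(f"RS6000/990 x {n}: {total} MIPS ({total / 1000:.1f} BIPS)")
```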

    Executive we had reported to, goes over to head up Somerset/AIM (Apple,
    IBM, Motorola) to do single chip 801/RISC (Power/PC) and uses Motorola
    88k bus/cache enabling SMP implementations.

    --
    virtualization experience starting Jan1968, online at home since Mar1970

  • From Lynn Wheeler@3:633/10 to All on Sun May 31 14:41:49 2026

    ... trivia: after FS implodes, head of POK was convincing corporate to
    kill the VM370 product, shutdown the development group and transfer all
    the people to POK for (370/XA) MVS/XA ... possibly because of how bad it
    made POK's favorite son operating system, MVS, look; ... which 16-CPU
    SMP would have just made MVS look worse.

    Endicott (370 mid-range) eventually manages to acquire the VM370 product
    mission ... but has to recreate a development group from scratch.

    --
    virtualization experience starting Jan1968, online at home since Mar1970

  • From John Levine@3:633/10 to All on Mon Jun 1 01:17:31 2026
    According to Peter Flass <Peter@Iron-Spring.com>:
    On 5/31/26 00:03, Lev wrote:
    John Levine wrote:

    The PL.8 retargeting result is striking - the fact that the compiler
    designed for 801's simple instructions also produced good S/360 code
    suggests the problem wasn't that CISC was bad, but that CISC
    instructions compilers couldn't easily select were dead weight.
    Nobody was emitting the fancy string instructions or decimal
    arithmetic unless they were hand-coding.


    This is 100% wrong. Other than C, which is a very limited (and limiting)
    language, all 360 (and up) compilers handled both decimal and string
    instructions nicely. COBOL, PL/I, and I suppose, RPG all used them. Even
    in assembler I used them quite extensively. ...

    Take a look at this paper from 25 years ago, the part on page 52 about
    System/370. Even though the PL.8 compiler didn't use all the
    instructions, its code ran much faster than the regular PL/I compiler
    due to the better register management and using a fast subset of the
    instruction set.

    https://acg.cis.upenn.edu/milom/cis501-Fall11/papers/cocke-RISC.pdf

    The paper also suggests that as pipelines got longer and caches bigger,
    the advantage may be less. Also, compilers now all use the graph coloring
    register allocator that PL.8 introduced.
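
    For readers who have not seen it, the idea behind graph coloring
    register allocation can be sketched in a few lines. This is a
    simplified simplify/select loop in the general style usually
    attributed to Chaitin, not the PL.8 implementation; the function name,
    the spill heuristic (highest degree) and the example graph are all
    assumptions made for illustration.

```python
def color_registers(interference, k):
    """Assign one of k registers (0..k-1) to each variable.

    interference: dict mapping each variable to the set of variables
    that are live at the same time (assumed symmetric).
    Returns (assignment, spilled).
    """
    graph = {v: set(ns) for v, ns in interference.items()}
    stack, spilled = [], []

    # Simplify: repeatedly remove a node with fewer than k neighbours.
    # Such a node can always be colored later, whatever its neighbours get.
    while graph:
        node = next((v for v in sorted(graph) if len(graph[v]) < k), None)
        if node is None:
            # Nothing is trivially colorable: spill the busiest node
            # (a crude heuristic chosen for this sketch).
            node = max(sorted(graph), key=lambda v: len(graph[v]))
            spilled.append(node)
        else:
            stack.append(node)
        for neighbour in graph.pop(node):
            graph[neighbour].discard(node)

    # Select: pop nodes and give each the lowest register not already
    # used by an interfering, already-colored neighbour.
    assignment = {}
    while stack:
        v = stack.pop()
        used = {assignment[n] for n in interference[v] if n in assignment}
        assignment[v] = next(r for r in range(k) if r not in used)
    return assignment, spilled


if __name__ == "__main__":
    # Hypothetical live ranges: a, b, c overlap pairwise; d overlaps only c.
    example = {
        "a": {"b", "c"},
        "b": {"a", "c"},
        "c": {"a", "b", "d"},
        "d": {"c"},
    }
    print(color_registers(example, 3))
    print(color_registers(example, 2))
```

    With three registers every variable in the example gets one; with two,
    the a/b/c triangle cannot be colored and one variable is spilled to
    memory.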

    There have certainly been places where the CISC stuff makes sense. If
    you were running RPG on an 8K machine, code size was really important
    and it wasn't hard to keep up with a card reader and a printer.



    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Lev@3:633/10 to All on Mon Jun 1 07:03:59 2026
    Peter Flass wrote:

    This is 100% wrong. Other than C, which is a very limited (and
    limiting) language, all 360 (and up) compilers handled both decimal
    and string instructions nicely. COBOL, PL/I, and I suppose, RPG
    all used them.

    You're right, I overstated it badly. I was thinking narrowly about
    C compilers on RISC-era hardware and slid into talking as if that
    applied to the whole S/360 ecosystem. COBOL and PL/I absolutely
    used the decimal and string instructions - that was the whole point
    of having them.

    The better claim, which is what Levine's PL.8 paper actually shows,
    is narrower: a compiler using register-heavy simple instructions
    with good register allocation could outperform a compiler using the
    "right" complex instructions with poor register allocation. The
    win was in the register allocator, not in avoiding CISC per se.

    Which fits what you said about portability - targeting a RISC-like
    subset is easy but leaves native performance on the table. PL.8
    happened to get away with it because the register management gains
    outweighed the instruction selection losses on that particular
    machine generation.

    Scott: the Burroughs Medium Systems with BCD everything is wild.
    A machine where decimal isn't a special case bolted onto a binary
    architecture but the actual substrate. Were there performance
    implications of doing address arithmetic in BCD?

  • From Lynn Wheeler@3:633/10 to All on Mon Jun 1 05:08:06 2026

    25oct2006 comp.arch/a.f.c post with archived 08aug81 email pascal
    "benchmark" including pascal w/pl.8 backend

    6m 30   secs  PERQ (with PERQ's Pascal compiler, of course)
    4m 55   secs  68000 with PASCAL/PL.8 compiler at OPT 2
    0m 21.5 secs  3033 PASCAL/VS with Optimization
    0m 10.5 secs  3033 with PASCAL/PL.8 at OPT 0
    0m  5.9 secs  3033 with PASCAL/PL.8 at OPT 3
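
    To put the 3033 rows in relative terms, the ratios can be worked out
    directly from the times above. A small sketch; the dictionary keys and
    helper names are labels invented here, and only the timings come from
    the email:

```python
def to_seconds(minutes, seconds):
    """Convert an 'Xm Y secs' timing into seconds."""
    return minutes * 60 + seconds


# Timings as listed in the 08aug81 email.
TIMES = {
    "PERQ native Pascal": to_seconds(6, 30),
    "68000 PASCAL/PL.8 OPT 2": to_seconds(4, 55),
    "3033 PASCAL/VS optimized": to_seconds(0, 21.5),
    "3033 PASCAL/PL.8 OPT 0": to_seconds(0, 10.5),
    "3033 PASCAL/PL.8 OPT 3": to_seconds(0, 5.9),
}


def speedup(baseline, candidate):
    """How many times faster `candidate` ran than `baseline`."""
    return TIMES[baseline] / TIMES[candidate]


if __name__ == "__main__":
    base = "3033 PASCAL/VS optimized"
    for name in ("3033 PASCAL/PL.8 OPT 0", "3033 PASCAL/PL.8 OPT 3"):
        print(f"{name}: {speedup(base, name):.2f}x vs {base}")
```

    On the same 3033, the PL.8 backend comes out at roughly 2x the
    optimized PASCAL/VS result even at OPT 0, and somewhat over 3.5x at
    OPT 3.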

    --
    virtualization experience starting Jan1968, online at home since Mar1970
