• How would you fix this alignment bug?

    From James Dow Allen@3:633/10 to All on Sat Oct 11 17:47:07 2025

    As background: When a BSD or SunOS 3.0 application reads a regular
    file, the read goes into kernel memory and the data is then copied.
    With SunOS 4.0, memory-mapping is used and is done carefully.
    *But none of this applies when reading a RAW device.*

    This is a TRUE story from about 36 years ago. I was working with SunOS 4.0
    before it had been released to any customers. I wanted to inspect a disk
    label and typed
        cp /dev/rsd01 Label
    ... planning to hit ctrl-C a second or two later. Never mind how stupid
    you imagine this to be. It was actually part of my consulting job to
    do strange things to see if the software would make strange mistakes.

    And such a mistake did indeed occur. Immediately on hitting Return
    I saw "Kernel panic ... Rebooting" or whatever the message was.

    There were FOUR (4) things conspiring together to panic the Kernel.

    (1) The VME Bus was being told to do 4-byte transfers to an address
    which was not a multiple of 4. This caused the Panic. For obvious
    reasons, nobody wanted to try to fix this hardware limitation.

    (2) The SCSI disk driver was not checking for this misalignment.
    It should have either (a) transferred from disk to its own correctly
    aligned buffer and then copied; or at a minimum (b) detected the
    misalignment and presented EIEIO. Better that 'cp' print 'I/O Error'
    than that the machine reboot!
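
The two driver-side options in (2) can be sketched in user-space C. This is only an illustration, not the SunOS driver: hw_transfer(), XFER_ALIGN, BOUNCE_SIZE and both raw_read_* functions are hypothetical names, and the assert inside hw_transfer() merely stands in for the bus fault that caused the panic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XFER_ALIGN  4    /* assumed bus transfer width, per item (1) */
#define BOUNCE_SIZE 512  /* assumed size of the driver's private buffer */

static int is_aligned(const void *p)
{
    return (uintptr_t)p % XFER_ALIGN == 0;
}

/* Hypothetical stand-in for the hardware transfer. A misaligned
   destination trips the assert, modelling the kernel panic. */
static void hw_transfer(void *dst, size_t n)
{
    assert(is_aligned(dst));
    memset(dst, 0xAB, n);    /* pretend this is disk data */
}

/* Option (b): detect the misalignment and fail with EIO. */
static int raw_read_strict(void *user_buf, size_t n)
{
    if (!is_aligned(user_buf))
        return EIO;
    hw_transfer(user_buf, n);
    return 0;
}

/* Option (a): transfer into an aligned private buffer, then copy. */
static int raw_read_bounce(void *user_buf, size_t n)
{
    static _Alignas(XFER_ALIGN) unsigned char bounce[BOUNCE_SIZE];
    unsigned char *out = user_buf;

    if (is_aligned(user_buf)) {      /* fast path: no copy needed */
        hw_transfer(user_buf, n);
        return 0;
    }
    while (n > 0) {
        size_t chunk = n < BOUNCE_SIZE ? n : BOUNCE_SIZE;
        hw_transfer(bounce, chunk);
        memcpy(out, bounce, chunk);
        out += chunk;
        n -= chunk;
    }
    return 0;
}
```

Option (a) costs an extra copy only on the misaligned path; option (b) costs a single comparison per request.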

    (3) The cp.c source code, noticing that it was reading from a raw device,
    went to a simple piece of code that did something like:
        char buff[BUFSZ];
        read(fdin, buff, cnt > BUFSZ ? BUFSZ : cnt);
    There was something like a 50:50 chance whether buff's address would be
    congruent to 0 or congruent to 2 modulo 4. Make major changes to the
    cp.c source and expect a 50% chance of "fixing" the problem randomly!
    Obviously there are several ways for cp.c to avoid the problem, e.g.
        long buff[BUFSZ/4]; // but there are cleaner-looking ways.

    Of course the problem was more generic than cp.c. (I don't recall
    whether cat.c had the same problem or not.)
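
One of the "cleaner-looking ways" for (3) might be a union that borrows the alignment of long while the code keeps using a plain char buffer. This is a sketch, not the actual cp.c: union aligned_buf, is_4byte_aligned() and copy_bytes() are made-up names, and it assumes long is at least 4-byte aligned on the target, which the assert checks at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>     /* BUFSIZ */
#include <unistd.h>    /* read, write */

/* The long member is never accessed; it exists only so that the
   union, and therefore bytes[], gets the alignment of long. */
union aligned_buf {
    long align;
    char bytes[BUFSIZ];
};

static int is_4byte_aligned(const void *p)
{
    return (uintptr_t)p % 4 == 0;
}

/* Copy up to cnt bytes from fdin to fdout.
   Returns the number of bytes copied, or -1 on error. */
static long copy_bytes(int fdin, int fdout, long cnt)
{
    union aligned_buf buf;
    long total = 0;

    /* Assumption: long alignment >= 4 on this platform. */
    assert(is_4byte_aligned(buf.bytes));

    while (cnt > 0) {
        size_t want = cnt > (long)sizeof buf.bytes
                          ? sizeof buf.bytes : (size_t)cnt;
        ssize_t got = read(fdin, buf.bytes, want);
        if (got < 0)
            return -1;
        if (got == 0)
            break;                       /* end of input */
        if (write(fdout, buf.bytes, (size_t)got) != got)
            return -1;
        total += got;
        cnt -= got;
    }
    return total;
}
```

Unlike the 50:50 luck of a bare char array, the buffer's alignment here no longer depends on what else happens to be on the stack.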

    While agreeing that the fixes in (2) and (3) should both be implemented,
    I proposed a very generic improvement to the compiler, which would
    automatically fix this and similar problems.

    (4) Whenever the compiler encounters a declaration like
        char whatever[998];
    OUTSIDE the scope of a struct, the compiler should check whether
    certain conditions are present:
    * the size of the array equals or exceeds some threshold
      (128 might be a good compromise).
    * the size of the array is an even number.
    WHEN those conditions are present then the compiler forces that array
    to be 4-byte aligned. (Or make it 8-byte aligned, whatever.)
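
The rule in (4) can be written down as a small decision function. This only restates the proposal and is not code from any real compiler: char_array_alignment(), align_up() and the three constants are illustrative names, and the default of 2 is the char-array alignment quoted later in this story.

```c
#include <assert.h>
#include <stddef.h>

#define ALIGN_THRESHOLD 128  /* the suggested compromise */
#define DEFAULT_ALIGN   2    /* char-array alignment quoted in the story */
#define BOOSTED_ALIGN   4    /* "or make it 8-byte aligned, whatever" */

/* Alignment the proposed rule would pick for a char array of the
   given size. inside_struct is nonzero for struct members, which
   the rule leaves alone. */
static size_t char_array_alignment(size_t size, int inside_struct)
{
    if (!inside_struct && size >= ALIGN_THRESHOLD && size % 2 == 0)
        return BOOSTED_ALIGN;
    return DEFAULT_ALIGN;
}

/* Round offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}
```

With 2-byte default alignment, align_up(offset, 4) - offset is either 0 or 2, which is the "two bytes half the time" cost.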

    Note that this fix would cost VERY little: Wasting two bytes half the time
    on arrays of size 128+ is tiny AND only a little more effort could
    reduce the waste much more. The fix would not only avoid the need for
    (2) and (3) just mentioned, but would enhance performance in many
    unrelated situations.

    I was consulting for a hardware team, and had no contact with the
    compiler team, etc. As far as I know, NONE of the fixes were
    implemented! (Few users with superuser status type "cp /dev/rsd01 Label" !)
    There was a young man whom we enlisted as Ambassador to The Software Gurus
    and his response, explaining that the compiler would NOT be modified,
    struck me as ... amusing! Paraphrasing:
    " I have gone to the mountain-top and learned that the following
    is Written in Stone. The mandatory alignment of a character
    array is Two; and Two is the mandatory alignment of a
    character array. "

    Comments?

    Cheers,
    James

    --- PyGate Linux v1.0
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Charlie Gibbs@3:633/10 to All on Sat Oct 11 17:54:25 2025
    On 2025-10-11, James Dow Allen <user4353@newsgrouper.org.invalid> wrote:

    " I have gone to the mountain-top and learned that the following
    is Written in Stone. The mandatory alignment of a character
    array is Two; and Two is the mandatory alignment of a
    character array. "

    "Two is the number thou shalt count, and the number
    of the counting shall be two. Four is right out."

    (with thanks to Monty Python)

    --
    /~\ Charlie Gibbs | Growth for the sake of
    \ / <cgibbs@kltpzyxm.invalid> | growth is the ideology
    X I'm really at ac.dekanfrus | of the cancer cell.
    / \ if you read it the right way. | -- Edward Abbey

  • From Lawrence D'Oliveiro@3:633/10 to All on Sat Oct 11 19:26:11 2025
    On Sat, 11 Oct 2025 17:47:07 GMT, James Dow Allen wrote:

    I proposed a very generic improvement to the compiler, which would
    automatically fix this and similar problems.

    This is why GCC includes attributes to let you control allocation
    alignment and other such things. Because default behaviour is not
    appropriate in all situations.
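
As a small illustration of such an attribute: GCC (and Clang) accept __attribute__((aligned(N))) on a variable declaration, and C11 offers _Alignas as a standard spelling. The buffer and function names below are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>   /* BUFSIZ */

/* GCC/Clang extension: force at least 4-byte alignment. */
static char io_buf[BUFSIZ] __attribute__((aligned(4)));

/* C11 standard equivalent, here asking for 8-byte alignment. */
static _Alignas(8) char io_buf_c11[BUFSIZ];

static char *get_io_buf(void)
{
    assert((uintptr_t)io_buf % 4 == 0);      /* ensured by the attribute */
    return io_buf;
}

static char *get_io_buf_c11(void)
{
    assert((uintptr_t)io_buf_c11 % 8 == 0);  /* ensured by _Alignas */
    return io_buf_c11;
}
```

Either form makes the alignment an explicit property of the declaration instead of an accident of layout.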

    The obvious place to apply the fix is in the SCSI driver.

    There was a young man whom we enlisted as Ambassador to The Software
    Gurus and his response, explaining that the compiler would NOT be
    modified, struck me as ... amusing!

    Conway's Law: "Any piece of software reflects the organizational structure
    that produced it". A lack of communication between the developers and the
    users of the compiler results in a similar disconnect between its design
    and the way it actually has to be used.

  • From Lawrence D'Oliveiro@3:633/10 to All on Sat Oct 11 19:50:48 2025
    On Sat, 11 Oct 2025 19:36:01 -0000 (UTC), Waldek Hebisch wrote:

    Open source project would implement at least error checking in (2) and
    probably (3).

    Userland code, even privileged userland code, being able to crash the
    kernel would certainly be an unacceptable situation.

    I did this on one occasion, to a DEC Alpha. I was trying to call a
    low-level ioctl, I think it was, from Perl. The first crash happened while
    I was doing it as root, so I tried again as a non-privileged user ... and
    crashed the machine again.
