• Re: I think this could be an interesting challenge!

    From Tristan Wibberley@3:633/10 to All on Sun Mar 22 23:02:04 2026
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large list
    of unsorted words possibly containing duplicates - extracts 26 sets of
    100 random and unique words that each begin with a letter of the English alphabet.

    What random distribution, uniform?
    Said distribution over the unique words or said distribution over the
    original list?

    pseudorandom?


    --
    Tristan Wibberley

    The message body is Copyright (C) 2026 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Tristan Wibberley@3:633/10 to All on Sun Mar 22 23:11:07 2026
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large list
    of unsorted words possibly containing duplicates - extracts 26 sets of
    100 random and unique words that each begin with a letter of the English alphabet.

    By "extracts" do you mean to imply that instances of words are selected
    with removal from the population rather than being returned for the
    following selection events?

    --
    Tristan Wibberley



  • From DFS@3:633/10 to All on Sun Mar 22 19:14:38 2026
    On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large list
    of unsorted words possibly containing duplicates - extracts 26 sets of
    100 random and unique words that each begin with a letter of the English
    alphabet.

    What random distribution, uniform?
    Said distribution over the unique words or said distribution over the original list?

    pseudorandom?


    I don't care about the uniformity of the distribution, as long as the
    output is unique words, and you generate and use 2600+ random values
    from a RNG.
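
A minimal C sketch of one way to satisfy that requirement (the helper name and layout are mine, not code from the thread): rejection sampling, where every accepted word costs at least one RNG call and duplicate draws cost extra, so choosing 26 sets of 100 words consumes well over 2600 random values.

```c
#include <stdlib.h>

/* Pick `want` distinct words from pool[0..n-1] by rejection sampling.
   Each accepted word costs at least one rand() call, and repeated
   indices cost extra calls, so total RNG usage is always >= `want`. */
size_t pick_unique(const char *pool[], size_t n, size_t want,
                   const char *out[])
{
    size_t picked = 0, rng_calls = 0;
    char *taken = calloc(n, 1);          /* 1 = index already drawn */
    if (taken == NULL || want > n) {
        free(taken);
        return 0;
    }
    while (picked < want) {
        size_t i = (size_t)rand() % n;   /* one RNG call per draw   */
        rng_calls++;
        if (taken[i]) continue;          /* duplicate index: redraw */
        taken[i] = 1;
        out[picked++] = pool[i];
    }
    free(taken);
    return rng_calls;                    /* >= want by construction */
}
```

(`rand() % n` is slightly biased for ranges that don't divide RAND_MAX+1; a real submission might prefer a bias-free range reduction.)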






  • From DFS@3:633/10 to All on Sun Mar 22 19:16:27 2026
    On 3/22/2026 7:11 PM, Tristan Wibberley wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large list
    of unsorted words possibly containing duplicates - extracts 26 sets of
    100 random and unique words that each begin with a letter of the English
    alphabet.

    By "extracts" do you mean to imply that instances of words are selected
    with removal from the population rather than being returned for the
    following selection events?


    Nothing is removed.

    'Identify' is probably a better word than 'extract'.



  • From Tristan Wibberley@3:633/10 to All on Sun Mar 22 23:19:05 2026
    On 22/03/2026 14:38, DFS wrote:
    3) print the 2600 words you identify in column x row order in a grid of

    ITYM "choose" rather than "identify".

    "identify" means to judge something as being equivalent (for some
    preferred equivalence relation) to a special reference or to elements of
    a special set of references.

    --
    Tristan Wibberley



  • From Tristan Wibberley@3:633/10 to All on Sun Mar 22 23:21:43 2026
    On 22/03/2026 14:38, DFS wrote:
    You must call a RNG 2600+ times to build the list

    ie you can't use the
random ordering of the input file to your advantage).


    The two are not the same, that is, the use of "ie" is wrong.

    Which do you really require, or do you really require I satisfy the
    conjunction of the two?

    --
    Tristan Wibberley



  • From DFS@3:633/10 to All on Sun Mar 22 19:41:54 2026
    On 3/22/2026 7:21 PM, Tristan Wibberley wrote:
    On 22/03/2026 14:38, DFS wrote:
    You must call a RNG 2600+ times to build the list

    ie you can't use the
    random ordering of the input file to your advantage).


    The two are not the same, that is, the use of "ie" is wrong.


    I never said they were the same.

    If you don't use a random number generator, you can just read in the
    randomly sorted file and count words until you have your sets. That's
    no fun, effort, or reward.

    I did that during development, and it was super easy (but you still have
    to deal with duplicates).

    I made this one a little more difficult by requiring the usage of a RNG.



    Which do you really require, or do you really require I satisfy the conjunction of the two?

    I stated the requirement.






  • From Bart@3:633/10 to All on Sun Mar 22 23:53:36 2026
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large list
    of unsorted words possibly containing duplicates - extracts 26 sets of
    100 random and unique words that each begin with a letter of the English alphabet.

    I've had a first go at this challenge, using a scripting language to get something working as a reference.

    Not C, so that code is here:
    https://github.com/sal55/langs/blob/master/dfs.q

    The output it produced is this: https://github.com/sal55/langs/blob/master/output. (I think it is
    missing the heading for challenge 3.) It took 0.35 seconds to write that
    file.

    I haven't looked at your version in detail but did notice the
    line-counts (as I had to delete those lines for a previous reply).

    Any solution I come up with in C (which may take a while!) will have to
    use entirely different methods. I'm not interested in writing
    hash-tables etc in C, I'm far too lazy. Probably it will be much longer
    than yours.

    One thing which is still not clear is how to choose the layout of the
    final challenge. I assume the number of rows has to be a multiple of
    100, but how to decide the columns? I went with 3 columns max as the
    most practical.

    (Probably my version will go wrong if there aren't at least 100 words
    per letter in the input.)

  • From Tristan Wibberley@3:633/10 to All on Mon Mar 23 00:05:11 2026
    On 22/03/2026 23:14, DFS wrote:
    On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
deliver a C (and optional 2nd language) program that - from a large list
of unsorted words possibly containing duplicates - extracts 26 sets of
100 random and unique words that each begin with a letter of the English
alphabet.

    What random distribution, uniform?
    Said distribution over the unique words or said distribution over the
    original list?

    pseudorandom?


    I don't care about the uniformity of the distribution, as long as the
    output is unique words, and you generate and use 2600+ random values
    from a RNG.


    I think you're unaware that I can predictably generate a sequence of
    identical values when the distribution is free and your specification is satisfied by selecting with a distribution that prefers just one
    indicatory value for a choice of word to the exclusion of all others.

    You mention an RNG, I suppose then that you exclude pseudo-random
    numbers because those are normally referred to as PRNGs and I understand
    that RNG excludes them.

    --
    Tristan Wibberley



  • From Tristan Wibberley@3:633/10 to All on Mon Mar 23 00:09:37 2026
    On 22/03/2026 23:41, DFS wrote:
    On 3/22/2026 7:21 PM, Tristan Wibberley wrote:
    On 22/03/2026 14:38, DFS wrote:
    You must call a RNG 2600+ times to build the list

    ie you can't use the
    random ordering of the input file to your advantage).


    The two are not the same, that is, the use of "ie" is wrong.


    I never said they were the same.

    I can do it with fewer than 2600 calls for some input files, I could do
    it with 100 sometimes. "random" doesn't mean that you don't use the same sequence of just 100 determiners 26 times.


    --
    Tristan Wibberley



  • From Tristan Wibberley@3:633/10 to All on Mon Mar 23 00:10:01 2026
    On 22/03/2026 23:41, DFS wrote:
    On 3/22/2026 7:21 PM, Tristan Wibberley wrote:
    On 22/03/2026 14:38, DFS wrote:
    You must call a RNG 2600+ times to build the list

    ie you can't use the
    random ordering of the input file to your advantage).


    The two are not the same, that is, the use of "ie" is wrong.


    I never said they were the same.

    I can do it with fewer than 2600 calls for some input files, I could do
    it with 100 sometimes. "random" doesn't mean that you don't use the same sequence of just 100 determiners 26 times.

    --
    Tristan Wibberley



  • From John McCue@3:633/10 to All on Mon Mar 23 01:23:40 2026
    DFS <nospam@dfs.com> wrote:
    On 3/22/2026 1:29 PM, John McCue wrote:
    DFS <nospam@dfs.com> wrote:
    <snip>
    ---------------------
    Word Source
    ---------------------
    There's a huge unsorted word list here:

    https://limewire.com/?referrer=pq7i8xx7p2

    ...which you can develop against.

    Do I need to create an ID to get the list ?

    I don't think so.

    It didn't give me an ID or login when I uploaded them.


    I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f


    Thanks, I was able to download it.

    --
    [t]csh(1) - "An elegant shell, for a more... civilized age."
    - Paraphrasing Star Wars

  • From DFS@3:633/10 to All on Sun Mar 22 23:40:36 2026
    On 3/22/2026 7:53 PM, Bart wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large
    list of unsorted words possibly containing duplicates - extracts 26
    sets of 100 random and unique words that each begin with a letter of
    the English alphabet.

    I've had a first go at this challenge, using a scripting language to get something working as a reference.

    Thanks for doing it.


    Not C, so that code is here: https://github.com/sal55/langs/blob/master/dfs.q

    slick. It's a powerful scripting language. Reading a text file in with
    one line is nice. It's about 10 lines of C.

    Did you look to python for inspiration when creating it?


    Looks like line 16 is where you call a randomizer. If you put a counter
    at line 17 what does it say after the program is run?

    Is bounds a property of your list objects?

    Is bounds a pair of numbers 0..length of list-1?

    What generates your random values?


    The output it produced is this: https://github.com/sal55/langs/blob/master/output. (I think it is
    missing the heading for challenge 3.) It took 0.35 seconds to write that file.

    Perfect!

    You did it quick, too.


    I haven't looked at your version in detail but did notice the
    line-counts (as I had to delete those lines for a previous reply).
    Any solution I come up with in C (which may take a while!) will
    have to
    use entirely different methods. I'm not interested in writing
    hash-tables etc in C, I'm far too lazy. Probably it will be much longer
    than yours.

    You have to deliver C to get a chance at the prize.

    And I like to see different approaches. The way I did it in C and
    Python is similar, but Python makes it SO easy (one-line) to segregate
    words by letter that I took the easy way out there.


    One thing which is still not clear is how to choose the layout of the
    final challenge. I assume the number of rows has to be a multiple of
    100, but how to decide the columns?

    Whatever rows * columns >= 2600 will work.

    A row count divisible by 100 is suggested, but I just tried 'off
    numbers' such as 262r x 10c with both my C and Python, and they work fine.

    My solutions have rows and columns as command line arguments.
    $ ./program words.txt 1000 3
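
The column-then-row layout described here amounts to column-major indexing: the word shown at row r, column c is element c*rows + r of the sorted list. A sketch (function names are mine, not from either program):

```c
#include <stdio.h>

/* Column-major position of grid cell (r, c) when each column holds
   `rows` consecutive words from the sorted list. */
int grid_index(int r, int c, int rows)
{
    return c * rows + r;
}

/* Print `total` words in column-then-row order, e.g. 262 rows x 10
   columns for the 2600-word result. */
void print_grid(const char *words[], int total, int rows, int cols)
{
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            int i = grid_index(r, c, rows);
            if (i < total)
                printf("%-16s", words[i]);
        }
        putchar('\n');
    }
}
```

This is why any rows * columns >= 2600 works: cells past the end of the list are simply left blank.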


    I went with 3 columns max as the most practical.

    That's fine. I have one of those ultra-wide-screen monitors, so I can
    see up to 13 columns.


    (Probably my version will go wrong if there aren't at least 100 words
    per letter in the input.)

    Same here, but I didn't test that.

    The lowest number of words per letter in that input file is 446 (for
    letter x).


  • From DFS@3:633/10 to All on Mon Mar 23 00:06:21 2026
    On 3/22/2026 8:05 PM, Tristan Wibberley wrote:
    On 22/03/2026 23:14, DFS wrote:
    On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
deliver a C (and optional 2nd language) program that - from a large list
of unsorted words possibly containing duplicates - extracts 26 sets of
100 random and unique words that each begin with a letter of the English
alphabet.

    What random distribution, uniform?
    Said distribution over the unique words or said distribution over the
    original list?

    pseudorandom?


    I don't care about the uniformity of the distribution, as long as the
    output is unique words, and you generate and use 2600+ random values
    from a RNG.


    I think you're unaware that I can predictably generate a sequence of identical values when the distribution is free and your specification is satisfied by selecting with a distribution that prefers just one
    indicatory value for a choice of word to the exclusion of all others.


    Yeah, I don't really know what any of that means. But it sounds like
    your 3rd attempt to sidestep the generation and use of 2600+ random values.

    I think you could show some interesting techniques, but you have to
    adhere to the requirements of the challenge.



    You mention an RNG, I suppose then that you exclude pseudo-random
    numbers because those are normally referred to as PRNGs and I understand
    that RNG excludes them.

    I always understood PRNGs and CSPRNGs to be subsets of RNGs. But use
    what you like.


  • From Michael S@3:633/10 to All on Mon Mar 23 10:53:10 2026
    On Sun, 22 Mar 2026 23:21:43 +0000
    Tristan Wibberley <tristan.wibberley+netnews2@alumni.manchester.ac.uk>
    wrote:

    On 22/03/2026 14:38, DFS wrote:
    You must call a RNG 2600+ times to build the list

    ie you can't use the
random ordering of the input file to your advantage).


    The two are not the same, that is, the use of "ie" is wrong.

    Which do you really require, or do you really require I satisfy the conjunction of the two?


    Do you try to hint that challenges with seemingly arbitrary rules and
    seemingly arbitrary purposes are not very worthy?
    If yes, then you could as well say it directly.



  • From DFS@3:633/10 to All on Mon Mar 23 10:56:34 2026
    On 3/23/2026 4:53 AM, Michael S wrote:


    Do you try to hint that challenges with seemingly arbitrary rules and seemingly arbitrary purposes are not very worthy?


    Arbitrary and worth are in the eyes of the beholder.

    So keep your arbitrary, worthless opinions to yourself.


  • From Bart@3:633/10 to All on Mon Mar 23 16:03:26 2026
    On 23/03/2026 03:40, DFS wrote:
    On 3/22/2026 7:53 PM, Bart wrote:

    Not C, so that code is here: https://github.com/sal55/langs/blob/
    master/dfs.q

slick. It's a powerful scripting language. Reading a text file in with
one line is nice. It's about 10 lines of C.

    Well, it can be one line in C too, once you create a function for it!


    Did you look to python for inspiration when creating it?

    No. I glanced at it but all I remember is that it was 58 lines.


Looks like line 16 is where you call a randomizer. If you put a counter
at line 17 what does it say after the program is run?

    It's called 2631 times. With a different seed, it will vary.

    Is bounds a property of your list objects?

    Is bounds a pair of numbers 0..length of list-1?

Yes, but the bounds usually start from 1. And here, the 'long' and
'short' lists have bounds of 'a' to 'z' (97 to 122).

    What generates your random values?

    I use the PRNG shown below (not C code, and not mine).

    There are a couple of levels of functions on top. The range-based
    'random()' in the scripting language probably gives slightly biased
    results, but none of my stuff including this is critical.


    Any solution I come up with in C (which may take a while!) will
    have to
    use entirely different methods. I'm not interested in writing hash-
    tables etc in C, I'm far too lazy. Probably it will be much longer
    than yours.

    You have to deliver C to get a chance at the prize.

    I decided to do it in my 'M' language first as there are fewer i's and
    t's to dot and cross when developing an algorithm.

    That part's been done, now all that remains is manual porting to C. I
    will do that later. (Auto-transpiling to C works, but I guess that's not
    the kind of C you want.)

    (If interested, my version is here; it's about 160 lines: https://github.com/sal55/langs/blob/master/dfs.m. I had planned to use
    C's qsort(), but that didn't seem to work, so it includes a sort routine.)

    This version produces the output in 0.30 seconds.

    BTW the challenge has proved useful as it showed up bugs in both my
    scripting language and the compiled one. The first has been fixed, the
    second will be; I used the previous compiler version to test the code.

    ---------------------
    [2]int seed = (0x2989'8811'1111'1272',0x1673'2673'7335'8264)

    export func mrandom:u64 =
    int x, y
    x := seed[1]
    y := seed[2]
    seed[1] := y
    x ixor:= x<<23
    seed[2] := x ixor y ixor x>>17 ixor y>>26
    return seed[2] + y
    end
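
For reference, the same xorshift128+ step rendered in C (my transcription of the M code above, not Bart's; M's arrays are 1-based, so seed[1]/seed[2] become seed[0]/seed[1], and the digit separators in the seed constants are dropped):

```c
#include <stdint.h>

/* xorshift128+ generator, transcribed from the M version above. */
static uint64_t seed[2] = { 0x2989881111111272ULL, 0x1673267373358264ULL };

uint64_t mrandom(void)
{
    uint64_t x = seed[0];
    uint64_t y = seed[1];
    seed[0] = y;
    x ^= x << 23;
    seed[1] = x ^ y ^ (x >> 17) ^ (y >> 26);
    return seed[1] + y;    /* the '+' is what makes it xorshift+ */
}
```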


  • From DFS@3:633/10 to All on Mon Mar 23 12:26:44 2026
    On 3/23/2026 4:53 AM, Michael S wrote:

    arbitrary purposes > not very worthy


    The purposes are to extensively challenge your educational background
    and your data processing skills with C.

    Can you and your code:

    * use the alphabet?

    * count to 100?

    * count words by first letter?

    * find duplicate words and proper case them?

    * handle duplicate random numbers?

    * generate a unique set of words?

    * print sorted output by column then row?


    Well can you?

    Are you worthy?
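
One item on that checklist, counting words by first letter, can be sketched in a few lines of C (my own illustration, not code from the thread):

```c
#include <ctype.h>

/* Tally how many words start with each letter a-z (case-insensitive).
   Returns the number of words skipped because their first character
   is not an ASCII letter. */
int count_by_initial(const char *words[], int n, int counts[26])
{
    int skipped = 0;
    for (int i = 0; i < 26; i++)
        counts[i] = 0;
    for (int i = 0; i < n; i++) {
        unsigned char c0 = (unsigned char)words[i][0];
        if (isalpha(c0))
            counts[tolower(c0) - 'a']++;
        else
            skipped++;
    }
    return skipped;
}
```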





  • From Bart@3:633/10 to All on Mon Mar 23 22:26:17 2026
    On 23/03/2026 03:40, DFS wrote:
    On 3/22/2026 7:53 PM, Bart wrote:

    I haven't looked at your version in detail but did notice the
    line-counts (as I had to delete those lines for a previous reply).
    Any solution I come up with in C (which may take a while!) will
    have to
    use entirely different methods. I'm not interested in writing hash-
    tables etc in C, I'm far too lazy. Probably it will be much longer
    than yours.

    You have to deliver C to get a chance at the prize.

And I like to see different approaches. The way I did it in C and
Python is similar, but Python makes it SO easy (one-line) to segregate
words by letter that I took the easy way out there.


    I now have a C version, a bit long to post so is at this link:

    https://github.com/sal55/langs/blob/master/dfs.c

    It looks very clunky but seems to do the job, and not too slowly either
    (see below).

    I then tried yours, which is somewhat shorter (160 lines vs my 205
    lines, which includes blanks etc).

However, that doesn't seem to do part (2) of the challenge. While that
doesn't explicitly say the unsorted duplicates must be shown, that's what
the example does:

    found: eventually dupes you get
    output: Dupes Eventually Get You

    Your C program (I see the Python does it too) shows the equivalent of this:

    Duplicate words in proper case
    Dupes Eventually Get You
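
The proper-casing step itself ("eventually" -> "Eventually") is the easy part; a sketch (my own, not from either submission):

```c
#include <ctype.h>

/* Proper-case a word in place: first letter upper, rest lower,
   e.g. "eventually" -> "Eventually", "YOU" -> "You". */
void proper_case(char *w)
{
    if (*w == '\0')
        return;
    *w = (char)toupper((unsigned char)*w);
    for (char *p = w + 1; *p; p++)
        *p = (char)tolower((unsigned char)*p);
}
```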

Now, I noticed that my original M version displayed that first 'found'
line, but the words were sorted, not unsorted! Displaying the original
order involved quite a bit of extra work, and an extra copy of the
word-list. The method is also inefficient.

    So, is that necessary, or not? If not then I can simplify my versions.

    Anyway, my C version does absolutely nothing clever. Everything is a
    linear search. The only hi-tech bit is the quicksort routine.

    Timing, all run under Windows:

    My C: 0.30 seconds
    Your C: 0.25 seconds

    My Q: 0.34 seconds
    Your Python: 1.77 seconds (CPython)
    0.88 seconds (PyPy)

    The C timings are unoptimised; optimising might knock off 0.01 or 0.02 seconds.

    I don't know why the Python timing is slow, especially given that its
    sort() routine will be internal native code function, and mine runs as bytecode.

    My interpreters generally are faster than CPython at executing bytecode,
    but with tasks like this, most time is usually spent within internal
    native code libraries.


  • From Tim Rentsch@3:633/10 to All on Tue Mar 24 00:43:34 2026
    DFS <nospam@dfs.com> writes:

    On 3/22/2026 1:29 PM, John McCue wrote:

    DFS <nospam@dfs.com> wrote:
    <snip>

    ---------------------
    Word Source
    ---------------------
    There's a huge unsorted word list here:

    https://limewire.com/?referrer=pq7i8xx7p2

    ...which you can develop against.

    Do I need to create an ID to get the list ?

    I don't think so.

    It didn't give me an ID or login when I uploaded them.


    I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f

    A fucking web page. How about a link to a plain text file
    that has just the words?

  • From DFS@3:633/10 to All on Tue Mar 24 04:38:11 2026
    On 3/24/2026 3:43 AM, Tim Rentsch wrote:
    DFS <nospam@dfs.com> writes:

    On 3/22/2026 1:29 PM, John McCue wrote:

    DFS <nospam@dfs.com> wrote:
    <snip>

    ---------------------
    Word Source
    ---------------------
    There's a huge unsorted word list here:

    https://limewire.com/?referrer=pq7i8xx7p2

    ...which you can develop against.

    Do I need to create an ID to get the list ?

    I don't think so.

    It didn't give me an ID or login when I uploaded them.


    I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f

    A fucking web page. How about a link to a plain text file
    that has just the words?


    Just fucking click on the fucking file name.

    ;)


  • From Tristan Wibberley@3:633/10 to All on Tue Mar 24 11:03:12 2026
    On 23/03/2026 04:06, DFS wrote:
    On 3/22/2026 8:05 PM, Tristan Wibberley wrote:
    On 22/03/2026 23:14, DFS wrote:
    On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
deliver a C (and optional 2nd language) program that - from a large list
of unsorted words possibly containing duplicates - extracts 26 sets of
100 random and unique words that each begin with a letter of the English
alphabet.

    What random distribution, uniform?
    Said distribution over the unique words or said distribution over the
    original list?

    pseudorandom?


    I don't care about the uniformity of the distribution, as long as the
    output is unique words, and you generate and use 2600+ random values
    from a RNG.


    I think you're unaware that I can predictably generate a sequence of
    identical values when the distribution is free and your specification is
    satisfied by selecting with a distribution that prefers just one
    indicatory value for a choice of word to the exclusion of all others.


Yeah, I don't really know what any of that means. But it sounds like
your 3rd attempt to sidestep the generation and use of 2600+ random values.

    I think you could show some interesting techniques, but you have to
    adhere to the requirements of the challenge.

    It's because of my deeper understanding of the meaning (or barely meaningfulness) of the word "random" and my awareness of how critical it
    is to many applications of randomness.

    That is:

    - If it's a game I could go ahead with a PRNG and satisfy you easily -
    but it's not interesting to me these days, at this point I think it's a
    game,

    - If it's a secure application of choice that leaks /no/ information
    about the input list beyond the fact of the achievement of the
    lower-bound, respectively, on the number of words having each initial, I
    can tighten your specification in some of the ways I've queried,

    - If it's a secure application of choice that may leak some information
    about the input list but may not leak any of the information of the
    original ordering of the words (which was implied to be the reason for
    the minimum number of queries for random numbers) then I can use fewer
    bits of entropy, saving runtime costs. This 3rd possible endeavour is
    less relevant now that you allow a PRNG because I don't have to care
    about the cost of bits of entropy or their turnaround time. It's still
    an interesting one when considering the nature of the task of
    requirements engineering and agreeing requirements. Programmes can fail
    due to uncompetitiveness induced by individual member projects with
    unnecessary or insufficient requirements.

    More than that, though, queries for random numbers may come in
    individual bits and an implementation might query 16 numbers, for
    example, for each word choice, rather than one. And it goes deeper than
    that. That means the request to query for 2600 numbers is sort of
    meaningless and can lead to programme failure by being in the class of unnecessary itself and its presence leading to the other requirements
    being insufficient.

    More still, people get gambling games wrong and go to jail because they
    learn and practice C programming without any awareness of the difficulty
    of "random" and they might read this newsgroup to shape their skills.

    So you see, each of my questions was properly important for many reasons.

    I really did think about it carefully.

    --
    Tristan Wibberley

    The message body is Copyright (C) 2026 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Tristan Wibberley@3:633/10 to All on Tue Mar 24 11:04:43 2026
    On 23/03/2026 14:56, DFS wrote:
    On 3/23/2026 4:53 AM, Michael S wrote:


    Do you try to hint that challenges with seemingly arbitrary rules and
    seemingly arbitrary purposes are not very worthy?


    Arbitrary and worth are in the eyes of the beholder.

    So keep your arbitrary, worthless opinions to yourself.


    Indeed it's a very important and /extremely/ interesting challenge, for
    reasons I state near the end of another post I recently made.


    --
    Tristan Wibberley

    The message body is Copyright (C) 2026 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Bart@3:633/10 to All on Tue Mar 24 12:58:02 2026
    On 24/03/2026 11:03, Tristan Wibberley wrote:
    On 23/03/2026 04:06, DFS wrote:
    On 3/22/2026 8:05 PM, Tristan Wibberley wrote:
    On 22/03/2026 23:14, DFS wrote:
    On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large list
    of unsorted words possibly containing duplicates - extracts 26 sets of
    100 random and unique words that each begin with a letter of the English
    alphabet.

    What random distribution, uniform?
    Said distribution over the unique words or said distribution over the
    original list?

    pseudorandom?


    I don't care about the uniformity of the distribution, as long as the
    output is unique words, and you generate and use 2600+ random values
    from a RNG.


    I think you're unaware that I can predictably generate a sequence of
    identical values when the distribution is free and your specification is
    satisfied by selecting with a distribution that prefers just one
    indicatory value for a choice of word to the exclusion of all others.


    Yeah, I don't really know what any of that means. But it sounds like
    your 3rd attempt to sidestep the generation and use of 2600+ random values.
    I think you could show some interesting techniques, but you have to
    adhere to the requirements of the challenge.

    It's because of my deeper understanding of the meaning (or barely meaningfulness) of the word "random" and my awareness of how critical it
    is to many applications of randomness.

    That is:

    - If it's a game I could go ahead with a PRNG and satisfy you easily -
    but it's not interesting to me these days, at this point I think it's a
    game,

    - If it's a secure application of choice that leaks /no/ information
    about the input list beyond the fact of the achievement of the
    lower-bound, respectively, on the number of words having each initial, I
    can tighten your specification in some of the ways I've queried,

    - If it's a secure application of choice that may leak some information about the input list but may not leak any of the information of the
    original ordering of the words (which was implied to be the reason for
    the minimum number of queries for random numbers) then I can use fewer
    bits of entropy, saving runtime costs. This 3rd possible endeavour is
    less relevant now that you allow a PRNG because I don't have to care
    about the cost of bits of entropy or their turnaround time. It's still
    an interesting one when considering the nature of the task of
    requirements engineering and agreeing requirements. Programmes can fail
    due to uncompetitiveness induced by individual member projects with unnecessary or insufficient requirements.

    More than that, though, queries for random numbers may come in
    individual bits and an implementation might query 16 numbers, for
    example, for each word choice, rather than one. And it goes deeper than
    that. That means the request to query for 2600 numbers is sort of
    meaningless and can lead to programme failure by being in the class of unnecessary itself and its presence leading to the other requirements
    being insufficient.

    More still, people get gambling games wrong and go to jail because they
    learn and practice C programming without any awareness of the difficulty
    of "random" and they might read this newsgroup to shape their skills.

    So you see, each of my questions was properly important for many reasons.

    I really did think about it carefully.


    Every time random numbers come up in this group, suddenly everybody
    is an expert and no PRNG except the most perfect and secure will do,
    no matter what the application. Ideally a true RNG.

    This is just a fun programming exercise. You can assume some function
    'rand()' that returns good-enough values. So long as it doesn't simply
    return the same value each time, or 1,2,3,...; the bar needn't be high!

    Actually, I have just tried a PRNG that returns 1,2,3,.. on my attempt,
    and I couldn't really tell from the output that there was anything
    wrong. Thanks to the input being unsorted anyway.



    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Tue Mar 24 14:02:49 2026
    DFS <nospam@dfs.com> writes:
    On 3/24/2026 3:43 AM, Tim Rentsch wrote:
    DFS <nospam@dfs.com> writes:

    On 3/22/2026 1:29 PM, John McCue wrote:

    DFS <nospam@dfs.com> wrote:
    <snip>

    ---------------------
    Word Source
    ---------------------
    There's a huge unsorted word list here:

    https://limewire.com/?referrer=pq7i8xx7p2

    ...which you can develop against.

    Do I need to create an ID to get the list ?

    I don't think so.

    It didn't give me an ID or login when I uploaded them.


    I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f

    A fucking web page. How about a link to a plain text file
    that has just the words?


    Just fucking click on the fucking file name.

    Not everyone reads usenet with a browser or a news client
    that understands hypertext or the hypertext transfer protocol.

    I would generally have used 'wget' to fetch, so if you'd
    specified:

    https://filebin.net/kkkyqw1ritefnw0f/words_unsorted.txt

    That may have been slightly better, but it appears that
    filebin interposes a warning screen and forces a second
    click, so wget may also have failed.

    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Tue Mar 24 12:03:52 2026
    On 3/24/2026 10:02 AM, Scott Lurndal wrote:
    DFS <nospam@dfs.com> writes:
    On 3/24/2026 3:43 AM, Tim Rentsch wrote:
    DFS <nospam@dfs.com> writes:

    On 3/22/2026 1:29 PM, John McCue wrote:

    DFS <nospam@dfs.com> wrote:
    <snip>

    ---------------------
    Word Source
    ---------------------
    There's a huge unsorted word list here:

    https://limewire.com/?referrer=pq7i8xx7p2

    ...which you can develop against.

    Do I need to create an ID to get the list ?

    I don't think so.

    It didn't give me an ID or login when I uploaded them.


    I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f

    A fucking web page. How about a link to a plain text file
    that has just the words?


    Just fucking click on the fucking file name.

    Not everyone reads usenet with a browser or a news client
    that understands hypertext or the hypertext transfer protocol.

    Sucks for them.


    I would generally have used 'wget' to fetch, so if you'd
    specified:

    https://filebin.net/kkkyqw1ritefnw0f/words_unsorted.txt

    That may have been slightly better, but it appears that
    filebin interposes a warning screen and forces a second
    click, so wget may also have failed.


    $wget -r -np https://filebin.net/kkkyqw1ritefnw0f/words_unsorted.txt


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Tue Mar 24 13:06:53 2026
    On 3/23/2026 6:26 PM, Bart wrote:
    On 23/03/2026 03:40, DFS wrote:
    On 3/22/2026 7:53 PM, Bart wrote:

    I haven't looked at your version in detail but did notice the
    line-counts (as I had to delete those lines for a previous reply).
    Any solution I come up with in C (which may take a while!) will have to
    use entirely different methods. I'm not interested in writing
    hash-tables etc in C, I'm far too lazy. Probably it will be much longer
    than yours.

    You have to deliver C to get a chance at the prize.

    And I like to see different approaches. The way I did it in C and
    Python is similar, but Python makes it SO easy (one-line) to segregate
    words by letter that I took the easy way out there.


    I now have a C version, a bit long to post so is at this link:

    https://github.com/sal55/langs/blob/master/dfs.c

    It looks very clunky but seems to do the job, and not too slowly either
    (see below).

    Years ago I was shocked how fast C chewed thru text data (and it's even
    faster dealing with numbers).

    Actually, I'm still shocked. I wrote an anagram program in C that used
    prime factors to do searches, and it found 5 anagrams from a list of
    370K words in 0.0055s (5.5/1000ths of a second).

    And it would be even faster with the use of a hash table. Incredible.

    And that's on my low-end AMD Ryzen 5600G (16GB DDR4-3200 RAM)
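
    For anyone curious, the prime-factor trick works because factorisation
    is unique and multiplication is commutative: assign each letter a
    prime, and a word's product is identical for every anagram of it. A
    minimal sketch of that idea (my own illustration, not DFS's actual
    program; the product can overflow 64 bits for long words):

    ```c
    #include <stdint.h>
    #include <ctype.h>

    /* One prime per letter a..z. The product of a word's letter-primes
       is the same for any two anagrams (unique factorisation), so it can
       serve as a search key. Long words can overflow uint64_t, so a real
       program needs to guard against that. */
    static const uint64_t primes[26] = {
         2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
        43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101
    };

    static uint64_t anagram_key(const char *w) {
        uint64_t k = 1;
        for (; *w; w++)
            if (isalpha((unsigned char)*w))
                k *= primes[tolower((unsigned char)*w) - 'a'];
        return k;
    }
    ```

    Two words are then anagrams exactly when their keys match, which turns
    the search into plain integer comparison.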


    I then tried yours, which is somewhat shorter (160 lines vs my 205
    lines, which includes blanks etc).

    However, that doesn't seem to do part (2) of the challenge. While that
    doesn't explicitly say the unsorted duplicates must be shown, that's what
    the example does:

      found:  eventually dupes you get
      output: Dupes Eventually Get You

    Your C program (I see the Python does it too) shows the equivalent of this:

      Duplicate words in proper case
      Dupes Eventually Get You

    Now, I noticed that my original M version displayed that first 'found'
    line, but the words were sorted, not unsorted! Displaying the original
    order involved quite a bit of extra work,

    I see I wasn't clear. The 'found' output wasn't a requirement, just an example to show what the duplicates might look like unsorted.


    and an extra copy of the word-list. The method is also inefficient.

    So, is that necessary, or not? If not then I can simplify my versions.

    No extra copy of the list is necessary to find duplicates (but for
    one-pass efficiency, sorting the list is required).

    Look at the first letter of each duplicate.

    "congratulations on the wherewithal youngun"
    cotwy

    Sort the file and the dupes are already sorted. That was intentional.

    If that explanation lets you drop some lines, good.
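
    The one-pass idea reads roughly like this (a sketch with a made-up
    six-word list, not the actual challenge code): after sorting, equal
    words sit next to each other, so comparing each word with its
    predecessor finds every duplicate without a second copy of the list.

    ```c
    #include <string.h>

    /* After qsort(), duplicates are adjacent, so one pass over the sorted
       array finds them. The extra strcmp against i-2 reports each
       repeated word only once, even when it occurs three or more times. */
    static int find_dupes(const char **sorted, int n, const char **out) {
        int ndupes = 0;
        for (int i = 1; i < n; i++)
            if (strcmp(sorted[i], sorted[i - 1]) == 0 &&
                (i < 2 || strcmp(sorted[i], sorted[i - 2]) != 0))
                out[ndupes++] = sorted[i];
        return ndupes;
    }
    ```

    And, as noted above, the reported dupes come out already sorted for free.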



    My method of finding the 26 sets was to:

    1) count words by letter as the file is read in
    lettercnt[wordsin[i][0]-'a']++;

    (I saw something similar in your scripting, but couldn't spot it in your .c)


    2) sort the data just read in
    qsort(wordsin, wordcnt, sizeof(char*), comparechar);


    3) using the lettercnt[] array from step 1, determine the start-end
    positions of each set of words beginning with a..z

    Letter Start End
    a 0 20484
    b 20485 34475
    c 34476 60069
    d 60070 75050
    e 75051 86572
    f 86573 95977
    g 95978 104985
    h 104986 116490
    i 116491 127653
    j 127654 129796
    k 129797 132749
    l 132750 140949
    m 140950 157658
    n 157659 166088
    o 166089 175859
    p 175860 205604
    q 205605 207078
    r 207079 221162
    s 221163 253678
    t 253679 269769
    u 269770 287936
    v 287937 292502
    w 292503 297884
    x 297885 298330
    y 298331 299249
    z 299250 300397


    4) generate 100+ random numbers between start and end of each letter
    int r = (rand() % (end - start + 1)) + start;

    This 'calculation of start and end' for each letter is what I thought to
    be a novel approach.

    I'm curious how others will approach it (if anyone else tries).


    Altogether my program makes:
    3 passes thru the 300398 words in:
    * 1 to count total words and words by letter
    * 1 to load the words into an array
    * 1 to find duplicates

    2 passes thru the 2600 words out:
    * 1 to verify the 100 words per letter
    * 1 to print all 2600 words

    5 total passes? Not sure that's Ivy League. But everything runs in
    1/10th of a second so I can't complain.



    Anyway, my C version does absolutely nothing clever. Everything is a
    linear search.

    s'alright.


    I rejiggered my code, so main() is like yours:

    int main(int argc, char *argv[]) {
    validateinput(argc, argv);
    loadwords(argv);
    buildwordsets();

    printcountsbyletter();
    printduplicatewords();
    print2600words(argv);

    return 0;
    }

    Somehow it's consistently 0.003s faster! winning!



    The only hi-tech bit is the quicksort routine.

    I saw that. Nice.



    Timing, all run under Windows:

      My C:          0.30 seconds
      Your C:        0.25 seconds

      My Q:          0.34 seconds
      Your Python:   1.77 seconds (CPython)
                     0.88 seconds (PyPy)

    The C timings are unoptimised; optimising might knock off 0.01 or 0.02 seconds.

    I don't know why the Python timing is slow, especially given that its
    sort() routine will be an internal native-code function, and mine runs as
    bytecode.

    I know!

    * on WSL Ubuntu my Python runs 10x slower than the C
    * on WSL Ubuntu my Python runs at about the same speed as on Windows

    Both speeds are very unusual, and very slow.

    Windows: Python 3.11.0
    WSL : Python 3.10.6


    My interpreters generally are faster than CPython at executing bytecode,
    but with tasks like this, most time is usually spent within internal
    native code libraries.

    Thanks for trying the challenge. Hopefully some others will.

    If you have a short challenge of medium difficulty, post it so we can
    learn and improve skillz.



    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Tue Mar 24 13:10:33 2026
    On 3/23/2026 12:03 PM, Bart wrote:
    On 23/03/2026 03:40, DFS wrote:
    On 3/22/2026 7:53 PM, Bart wrote:

    Not C, so that code is here: https://github.com/sal55/langs/blob/
    master/dfs.q

    slick. It's a powerful scripting language. Reading a text file in
    with one line is nice. It's about 10 lines of C.

    Well, it can be one line in C too, once you create a function for it!


    Did you look to python for inspiration when creating it?

    No. I glanced at it but all I remember is that it was 58 lines.

    I don't mean my little bit of code. I mean did you look to the python
    language for inspiration when you were developing your scripting language?



    Looks like line 16 is where you call a randomizer. If you put a
    counter at line 17 what does it say after the program is run?

    It's called 2631 times. With a different seed, it will vary.

    That's about what I was expecting.


    Is bounds a property of your list objects?

    Is bounds a pair of numbers 0..length of list-1?

    Yes, but the bounds usually start from 1. And here, the 'long' and
    short' lists have bounds of 'a' to 'z' (97 to 122).



    What generates your random values?

    I use the PRNG shown below (not C code, and not mine).

    There are a couple of levels of functions on top. The range-based
    'random()' in the scripting language probably gives slightly biased
    results, but none of my stuff including this is critical.



    Any solution I come up with in C (which may take a while!) will have to
    use entirely different methods. I'm not interested in writing
    hash-tables etc in C, I'm far too lazy. Probably it will be much longer
    than yours.

    You have to deliver C to get a chance at the prize.

    I decided to do it in my 'M' language first as there are fewer i's and
    t's to dot and cross when developing an algorithm.

    You have separate M and Q languages?


    That part's been done, now all that remains is manual porting to C. I
    will do that later. (Auto-transpiling to C works, but I guess that's not
    the kind of C you want.)

    Probably not.


    (If interested, my version is here; it's about 160 lines: https://github.com/sal55/langs/blob/master/dfs.m. I had planned to use
    C's qsort(), but that didn't seem to work, so it includes a sort routine.)

    This version produces the output in 0.30 seconds.


    Why wouldn't qsort() work?


    BTW the challenge has proved useful as it showed up bugs in both my scripting language and the compiled one. The first has been fixed, the second will be; I used the previous compiler version to test the code.

    Awesome.


    ---------------------
    [2]int seed = (0x2989'8811'1111'1272',0x1673'2673'7335'8264)

    export func mrandom:u64 =
        int x, y
        x := seed[1]
        y := seed[2]
        seed[1] := y
        x ixor:= x<<23
        seed[2] := x ixor y ixor x>>17 ixor y>>26
        return seed[2] + y
    end


    Do you have a C version of that?

    If so I'll run it against a RNG comparison program I wrote.


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Tue Mar 24 13:17:27 2026
    On 3/24/2026 7:03 AM, Tristan Wibberley wrote:
    On 23/03/2026 04:06, DFS wrote:
    On 3/22/2026 8:05 PM, Tristan Wibberley wrote:
    On 22/03/2026 23:14, DFS wrote:
    On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large list
    of unsorted words possibly containing duplicates - extracts 26 sets of
    100 random and unique words that each begin with a letter of the English
    alphabet.

    What random distribution, uniform?
    Said distribution over the unique words or said distribution over the
    original list?

    pseudorandom?


    I don't care about the uniformity of the distribution, as long as the
    output is unique words, and you generate and use 2600+ random values
    from a RNG.


    I think you're unaware that I can predictably generate a sequence of
    identical values when the distribution is free and your specification is
    satisfied by selecting with a distribution that prefers just one
    indicatory value for a choice of word to the exclusion of all others.


    Yeah, I don't really know what any of that means. But it sounds like
    your 3rd attempt to sidestep the generation and use of 2600+ random values.
    I think you could show some interesting techniques, but you have to
    adhere to the requirements of the challenge.

    It's because of my deeper understanding of the meaning (or barely meaningfulness) of the word "random" and my awareness of how critical it
    is to many applications of randomness.

    That is:

    - If it's a game I could go ahead with a PRNG and satisfy you easily -
    but it's not interesting to me these days, at this point I think it's a
    game,

    A game... now you're onto me.



    - If it's a secure application of choice that leaks /no/ information
    about the input list beyond the fact of the achievement of the
    lower-bound, respectively, on the number of words having each initial, I
    can tighten your specification in some of the ways I've queried,

    I sense your "tightening" will result in an unreadable spec, but it
    would be fun to try. So let's have it.



    - If it's a secure application of choice that may leak some information about the input list but may not leak any of the information of the
    original ordering of the words (which was implied to be the reason for
    the minimum number of queries for random numbers) then I can use fewer
    bits of entropy, saving runtime costs.

    Costco has a good deal on entropy right now.



    This 3rd possible endeavour is
    less relevant now that you allow a PRNG because I don't have to care
    about the cost of bits of entropy or their turnaround time. It's still
    an interesting one when considering the nature of the task of
    requirements engineering and agreeing requirements. Programmes can fail
    due to uncompetitiveness induced by individual member projects with unnecessary or insufficient requirements.

    More than that, though, queries for random numbers may come in
    individual bits and an implementation might query 16 numbers, for
    example, for each word choice, rather than one. And it goes deeper than
    that. That means the request to query for 2600 numbers is sort of
    meaningless and can lead to programme failure by being in the class of unnecessary itself and its presence leading to the other requirements
    being insufficient.

    I would agree that on a scale of necessary to unnecessary, this
    challenge lies very close to unnecessary.

    But it lies closer to the middle of the scale interesting..uninteresting.


    I have a few more up my sleeve. One in particular I've been thinking
    about, that explicitly disallows the use of a RNG.



    More still, people get gambling games wrong and go to jail because they
    learn and practice C programming without any awareness of the difficulty
    of "random" and they might read this newsgroup to shape their skills.

    You should consult an attorney - I wouldn't want you to do (rand() % 26)
    + 1 days in jail for reading clc and attempting my challenge.



    So you see, each of my questions was properly important for many reasons.

    I really did think about it carefully.

    I appreciate your consideration.



    ps you're nuts


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Tim Rentsch@3:633/10 to All on Tue Mar 24 12:39:46 2026
    scott@slp53.sl.home (Scott Lurndal) writes:

    DFS <nospam@dfs.com> writes:

    On 3/24/2026 3:43 AM, Tim Rentsch wrote:

    DFS <nospam@dfs.com> writes:

    On 3/22/2026 1:29 PM, John McCue wrote:

    DFS <nospam@dfs.com> wrote:
    <snip>

    ---------------------
    Word Source
    ---------------------
    There's a huge unsorted word list here:

    https://limewire.com/?referrer=pq7i8xx7p2

    ...which you can develop against.

    Do I need to create an ID to get the list ?

    I don't think so.

    It didn't give me an ID or login when I uploaded them.


    I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f

    A fucking web page. How about a link to a plain text file
    that has just the words?

    Just fucking click on the fucking file name.

    Not everyone reads usenet with a browser or a news client
    that understands hypertext or the hypertext transfer protocol.

    I would generally have used 'wget' to fetch, so if you'd
    specified:

    https://filebin.net/kkkyqw1ritefnw0f/words_unsorted.txt

    That may have been slightly better, but it appears that
    filebin interposes a warning screen and forces a second
    click, so wget may also have failed.

    Thank you for this. It's nice to know someone here
    understands.

    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Bart@3:633/10 to All on Tue Mar 24 20:29:35 2026
    On 24/03/2026 17:10, DFS wrote:
    On 3/23/2026 12:03 PM, Bart wrote:
    On 23/03/2026 03:40, DFS wrote:
    On 3/22/2026 7:53 PM, Bart wrote:

    Not C, so that code is here: https://github.com/sal55/langs/blob/
    master/dfs.q

    slick. It's a powerful scripting language. Reading a text file in
    with one line is nice. It's about 10 lines of C.

    Well, it can be one line in C too, once you create a function for it!


    Did you look to python for inspiration when creating it?

    No. I glanced at it but all I remember is that it was 58 lines.

    I don't mean my little bit of code. I mean did you look to the python
    language for inspiration when you were developing your scripting language?

    In that case, no. Both started around 1990, but I didn't look at Python
    until a decade later.

    The only feature I borrowed and still use was an 'else' clause for
    for-loops. Plus I briefly had list-comps, but they fell into disuse. A
    few other ideas were tried such as generators.

    (See https://github.com/sal55/langs/blob/master/QLanguage/qbasics.md,
    this is from 5 years ago.

    Mine started as an add-on scripting language for my 3D graphics
    applications, so was more of a DSL. Then it became independent.)


    I decided to do it in my 'M' language first as there are fewer i's and
    t's to dot and cross when developing an algorithm.

    You have separate M and Q languages?

    M is my systems language (somewhat higher level than C), and Q is my
    scripting language (lower level than Python and less dynamic):

    C--M-----------Q------------------Python

    (If interested, my version is here; it's about 160 lines:
    https://github.com/sal55/langs/blob/master/dfs.m. I had planned to use
    C's qsort(), but that didn't seem to work, so it includes a sort routine.)

    This version produces the output in 0.30 seconds.


    Why wouldn't qsort() work?

    It just gives the wrong results. Maybe I called it wrong, but I couldn't
    see how (I know the args to the compare function have an extra
    indirection level). Maybe I'll try it again later.


    ---------------------
    [2]int seed = (0x2989'8811'1111'1272',0x1673'2673'7335'8264)

    export func mrandom:u64 =
        int x, y
        x := seed[1]
        y := seed[2]
        seed[1] := y
        x ixor:= x<<23
        seed[2] := x ixor y ixor x>>17 ixor y>>26
        return seed[2] + y
    end


    Do you have a C version of that?

    If so I'll run it against a RNG comparison program I wrote.

    Try this:

    typedef unsigned long long u64;

    u64 seed[2] = {0x2989881111111272ULL, 0x1673267373358264ULL};

    u64 crandom() {
        u64 x, y;
        x = seed[0];
        y = seed[1];
        seed[0] = y;
        x ^= x<<23;
        seed[1] = x ^ y ^ (x>>17) ^ (y>>26);
        return seed[1] + y;
    }


    I think this is just a '128-bit/xor/shift' method I saw somewhere online.
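
    It looks like the xorshift128+ family (Vigna's generator, if memory
    serves; treat that attribution as tentative). For completeness, a
    self-contained copy of the same step plus the range mapping from step 4
    of DFS's write-up; the plain modulo is slightly biased toward the low
    end of the range, which is harmless for a toy exercise:

    ```c
    typedef unsigned long long u64;

    static u64 seed[2] = {0x2989881111111272ULL, 0x1673267373358264ULL};

    /* Same xorshift-style step as crandom() above, repeated here so the
       snippet compiles on its own. */
    static u64 xrandom(void) {
        u64 x = seed[0], y = seed[1];
        seed[0] = y;
        x ^= x << 23;
        seed[1] = x ^ y ^ (x >> 17) ^ (y >> 26);
        return seed[1] + y;
    }

    /* Draw an index in [lo, hi] by modulo reduction; slightly biased
       whenever the range size doesn't divide 2^64. */
    static int pick(int lo, int hi) {
        return lo + (int)(xrandom() % (u64)(hi - lo + 1));
    }
    ```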


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Tristan Wibberley@3:633/10 to All on Tue Mar 24 22:18:32 2026
    On 24/03/2026 16:03, DFS wrote:
    On 3/24/2026 10:02 AM, Scott Lurndal wrote:
    DFS <nospam@dfs.com> writes:
    On 3/24/2026 3:43 AM, Tim Rentsch wrote:
    DFS <nospam@dfs.com> writes:

    On 3/22/2026 1:29 PM, John McCue wrote:

    DFS <nospam@dfs.com> wrote:
    <snip>

    ---------------------
    Word Source
    ---------------------
    There's a huge unsorted word list here:

    https://limewire.com/?referrer=pq7i8xx7p2

    ...which you can develop against.

    Do I need to create an ID to get the list ?

    I don't think so.

    It didn't give me an ID or login when I uploaded them.


    I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f

    A fucking web page. How about a link to a plain text file
    that has just the words?


    Just fucking click on the fucking file name.

    Not everyone reads usenet with a browser or a news client
    that understands hypertext or the hypertext transfer protocol.

    Sucks for them.


    I would generally have used 'wget' to fetch, so if you'd
    specified:

    https://filebin.net/kkkyqw1ritefnw0f/words_unsorted.txt

    That may have been slightly better, but it appears that
    filebin interposes a warning screen and forces a second
    click, so wget may also have failed.


    $wget -r -np https://filebin.net/kkkyqw1ritefnw0f/words_unsorted.txt


    HTTP is really the wrong URI scheme: it has various explicit provisions
    (providing implicit permission) for content to be modified and replaced
    in transit for various compatibility and resource-consumption goals of
    intervening hosts. Even HTTPS doesn't solve that problem, because it's
    only a technical measure, and only against illegal interposers. Legal
    interposers may be admitted by the technical measure's administrators.

    ftp, sftp, ftps, rsync... these are where it's at for file content
    rather than hypertext nodes. There's no implied right to substitute.

    --
    Tristan Wibberley

    The message body is Copyright (C) 2026 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Tristan Wibberley@3:633/10 to All on Wed Mar 25 00:29:44 2026
    On 24/03/2026 17:17, DFS wrote:
    On 3/24/2026 7:03 AM, Tristan Wibberley wrote:
    On 23/03/2026 04:06, DFS wrote:
    On 3/22/2026 8:05 PM, Tristan Wibberley wrote:
    On 22/03/2026 23:14, DFS wrote:
    On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large list
    of unsorted words possibly containing duplicates - extracts 26 sets of
    100 random and unique words that each begin with a letter of the English
    alphabet.

    What random distribution, uniform?
    Said distribution over the unique words or said distribution over the >>>>>> original list?

    pseudorandom?


    I don't care about the uniformity of the distribution, as long as the
    output is unique words, and you generate and use 2600+ random values
    from a RNG.


    I think you're unaware that I can predictably generate a sequence of
    identical values when the distribution is free and your
    specification is
    satisfied by selecting with a distribution that prefers just one
    indicatory value for a choice of word to the exclusion of all others.


    Yeah, I don't really know what any of that means.  But it sounds like
    your 3rd attempt to sidestep the generation and use of 2600+ random
    values.

    I think you could show some interesting techniques, but you have to
    adhere to the requirements of the challenge.

    It's because of my deeper understanding of the meaning (or barely
    meaningfulness) of the word "random" and my awareness of how critical it
    is to many applications of randomness.

    That is:

      - If it's a game I could go ahead with a PRNG and satisfy you easily -
    but it's not interesting to me these days, at this point I think it's a
    game,

    A game... now you're onto me.



      - If it's a secure application of choice that leaks /no/ information
    about the input list beyond the fact of the achievement of the
    lower-bound, respectively, on the number of words having each initial, I
    can tighten your specification in some of the ways I've queried,

    I sense your "tightening" will result in an unreadable spec, but it
    would be fun to try.  So let's have it.

    I don't think it would be unreadable. It might require some thought to
    synthesise a program that satisfies it.

    But you've told me it's just a game (I suppose "toy", rather than
    gambling game). So the interesting bit could be just like:

    "produce the output so that, within each of the groupings based on the
    initial letter, the words are superficially shuffled around even when
    they're not shuffled around in the input."


    I would agree that on a scale of necessary to unnecessary, this
    challenge lies very close to unnecessary.

    I didn't mean to suggest the challenge is unnecessary, but meant to
    discuss the problems of writing specifications and requirements such
    that they're not necessary to fulfil the goal. Requirements involving
    randomness and secrecy are particularly interesting in that respect.


    But it lies closer to the middle of the scale interesting..uninteresting.


    I have a few more up my sleeve.  One in particular I've been thinking
    about, that explicitly disallows the use of a RNG.



    More still, people get gambling games wrong and go to jail because they
    learn and practice C programming without any awareness of the difficulty
    of "random" and they might read this newsgroup to shape their skills.

    You should consult an attorney - I wouldn't want you to do (rand() % 26)
    + 1 days in jail for reading clc and attempting my challenge.

    You seem to be assuming every reader is just fulfilling a need for a
    pastime. I don't suppose that, and you seem to be mocking me for it; I
    think that's awful. You'd got me really excited about the breadth of
    nuance in requirements and the effects of that and then turned it into
    an opportunity for mockery.


    ps you're nuts

    That may be, but how did you know?


    --
    Tristan Wibberley



    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Eli the Bearded@3:633/10 to All on Wed Mar 25 04:07:52 2026
    In comp.lang.c, DFS <nospam@dfs.com> wrote:
    I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f

    "word" list

    $ grep ^.$ ~/tmp/words-unsorted |grep -v [aeiouy] |wc
    19 19 38
    $ grep ^..$ ~/tmp/words-unsorted |grep -v [aeiouy] |wc
    54 54 162
    $ grep ^...$ ~/tmp/words-unsorted |grep -v [aeiouy] |wc
    74 74 296
    $ grep ^....$ ~/tmp/words-unsorted |grep -v [aeiouy] |wc
    13 13 65

    Elijah
    ------
    not going to use that for Scrabble





    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Wed Mar 25 01:17:59 2026
    On 3/25/2026 12:07 AM, Eli the Bearded wrote:
    In comp.lang.c, DFS <nospam@dfs.com> wrote:
    I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f

    "word" list

    $ grep ^.$ ~/tmp/words-unsorted |grep -v [aeiouy] |wc
    19 19 38
    $ grep ^..$ ~/tmp/words-unsorted |grep -v [aeiouy] |wc
    54 54 162
    $ grep ^...$ ~/tmp/words-unsorted |grep -v [aeiouy] |wc
    74 74 296
    $ grep ^....$ ~/tmp/words-unsorted |grep -v [aeiouy] |wc
    13 13 65

    Elijah
    ------
    not going to use that for Scrabble

    $ ./wl words_unsorted.txt

    Summary of words_unsorted.txt ---------------------------------------------------------------------------------------------------------------------------------------------------------
    300398 words

    First word in sorted list is 'a'
    Last word in sorted list is 'zyzzogeton'
    Longest word is 28 letters

    Median word = mimosas

    Word count by length
    1. 25 2. 212 3. 1515 4. 5909 5. 16165 6. 22031
    7. 31232 8. 38900 9. 41592 10. 39423 11. 32745 12. 25346
    13. 18134 14. 11813 15. 7134 16. 4019 17. 2185 18. 1032
    19. 534 20. 264 21. 106 22. 51 23. 21 24. 7
    25. 1 26. 0 27. 1 28. 1


    Word count by first letter
    a. 20485   b. 13991   c. 25594   d. 14981   e. 11522   f.  9405   g.  9008
    h. 11505   i. 11163   j.  2143   k.  2953   l.  8200   m. 16709   n.  8430
    o.  9771   p. 29745   q.  1474   r. 14084   s. 32516   t. 16091   u. 18167
    v.  4566   w.  5382   x.   446   y.   919   z.  1148

    Letter frequency (total letters = 2852034)
    e. 305117 (10.7%)   i. 257590 ( 9.0%)   a. 243274 ( 8.5%)   o. 208453 ( 7.3%)
    r. 204403 ( 7.2%)   s. 203387 ( 7.1%)   n. 198749 ( 7.0%)   t. 191543 ( 6.7%)
    l. 160027 ( 5.6%)   c. 125678 ( 4.4%)   u. 105257 ( 3.7%)   p.  94327 ( 3.3%)
    d.  90187 ( 3.2%)   m.  87628 ( 3.1%)   h.  76769 ( 2.7%)   g.  64686 ( 2.3%)
    y.  58025 ( 2.0%)   b.  50478 ( 1.8%)   f.  31172 ( 1.1%)   v.  25817 ( 0.9%)
    k.  21030 ( 0.7%)   w.  18053 ( 0.6%)   z.  12976 ( 0.5%)   x.   8486 ( 0.3%)
    q.   4724 ( 0.2%)   j.   4080 ( 0.1%)

    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Bart@3:633/10 to All on Wed Mar 25 11:54:39 2026
    On 24/03/2026 17:06, DFS wrote:
    On 3/23/2026 6:26 PM, Bart wrote:

    It looks very clunky but seems to do the job, and not too slowly
    either (see below).

    Years ago I was shocked how fast C chewed thru text data (and it's even faster dealing with numbers).

    Actually, I'm still shocked.  I wrote an anagram program in C that used
    prime factors to do searches, and it found 5 anagrams from a list of
    370K words in 0.0055s (5.5/1000ths of a second).


    You're attributing too much to C. Or maybe comparing it too much to
    Python which is very slow.

    There are other factors: hardware today is incredibly fast (like 4
    magnitudes or more faster than the 8-bit machines I started off on).

    And a lot of it is due to the optimising compilers now available.

    My own systems language is also quite low-level. And can be just as fast
    if someone were to write an optimising compiler for it too!

    (As it is, it's not far off. Its self-hosted compiler can build over 20
    new generations of itself per second, on a machine slower than yours.)



    And it would be even faster with the use of a hash table.  Incredible.

    And that's on my low-end AMD Ryzen 5600G (16GB DDR4-3200 RAM)

    If that's low-end, what would be high-end? I mean in desktop computer
    terms not some supercomputer.


    No extra copy of the list is necessary to find duplicates (but for
    one-pass efficiency, sorting the list is required).

    Look at the first letter of each duplicate.

    "congratulations on the wherewithal youngun"
    cotwy

    Sort the file and the dupes are already sorted.  That was intentional.

    If that explanation lets you drop some lines, good.
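
    That neighbour-comparison idea is easy to sketch. A minimal C version
    (the function name is hypothetical; it assumes the array is already
    sorted):

```c
#include <string.h>

/* After sorting, any duplicates sit next to each other, so one linear
   pass comparing each word with its successor finds them all. */
int count_duplicates(char **words, int n) {
    int dupes = 0;
    for (int i = 0; i + 1 < n; i++)
        if (strcmp(words[i], words[i + 1]) == 0)
            dupes++;
    return dupes;
}
```

    Against the challenge's list this kind of pass would surface the five
    dupes (congratulations, on, the, wherewithal, youngun) with no extra
    copy of the data.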

    I'm now down to 150 sloc for the C version, and 125 sloc for the M version.



    My method of finding the 26 sets was to:

    1) count words by letter as the file is read in lettercnt[wordsin[i][0]-'a']++;

    (I saw something similar in your scripting, but couldn't spot it in
    your .c)

    It's probably this line:

    ++nbig[(unsigned char)buffer[0]]

    The cast is because 'char' is signed and could be negative.

    Note that my arrays can have arbitrary lower bounds (this is a rare
    feature among HLLs), and here start from 'a'.
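
    A minimal illustration of the point about the cast (hypothetical
    function name; assumes a platform where plain char is signed):

```c
#include <limits.h>

/* Counting table indexed by raw byte value; UCHAR_MAX+1 slots cover
   any byte, letters or otherwise. */
static int nbig[UCHAR_MAX + 1];

/* If buffer[0] holds a byte >= 0x80 and plain char is signed, using it
   directly as an index would be negative; the cast keeps it in 0..255. */
void count_first_byte(const char *buffer) {
    ++nbig[(unsigned char)buffer[0]];
}
```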



    2) sort the data just read in
    qsort(wordsin, wordcnt, sizeof(char*), comparechar);


    3) using the lettercnt[] array from step 1, determine the start-end positions of each set of words beginning with a..z

    Letter   Start     End
    a            0   20484
    b        20485   34475
    c        34476   60069
    d        60070   75050
    e        75051   86572
    f        86573   95977
    g        95978  104985
    h       104986  116490
    i       116491  127653
    j       127654  129796
    k       129797  132749
    l       132750  140949
    m       140950  157658
    n       157659  166088
    o       166089  175859
    p       175860  205604
    q       205605  207078
    r       207079  221162
    s       221163  253678
    t       253679  269769
    u       269770  287936
    v       287937  292502
    w       292503  297884
    x       297885  298330
    y       298331  299249
    z       299250  300397


    4) generate 100+ random numbers between start and end of each letter
    int r = (rand() % (end - start + 1)) + start;

    This 'calculation of start and end' for each letter is what I thought to
    be a novel approach.

    I'm curious how others will approach it (if anyone else tries).

    I don't understand what's going on there. If there are N words in total
    that start with 'c', say, then I just generate a random number from 0 to
    N-1 (C), or 1 to N (M).
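
    For what it's worth, the start/end bookkeeping from step 3 reduces to a
    prefix sum over the lettercnt[] array (a sketch; the function name is
    made up):

```c
/* Given per-letter word counts for a sorted array, compute the index
   range [start[i], end[i]] occupied by words starting with 'a'+i.
   A letter with zero words gets an empty range (end < start). */
void letter_ranges(const int lettercnt[26], int start[26], int end[26]) {
    int total = 0;
    for (int i = 0; i < 26; i++) {
        start[i] = total;          /* first index for this letter */
        total += lettercnt[i];     /* running prefix sum          */
        end[i] = total - 1;        /* last index for this letter  */
    }
}
```

    Feeding it the counts shown earlier reproduces the table above, e.g.
    start 20485 and end 34475 for 'b'.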


    Altogether my program makes:
    3 passes thru the 300398 words in:
    ˙ * 1 to count total words and words by letter
    ˙ * 1 to load the words into an array
    ˙ * 1 to find duplicates

    2 passes thru the 2600 words out:
    ˙ * 1 to verify the 100 words per letter
    ˙ * 1 to print all 2600 words

    5 total passes?  Not sure that's Ivy League.  But everything runs in
    1/10th of a second so I can't complain.

    In 0.25 seconds on my machine! This is why it can be better not to use
    the fastest machine around: then you can spot inefficiencies more easily.
    That was Windows; on WSL it was a little slower: 0.4 seconds 'real' time.

    If you have a short challenge of medium difficulty, post it so we can
    learn and improve skillz.

    I tried the same program on an unsorted list 10 times the size. That is,
    just duplicating everything to get a 3,003,980-line file.

    Generally programs still worked, but took longer, and the list of
    duplicates was a bit bigger!

    The Python version took 4.2s or 5s on PyPy. My Q version got much slower
    at 14s (maybe the interpreted sort is the reason).

    Your C version was 4.5s. Mine are 3.x but they cap the duplicates at 100
    so they can't be compared.

    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Richard Harnden@3:633/10 to All on Wed Mar 25 12:32:06 2026
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large list
    of unsorted words possibly containing duplicates - extracts 26 sets of
    100 random and unique words that each begin with a letter of the English alphabet.


    ---------------------
    Outputs
    ---------------------

    1) count of words by letter

    Letter   Words In   Words Out
    a            2345        100
    b            4399        100
    c             844        100
    ...
    z            1011        100


    2) identify duplicate words in the input file (if any) and print
       them sorted and using proper case on one line.

       found:  eventually dupes you get
       output: Dupes Eventually Get You


    3) print the 2600 words you identify in column x row order in a grid of
       size (200 rows x 13 cols or 300x9 or 400x7 or 500x6 or 600x4 etc)
       without hard-coding each column in a long printf.  They must be in
       alpha order.  If you participated in the 'sort of trivial challenge'
       a few weeks ago, you'll recognize this requirement.


    2600 unique random words (1000 rows x 3 columns)
       1.  aardwolves          kafirin             uberous
       2.  abaze               kafiz               ulnae
       3.  abitibi             kala                ulnare
      ...
     599.  funned              pyrone              zymophosphate
     600.  fusan               pythiacystis        zymotic
     601.  gable               qanat
     602.  gade                qere
      ...
     998.  juvia               typedefs
     999.  juxtaposition       tyrannizings
    1000.  jynx                tzaddikim



    ---------------------
    Requirement
    ---------------------
    You must call a RNG 2600+ times to build the list (ie you can't use the
    random ordering of the input file to your advantage).  In repeated runs,
    my C and python solutions called the RNG 2635x to 2675x (because of
    duplicate randoms).
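
    The extra calls above 2600 come from rejected duplicate draws. One way
    to sketch that selection step in C (function name hypothetical; assumes
    each letter's range holds at least 100 words):

```c
#include <stdlib.h>

/* Draw random indexes in [start, end] until 100 distinct ones are
   collected, returning how many times rand() was called.  Rejected
   duplicate draws are what push the total call count above 2600. */
int draw_unique100(int start, int end, int out[100]) {
    int n = end - start + 1;
    char *seen = calloc(n, 1);     /* seen[k] marks offset k as used */
    int got = 0, calls = 0;
    while (got < 100) {
        int r = rand() % n;        /* offset within this letter's range */
        calls++;
        if (!seen[r]) {
            seen[r] = 1;
            out[got++] = start + r;
        }
    }
    free(seen);
    return calls;                  /* >= 100; the excess = rejections */
}
```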



    ---------------------
    Word Source
    ---------------------
    There's a huge unsorted word list here:

    https://limewire.com/?referrer=pq7i8xx7p2

    ...which you can develop against.


    My C and python solutions are shown below, and at the same link.


    No code perusal until you submit yours!

    Enjoy!


    ========================================================================
    C  125 LOC
    On my WSL system this C runs in 0.095 seconds using the unsorted words file
    ========================================================================

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <ctype.h>   //for tolower and toupper

    //example usage = $ ./2600words words_unsorted.txt 500 6

    //string compare function for qsort
    int comparechar(const void *a, const void *b) {
        const char **chara = (const char **)a;
        const char **charb = (const char **)b;
        return strcmp(*chara, *charb);
    }

    int main(int argc, char *argv[]) {

        //validations
        if (argc < 4) {
            printf("Invalid input \nEnter program-name word-file rows columns\n");
            printf("example: $ ./2600words words.txt 400 7\n\n");
            exit(0);
        }

        if (atoi(argv[2]) * atoi(argv[3]) < 2600) {
            printf("Invalid input: enter rows * columns that total 2600+ \n\n");
            exit(0);
        }

        int  i = 0, t = 0, wout = 0;    //counters
        int  lettercnt[26] = {0};       //count of words by first letter
        int  maxwordlen = 0;            //length of longest word in list
        int  start = 0, end = 0;        //used to extract 100 words per letter
        int  temp[100] = {0};           //holds the 100 random indexes for the letter
        int  wordcnt = 0, totwords = 0; //used to extract 100 words per letter
        char line[35] = "";             //buffer to hold line when reading file
        char therand[12];               //the current random value as text
        char usedlist[1000];            //stores the random numbers already used

        //=================================================================
        //nitty gritty - read in the unsorted words
        //=================================================================
        FILE *fin = fopen(argv[1], "r");                  //open file
        while (fgets(line, sizeof line, fin) != NULL) {   //count lines = words, get max word length
            wordcnt++;
            if ((int)strlen(line) > maxwordlen) {
                maxwordlen = strlen(line);
            }
        }
        char theword[maxwordlen + 1];
        rewind(fin);                                          //pointer back to beginning
        char **wordsin = malloc(sizeof(char *) * wordcnt);    //allocate memory
        while (fgets(theword, sizeof theword, fin) != NULL) { //read line into buffer
            int wordlen = strlen(theword);            //get length of word
            wordsin[i] = malloc(wordlen + 1);         //allocate memory for the word
            strncpy(wordsin[i], theword, wordlen);    //copy word into array
            wordsin[i][wordlen-1] = '\0';             //add terminator - overwrites the \n from the file
            lettercnt[wordsin[i][0]-'a']++;           //update count of words by first letter
            i++;                                      //increment counter
        }
        fclose(fin);                                  //close handle to file

        //=================================================================
        //fun begins
        //=================================================================
        //sort master list of words
        //for each letter, determine the start and end positions of words
        //beginning with that letter
        //generate random numbers between that start and end
        //check if that random number is in the usedlist array; if not, add
        //it to the usedlist and temp arrays
        //when temp has 100 unique randoms in it, add the words to the
        //output array, break and go to the next letter
        //do one sort at the end
        qsort(wordsin, wordcnt, sizeof(char *), comparechar);   //sort the master
        char **wordsout = malloc(sizeof(char *) * 2600);        //final output goes into this array
        srand(time(NULL));
        for (i = 0; i < 26; i++) {                    //find start-end for each letter set
            start = (totwords += lettercnt[i]) - lettercnt[i];
            end = start + lettercnt[i] - 1;
            memset(usedlist, 0, sizeof(usedlist));
            memset(temp,     0, sizeof(temp));
            t = 0;
            for (int j = 0; j < 200; j++) {
                int r = (rand() % (end - start + 1)) + start;
                sprintf(therand, " %d ", r);
                if (strstr(usedlist, therand) == NULL) {
                    strncat(usedlist, therand, strlen(therand));
                    temp[t++] = r;
                    if (t == 100) {
                        for (int k = 0; k < 100; k++) {
                            sprintf(theword, "%s", wordsin[temp[k]]);
                            int wordlen = strlen(theword);
                            wordsout[wout] = malloc(wordlen + 1);
                            strncpy(wordsout[wout], theword, wordlen);
                            wordsout[wout++][wordlen] = '\0';
                        }
                        break;
                    }
                }
            }
        }
        qsort(wordsout, wout, sizeof(char *), comparechar);     //final sort of 2600 words

        //=================================================================
        //final output: print word counts by letter, print dupes, print
        //random words by column then row
        //=================================================================
        printf("%d words loaded\n", wordcnt);
        if (wout == 2600) {
            printf("list of 2600 unique random words created\n");
            printf("\nLetter   Words In   Words Out\n");
            for (i = 0; i < 26; i++) {
                t = 0;
                for (int j = 0; j < wout; j++) {
                    if (wordsout[j][0] == (i + 97)) {t++;}
                }
                printf("  %2c    %6d       %d\n", i + 97, lettercnt[i], t);
            }
        } else {
            printf("Errors occurred.  2600 words not produced.\n");
            exit(0);
        }

        //duplicate words
        printf("\nDuplicate words in proper case\n");
        for (i = 0; i < wordcnt - 1; i++) {
            if (strcmp(wordsin[i], wordsin[i+1]) == 0) {
                sprintf(theword, "%s", wordsin[i]);
                for (int k = 0; theword[k] != '\0'; k++) {
                    if (k == 0) {theword[k] = toupper(theword[k]);}
                    if (k >  0) {theword[k] = tolower(theword[k]);}
                }
                printf("%s ", theword);
            }
        }

        //print random words in column then row order
        int rows = atoi(argv[2]), cols = atoi(argv[3]), colwidth = 20;
        printf("\n\n2600 unique random words (%d rows x %d columns)\n", rows, cols);
        for (int r = 1; r <= rows; r++) {
            if (r <= wout) {
                int nbr = r;
                printf("%3d. %-*s", r, colwidth, wordsout[nbr-1]);
                for (int c = 0; c < cols - 1; c++) {
                    nbr += rows;
                    if (nbr <= wout) {
                        printf("%-*s", colwidth, wordsout[nbr-1]);
                    }
                }
                printf("\n");
            }
        }

        //finito
        free(wordsin);
        free(wordsout);

        return 0;
    }

    ========================================================================












    ========================================================================
    python  58 LOC
    On my WSL system this python runs in 1.05 seconds using the unsorted words file
    ========================================================================

    import sys, random

    if len(sys.argv) < 4:
        print("Invalid input \nEnter program name word-file rows columns")
        print("example: $ python3  2600words.py  words.txt  400  7")
        exit()

    if (int(sys.argv[2]) * int(sys.argv[3])) < 2600:
        print("Invalid input: enter rows * columns that total 2600+")
        exit()

    #read unsorted words file, generate 100 randoms per letter
    from collections import Counter
    wordsout, used, temp = [], [], []
    lettercnt = [0]*26
    with open(sys.argv[1], 'r') as f:
        wordsin = f.readlines()
        for line in wordsin:
            lettercnt[ord(line[0]) - 97] += 1
        print("%d words loaded" % (len(wordsin)))
        wordsuni = sorted(set(wordsin))
        for letter in 'abcdefghijklmnopqrstuvwxyz':
            used.clear()
            temp.clear()
            lwords = [line for line in wordsuni if line[0] == letter]
            lenwordset = len(lwords)
            for i in range(200):
                randword = lwords[random.randint(0, lenwordset - 1)]
                if randword not in used:
                    temp.append(randword.rstrip())
                    used.append(randword)
                    if len(temp) == 100:
                        wordsout += sorted(temp)
                        break
    print('list of ' + str(len(wordsout)) + ' unique random words created')

    #words out should always be 100 per letter
    print("\nLetter   Words In   Words Out")
    for i in range(26):
        wout = 0
        for j in range(len(wordsout)):
            if ord(wordsout[j][0]) == (i + 97):
                wout += 1
        print("  %2c    %6d       %d" % (i + 97, lettercnt[i], wout))

    #find duplicate words
    print("\nDuplicate words in proper case")
    counts = Counter(wordsin)
    dupes  = [item for item, count in counts.items() if count > 1]
    if len(dupes) > 0:
        for dupe in sorted(dupes):
            print(dupe.strip().title(), end=' ')
    else:
        print("no duplicate words")

    #print randoms by col then row
    rows, cols = int(sys.argv[2]), int(sys.argv[3])
    colwidth, words = 20, len(wordsout)
    print("\n\n2600 unique random words (%d rows x %d columns)" % (rows, cols))
    for r in range(1, rows + 1):
        if r <= words:
            nbr = r
            print("%3d. %-*s" % (nbr, colwidth, wordsout[nbr-1]), end=' ')
            for i in range(cols - 1):
                nbr += rows
                if nbr <= words:
                    print("%-*s" % (colwidth, wordsout[nbr-1]), end=' ')
            print()

    ====================================================================


    In shell ...

    ----
    #!/bin/ksh

    sort words_unsorted.txt >words.srt
    uniq words.srt >words.unq

    cat <<-X
    There are $(wc -l words.srt |\
    cut -d\ -f1) words in words_unsorted.txt
    and $(wc -l words.unq | cut -d\ -f1) are unique.

    Duplicates are: $(diff words.srt words.unq |\
    grep "<" | cut -d\ -f2 | sed "s/^\(.\)/\u\1/g" |\
    sort | tr '\n' ' ')

    Counts:
    $(
    for X in {a..z}
    do
    echo -n "$X " ; grep -c ^$X words.unq
    done
    )

    Samples ...
    X

    for X in {a..z}
    do
    grep ^$X words.unq | shuf -n 100 | sed "s/^\(.\)/\u\1/g"
    done >words.tmp

    head -1000 words.tmp >words.0
    head -2000 words.tmp | tail -1000 >words.1
    tail -600 words.tmp > words.2

    paste words.0 words.1 words.2 | nl -w4 -s". "

    rm words.srt words.unq words.0 words.1 words.2

    return 0
    ----

    eg:
    There are 300398 words in words_unsorted.txt
    and 300393 are unique.

    Duplicates are: Congratulations On The Wherewithal Youngun

    Counts:
    a 20485
    b 13991
    c 25593
    d 14981
    [...]
    w 5381
    x 446
    y 918
    z 1148

    Samples ...
    1. Apterygote Kleptistic Untransmuted
    2. Acetylcyanide Kekotene Uninfluencing
    3. Adglutinate Kopek Unaugmented
    [...]
    598. Folkvangr Putrilaginously Zootic
    599. Flebile Proboscides Zohak
    600. Fugitivism Paralysis Zonaria
    601. Gator Qualifications
    602. Gongoristic Qualityless
    [...]
    998. Juttied Thankful
    999. Jarry Trentine
    1000. Jonglery Towlines

    time says:
    real 0m00.60s
    user 0m00.89s
    sys 0m00.15s

    ... which is fast enough for me.

    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Bart@3:633/10 to All on Wed Mar 25 18:07:25 2026
    On 25/03/2026 11:54, Bart wrote:
    On 24/03/2026 17:06, DFS wrote:
    On 3/23/2026 6:26 PM, Bart wrote:

    1) count words by letter as the file is read in
    lettercnt[wordsin[i][0]-'a']++;

    (I saw something similar in your scripting, but couldn't spot it in
    your .c)

    It's probably this line:

        ++nbig[(unsigned char)buffer[0]]

    The cast is because 'char' is signed and could be negative.

    Note that my arrays can have arbitrary lower bounds (this is a rare
    feature among HLLs), and here start from 'a'.

    I forgot this was from the C version. While my non-C has those arrays
    with bounds from 'a' to 'z', when rewriting as C, I chose to have the
    bounds from 0 to 'z' inclusive (so the dimension has to be 'z'+1).

    This is a little wasteful, but that's only some 200 extra elements in
    the program. The advantage is not having to apply offsets when indexing.



    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Wed Mar 25 23:21:26 2026
    On 3/25/2026 8:32 AM, Richard Harnden wrote:

    In shell ...

    ----
    #!/bin/ksh

    sort words_unsorted.txt >words.srt
    uniq words.srt >words.unq

    cat <<-X
        There are $(wc -l words.srt |\
            cut -d\  -f1) words in words_unsorted.txt
        and $(wc -l words.unq | cut -d\  -f1) are unique.

        Duplicates are: $(diff words.srt words.unq |\
            grep "<" | cut -d\  -f2 | sed "s/^\(.\)/\u\1/g" |\
            sort | tr '\n' ' ')

        Counts:
        $(
            for X in {a..z}
            do
                echo -n "$X " ; grep -c ^$X words.unq
            done
        )

        Samples ...
    X

    for X in {a..z}
    do
        grep ^$X words.unq | shuf -n 100 | sed "s/^\(.\)/\u\1/g"
    done >words.tmp

    head -1000 words.tmp >words.0
    head -2000 words.tmp | tail -1000 >words.1
    tail -600 words.tmp > words.2

    paste words.0 words.1 words.2 | nl -w4 -s". "

    rm words.srt words.unq words.0 words.1 words.2

    return 0


    29 lines. Speed is good. Very nice. Probably took you no more than an
    hour to write.

    How do I run it? I tried this in Windows Subsystem for Linux:

    $ sudo ksh harnden.sh
    : not found2]:
    uniq: words.srt: No such file or directory
    : not found5]:
    : not found6]:
    wc: words.srt: No such file or directory



    Unfortunately...

    * the output doesn't meet the requirements: the words are unique and
    sorted, but they're not each randomly chosen by a RNG().

    * you hard-coded your output logic: 3 groups of words in 3 columns.

    * you didn't offer a C solution.


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Wed Mar 25 23:38:39 2026
    On 3/24/2026 8:29 PM, Tristan Wibberley wrote:
    On 24/03/2026 17:17, DFS wrote:
    On 3/24/2026 7:03 AM, Tristan Wibberley wrote:
    On 23/03/2026 04:06, DFS wrote:
    On 3/22/2026 8:05 PM, Tristan Wibberley wrote:
    On 22/03/2026 23:14, DFS wrote:
    On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large list
    of unsorted words possibly containing duplicates - extracts 26 sets of
    100 random and unique words that each begin with a letter of the English
    alphabet.

    What random distribution, uniform?
    Said distribution over the unique words or said distribution over the
    original list?

    pseudorandom?


    I don't care about the uniformity of the distribution, as long as the
    output is unique words, and you generate and use 2600+ random values
    from a RNG.


    I think you're unaware that I can predictably generate a sequence of
    identical values when the distribution is free and your specification is
    satisfied by selecting with a distribution that prefers just one
    indicatory value for a choice of word to the exclusion of all others.

    Yeah, I don't really know what any of that means.  But it sounds like
    your 3rd attempt to sidestep the generation and use of 2600+ random
    values.

    I think you could show some interesting techniques, but you have to
    adhere to the requirements of the challenge.

    It's because of my deeper understanding of the meaning (or barely
    meaningfulness) of the word "random" and my awareness of how critical it
    is to many applications of randomness.

    That is:

     - If it's a game I could go ahead with a PRNG and satisfy you easily -
    but it's not interesting to me these days; at this point I think it's a
    game,

    A game... now you're onto me.



     - If it's a secure application of choice that leaks /no/ information
    about the input list beyond the fact of the achievement of the
    lower-bound, respectively, on the number of words having each initial, I
    can tighten your specification in some of the ways I've queried,

    I sense your "tightening" will result in an unreadable spec, but it
    would be fun to try.  So let's have it.

    I don't think it would be unreadable. It might require some thought to
    synthesise a program that satisfies it.

    But you've told me it's just a game (I suppose "toy", rather than
    gambling game). So the interesting bit could be just like:

    "produce the output so that, within each of the groupings based on the
    initial letter, the words are superficially shuffled around even when
    they're not shuffled around in the input."


    So a sorted input list is loaded, then the groupings by letter are
    shuffled "superficially".

    What constitutes a superficial shuffle?

    And what method do you propose to select 100 unique words from that superficially shuffled set of words?


    >> I would agree that on a scale of necessary to unnecessary, this
    challenge lies very close to unnecessary.

    I didn't mean to suggest the challenge is unnecessary, but mean to
    discuss the problems of writing specifications and requirements such
    that they're not necessary to fulfil the goal. Requirements involving randomness and secrecy are particularly interesting in that respect.


    But it lies closer to the middle of the scale interesting..uninteresting.


    I have a few more up my sleeve.  One in particular I've been thinking
    about, that explicitly disallows the use of an RNG.



    More still, people get gambling games wrong and go to jail because they
    learn and practice C programming without any awareness of the difficulty
    of "random" and they might read this newsgroup to shape their skills.

    You should consult an attorney - I wouldn't want you to do (rand() % 26)
    + 1 days in jail for reading clc and attempting my challenge.

    You seem to be assuming every reader is just fulfilling a need for a
    pastime. I don't suppose that, and you seem to be mocking me for it; I
    think that's awful. You'd got me really excited about the breadth of
    nuance in requirements and the effects of that and then turned it into
    an opportunity for mockery.

    So you were serious before now?


    ps you're nuts

    That may be, but how did you know?

    Your first reply: "What random distribution, uniform?"



    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Richard Harnden@3:633/10 to All on Thu Mar 26 04:02:04 2026
    On 26/03/2026 03:21, DFS wrote:
    On 3/25/2026 8:32 AM, Richard Harnden wrote:

    In shell ...

    ----
    #!/bin/ksh

    sort words_unsorted.txt >words.srt
    uniq words.srt >words.unq

    cat <<-X
        There are $(wc -l words.srt |\
            cut -d\  -f1) words in words_unsorted.txt
        and $(wc -l words.unq | cut -d\  -f1) are unique.

        Duplicates are: $(diff words.srt words.unq |\
            grep "<" | cut -d\  -f2 | sed "s/^\(.\)/\u\1/g" |\
            sort | tr '\n' ' ')

        Counts:
        $(
            for X in {a..z}
            do
                echo -n "$X " ; grep -c ^$X words.unq
            done
        )

        Samples ...
    X

    for X in {a..z}
    do
        grep ^$X words.unq | shuf -n 100 | sed "s/^\(.\)/\u\1/g"
    done >words.tmp

    head -1000 words.tmp >words.0
    head -2000 words.tmp | tail -1000 >words.1
    tail -600 words.tmp > words.2

    paste words.0 words.1 words.2 | nl -w4 -s". "

    rm words.srt words.unq words.0 words.1 words.2

    return 0


    29 lines.  Speed is good.  Very nice.  Probably took you no more than
    an hour to write.

    How do I run it?  I tried this in Windows Subsystem for Linux:

    $ sudo ksh harnden.sh
    : not found2]:
    uniq: words.srt: No such file or directory
    : not found5]:
    : not found6]:
    wc: words.srt: No such file or directory

    Make sure that words_unsorted.txt is in the same directory,
    that harnden.sh is executable,
    then: ./harnden.sh

    No need for sudo.




    Unfortunately...

    * the output doesn't meet the requirements: the words are unique and
      sorted, but they're not each randomly chosen by an RNG.

    shuf(1) will call rand(3)


    * you hard-coded your output logic: 3 groups of words in 3 columns.

    True, but it satisfies your "2600 unique random words (1000 rows x 3
    columns)"


    * you didn't offer a C solution.


    No, I wanted to see if a shell solution was 'good enough'.



    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Thu Mar 26 00:14:00 2026
    On 3/26/2026 12:02 AM, Richard Harnden wrote:
    On 26/03/2026 03:21, DFS wrote:
    On 3/25/2026 8:32 AM, Richard Harnden wrote:

    In shell ...

    ----
    #!/bin/ksh

    sort words_unsorted.txt >words.srt
    uniq words.srt >words.unq

    cat <<-X
        There are $(wc -l words.srt |\
            cut -d\  -f1) words in words_unsorted.txt
        and $(wc -l words.unq | cut -d\  -f1) are unique.

        Duplicates are: $(diff words.srt words.unq |\
            grep "<" | cut -d\  -f2 | sed "s/^\(.\)/\u\1/g" |\
            sort | tr '\n' ' ')

        Counts:
        $(
            for X in {a..z}
            do
                echo -n "$X " ; grep -c ^$X words.unq
            done
        )

        Samples ...
    X

    for X in {a..z}
    do
        grep ^$X words.unq | shuf -n 100 | sed "s/^\(.\)/\u\1/g"
    done >words.tmp

    head -1000 words.tmp >words.0
    head -2000 words.tmp | tail -1000 >words.1
    tail -600 words.tmp > words.2

    paste words.0 words.1 words.2 | nl -w4 -s". "

    rm words.srt words.unq words.0 words.1 words.2

    return 0


    29 lines.  Speed is good.  Very nice.  Probably took you no more than
    an hour to write.

    How do I run it?  I tried this in Windows Subsystem for Linux:

    $ sudo ksh harnden.sh
    : not found2]:
    uniq: words.srt: No such file or directory
    : not found5]:
    : not found6]:
    wc: words.srt: No such file or directory

    Make sure that words_unsorted.txt is in the same directory,
    that harnden.sh is executable,
    then: ./harnden.sh

    No need for sudo.


    $ ksh ./harnden.sh: not foundh[2]:
    : cannot create [Permission denied]
    : cannot create [Permission denied]
    : not foundh[5]:
    wc: words.srt: No such file or directory
    : not foundh[6]:


    words_unsorted.txt is in the directory

    After that runs:
    words.srt is created (and the words are sorted)
    words.unq is empty




    Unfortunately...

    * the output doesn't meet the requirements: the words are unique and
      sorted, but they're not each randomly chosen by an RNG.

    shuf(1) will call rand(3)


    * you hard-coded your output logic: 3 groups of words in 3 columns.

    True, but it satisfies your "2600 unique random words (1000 rows x 3 columns)"


    * you didn't offer a C solution.


    No, I wanted to see if a shell solution was 'good enough'.




    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Thu Mar 26 01:02:49 2026
    On 3/26/2026 12:14 AM, DFS wrote:
    On 3/26/2026 12:02 AM, Richard Harnden wrote:
    On 26/03/2026 03:21, DFS wrote:
    On 3/25/2026 8:32 AM, Richard Harnden wrote:

    In shell ...

    ----
    #!/bin/ksh

    sort words_unsorted.txt >words.srt
    uniq words.srt >words.unq

    cat <<-X
        There are $(wc -l words.srt |\
            cut -d\  -f1) words in words_unsorted.txt
        and $(wc -l words.unq | cut -d\  -f1) are unique.

        Duplicates are: $(diff words.srt words.unq |\
            grep "<" | cut -d\  -f2 | sed "s/^\(.\)/\u\1/g" |\
            sort | tr '\n' ' ')

        Counts:
        $(
            for X in {a..z}
            do
                echo -n "$X " ; grep -c ^$X words.unq
            done
        )

        Samples ...
    X

    for X in {a..z}
    do
        grep ^$X words.unq | shuf -n 100 | sed "s/^\(.\)/\u\1/g"
    done >words.tmp

    head -1000 words.tmp >words.0
    head -2000 words.tmp | tail -1000 >words.1
    tail -600 words.tmp > words.2

    paste words.0 words.1 words.2 | nl -w4 -s". "

    rm words.srt words.unq words.0 words.1 words.2

    return 0


    29 lines.  Speed is good.  Very nice.  Probably took you no more than
    an hour to write.

    How do I run it?  I tried this in Windows Subsystem for Linux:

    $ sudo ksh harnden.sh
    : not found2]:
    uniq: words.srt: No such file or directory
    : not found5]:
    : not found6]:
    wc: words.srt: No such file or directory

    Make sure that words_unsorted.txt is in the same directory,
    that harnden.sh is executable,
    then: ./harnden.sh

    No need for sudo.


    $ ksh ./harnden.sh: not foundh[2]:
    : cannot create [Permission denied]
    : cannot create [Permission denied]
    : not foundh[5]:
    wc: words.srt: No such file or directory
    : not foundh[6]:


    words_unsorted.txt is in the directory

    After that runs:
    words.srt is created (and the words are sorted)
    words.unq is empty


    I converted your script using dos2unix and it ran fine.

    Very fast, too: 0.288s
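    (Aside: the ": not found" errors earlier in the thread are the classic
    symptom of CRLF line endings in a shell script, which is exactly what
    dos2unix repairs. A minimal demonstration of the symptom and the fix,
    assuming GNU sed; the file name is made up:)

    ```shell
    # a CRLF-infected script: the shell sees a trailing carriage return
    # as part of each command line
    printf 'echo hi\r\n' > demo.sh

    # strip trailing carriage returns (equivalent of dos2unix)
    sed -i 's/\r$//' demo.sh

    sh demo.sh    # now prints "hi"
    ```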

    updated the rm command to also remove words.tmp

    updated the paste command for spacing.

    paste words.0 words.1 words.2 |
    nl -w4 -s". " |
    awk '{printf "%d. %-20s %-20s %-20s\n", $1, $2, $3, $4}'


    And got a nice output.

    Next I'll figure out how to pass in a file name and row column values
    from the command line, and print a variable sized grid of rows by columns.





    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Tristan Wibberley@3:633/10 to All on Thu Mar 26 12:08:35 2026
    On 26/03/2026 03:38, DFS wrote:
    On 3/24/2026 8:29 PM, Tristan Wibberley wrote:
    On 24/03/2026 17:17, DFS wrote:
    On 3/24/2026 7:03 AM, Tristan Wibberley wrote:
    On 23/03/2026 04:06, DFS wrote:
    On 3/22/2026 8:05 PM, Tristan Wibberley wrote:
    On 22/03/2026 23:14, DFS wrote:
    On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large list
    of unsorted words possibly containing duplicates - extracts 26 sets of
    100 random and unique words that each begin with a letter of the English
    alphabet.

    What random distribution, uniform?
    Said distribution over the unique words or said distribution over the
    original list?

    pseudorandom?


    I don't care about the uniformity of the distribution, as long as the
    output is unique words, and you generate and use 2600+ random values
    from an RNG.


    I think you're unaware that I can predictably generate a sequence of
    identical values when the distribution is free and your specification is
    satisfied by selecting with a distribution that prefers just one
    indicatory value for a choice of word to the exclusion of all others.

    Yeah, I don't really know what any of that means.  But it sounds like
    your 3rd attempt to sidestep the generation and use of 2600+ random
    values.

    I think you could show some interesting techniques, but you have to
    adhere to the requirements of the challenge.

    It's because of my deeper understanding of the meaning (or barely
    meaningfulness) of the word "random" and my awareness of how critical it
    is to many applications of randomness.

    That is:

     - If it's a game I could go ahead with a PRNG and satisfy you easily -
    but it's not interesting to me these days; at this point I think it's a
    game,

    A game... now you're onto me.



     - If it's a secure application of choice that leaks /no/ information
    about the input list beyond the fact of the achievement of the
    lower-bound, respectively, on the number of words having each initial, I
    can tighten your specification in some of the ways I've queried,

    I sense your "tightening" will result in an unreadable spec, but it
    would be fun to try.  So let's have it.

    I don't think it would be unreadable. It might require some thought to
    synthesise a program that satisfies it.

    But you've told me it's just a game (I suppose "toy", rather than
    gambling game). So the interesting bit could be just like:

    "produce the output so that, within each of the groupings based on the
    initial letter, the words are superficially shuffled around even when
    they're not shuffled around in the input."


    So a sorted input list is loaded, then the groupings by letter are
    shuffled "superficially".

    What constitutes a superficial shuffle?

    I don't know, it's you that said the random distribution wasn't important.

    --
    Tristan Wibberley

    The message body is Copyright (C) 2026 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Thu Mar 26 08:34:50 2026
    On 3/26/2026 8:08 AM, Tristan Wibberley wrote:

    What constitutes a superficial shuffle?

    I don't know, it's you that said the random distribution wasn't important.

    More nutty vibes.

    You're excused to consult with your doctor or lawyer or both.





    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Thu Mar 26 09:41:58 2026
    On 3/25/2026 7:54 AM, Bart wrote:
    On 24/03/2026 17:06, DFS wrote:
    On 3/23/2026 6:26 PM, Bart wrote:

    It looks very clunky but seems to do the job, and not too slowly
    either (see below).

    Years ago I was shocked how fast C chewed thru text data (and it's
    even faster dealing with numbers).

    Actually, I'm still shocked.˙ I wrote an anagram program in C that
    used prime factors to do searches, and it found 5 anagrams from a list
    of 370K words in 0.0055s (5.5/1000ths of a second).


    You're attributing too much to C. Or maybe comparing it too much to
    Python which is very slow.

    And VB/A.


    There are other factors: hardware today is incredibly fast (like 4
    orders of magnitude or more faster than the 8-bit machines I started
    off on).

    And a lot of it is due to the optimising compilers now available.

    My own systems language is also quite low-level. And can be just as fast
    if someone were to write an optimising compiler for it too!

    (As it is, it's not far off. Its self-hosted compiler can build over 20
    new generations of itself per second, on a machine slower than yours.)



    And it would be even faster with the use of a hash table.  Incredible.

    And that's on my low-end AMD Ryzen 5600G (16GB DDR4-3200 RAM)

    If that's low-end, what would be high-end? I mean in desktop computer
    terms not some supercomputer.


    Highest end PCs
    --------------------------------------------
    These CPUs
    https://www.cpubenchmark.net/single-thread/
    https://www.cpubenchmark.net/multithread/

    paired with

    these DDR5 RAMs
    https://www.memorybenchmark.net/

    paired with

    these 5th gen PCIe NVMe drives
    https://www.harddrivebenchmark.net/drives/

    paired with

    these high-end, unbelievably expensive video cards
    https://www.videocardbenchmark.net/high_end_gpus.html


    You should be set forever.

    I haven't priced out a full system in a long time, and RAM prices have
    surged the last 6 months, but you can probably still get a smokin' fast
    tower computer with a low-end-but-plenty-fast-enough video card for
    $2500 to $3000.

    Research, order and build it yourself to save $500+.



    No extra copy of the list is necessary to find duplicates (but for
    one-pass efficiency, sorting the list is required).

    Look at the first letter of each duplicate.

    "congratulations on the wherewithal youngun"
    cotwy

    Sort the file and the dupes are already sorted.  That was intentional.

    If that explanation lets you drop some lines, good.

    I'm now down to 150 sloc for the C version, and 125 sloc for the M version.

    I'm at 146 (but if the dupes were unsorted would need a few more)
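    The sorted-list duplicate check being discussed can be sketched in a few
    lines of C: once the array is sorted, any word equal to its predecessor
    is a duplicate. This is a minimal illustration with made-up data, not
    anyone's actual submission:

    ```c
    #include <stdio.h>
    #include <string.h>

    /* one pass over a sorted array of words: a word equal to the word
       before it is a duplicate (names here are illustrative only) */
    static int print_dupes(const char *words[], int n)
    {
        int dupes = 0;
        for (int i = 1; i < n; i++) {
            if (strcmp(words[i], words[i - 1]) == 0) {
                printf("%s\n", words[i]);
                dupes++;
            }
        }
        return dupes;
    }

    int main(void)
    {
        const char *words[] = {"apple", "apple", "bear", "cat", "cat"};
        int n = print_dupes(words, 5);
        printf("%d duplicates\n", n);
        return 0;
    }
    ```

    And because the input is sorted, the duplicates come out sorted for
    free, which is the point made above.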


    My method of finding the 26 sets was to:

    1) count words by letter as the file is read in
    lettercnt[wordsin[i][0]-'a']++;

    (I saw something similar in your scripting, but couldn't spot it in
    your .c)

    It's probably this line:

        ++nbig[(unsigned char)buffer[0]]

    The cast is because 'char' is signed and could be negative.

    Note that my arrays can have arbitrary lower bounds (this is a rare
    feature among HLLS),

    Sounds dangerous.



    and here start from 'a'.


    2) sort the data just read in
    qsort(wordsin, wordcnt, sizeof(char*), comparechar);


    3) using the lettercnt[] array from step 1, determine the start-end
    positions of each set of words beginning with a..z

    Letter   Start     End
    a            0   20484
    b        20485   34475
    c        34476   60069
    d        60070   75050
    e        75051   86572
    f        86573   95977
    g        95978  104985
    h       104986  116490
    i       116491  127653
    j       127654  129796
    k       129797  132749
    l       132750  140949
    m       140950  157658
    n       157659  166088
    o       166089  175859
    p       175860  205604
    q       205605  207078
    r       207079  221162
    s       221163  253678
    t       253679  269769
    u       269770  287936
    v       287937  292502
    w       292503  297884
    x       297885  298330
    y       298331  299249
    z       299250  300397


    4) generate 100+ random numbers between start and end of each letter
    int r = (rand() % (end - start + 1)) + start;

    This 'calculation of start and end' for each letter is what I thought
    to be a novel approach.

    I'm curious how others will approach it (if anyone else tries).

    I don't understand what's going on there.

    sorted array
    --------------------------------------
    Position in Array
    Letter WordCnt Start End
    --------------------------------------
    a 20485 0 20484
    b 13991 20485 34475
    c 25594 34476 60069
    d 14981 60070 75050
    e 11522 75051 86572
    f 9405 86573 95977
    g 9008 95978 104985
    h 11505 104986 116490
    i 11163 116491 127653
    j 2143 127654 129796
    k 2953 129797 132749
    l 8200 132750 140949
    m 16709 140950 157658
    n 8430 157659 166088
    o 9771 166089 175859
    p 29745 175860 205604
    q 1474 205605 207078
    r 14084 207079 221162
    s 32516 221163 253678
    t 16091 253679 269769
    u 18167 269770 287936
    v 4566 287937 292502
    w 5382 292503 297884
    x 446 297885 298330
    y 919 298331 299249
    z 1148 299250 300397
    --------------------------------------

    So the start-end values become the range of randoms generated for that
    letter.
    int r = (rand() % (end - start + 1)) + start;
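    That start/end technique can be sketched end to end (hypothetical names
    and a tiny made-up word list; assumes the array is already sorted, and
    ignores the slight modulo bias of rand() % range, which is fine for a
    toy like this):

    ```c
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* already sorted, all lowercase a-z initials */
        const char *words[] = {"ant", "axe", "bat", "bee", "bug", "cow"};
        int n = 6;

        /* step 1: count words by first letter */
        int count[26] = {0}, start[26], end[26];
        for (int i = 0; i < n; i++)
            count[(unsigned char)words[i][0] - 'a']++;

        /* step 3: running total gives each letter's start/end positions */
        int pos = 0;
        for (int c = 0; c < 26; c++) {
            start[c] = pos;
            pos += count[c];
            end[c] = pos - 1;   /* inclusive; empty letters get end < start */
        }

        /* step 4: a random index within one letter's range */
        srand(42);
        int c = 'b' - 'a';
        int r = (rand() % (end[c] - start[c] + 1)) + start[c];
        printf("%s\n", words[r]);   /* one of bat/bee/bug */
        return 0;
    }
    ```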


    If there are N words in total
    that start with 'c', say, then I just generate a random number from 0 to
    N-1 (C), or 1 to N (M).

    How do you address a word at position 99999 in the sorted list by using
    0 or 1?



    Altogether my program makes:
    3 passes thru the 300398 words in:
      * 1 to count total words and words by letter
      * 1 to load the words into an array
      * 1 to find duplicates

    2 passes thru the 2600 words out:
      * 1 to verify the 100 words per letter
      * 1 to print all 2600 words

    5 total passes?  Not sure that's Ivy League.  But everything runs in
    1/10th of a second so I can't complain.

    In 0.25 seconds on my machine! This is why it can be better to not use
    the fastest machine around: then you can spot inefficiencies more easily.


    I just added internal timing code to the C program:

    1) loaded 300398 words in 0.028 seconds
    2) created 26 sets of 100 unique words in 0.067 seconds
    3) printed counts of words by letter in 0.000 seconds
    4) identified and printed duplicate words in 0.003 seconds
    5) printed 2600 words in 0.002 seconds
    6) total run time is 0.101 seconds

    Total run time isn't a sum of those operations' times. It's from a
    master timer that starts at the very beginning and ends after 1-5 are
    completed. So it *should* match, and it probably does if the values are
    carried out to 4 or 5 decimals.
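    That master-timer-plus-per-step pattern can be sketched with standard C
    clock() (illustrative only; the step bodies are omitted):

    ```c
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        clock_t master = clock();            /* starts before step 1 */

        clock_t s1 = clock();
        /* ... step 1: load words ... */
        double t_load = (double)(clock() - s1) / CLOCKS_PER_SEC;

        clock_t s2 = clock();
        /* ... step 2: build the 26 sets ... */
        double t_sets = (double)(clock() - s2) / CLOCKS_PER_SEC;

        /* the master timer spans everything, so its total should be
           (approximately) the sum of the per-step times */
        double t_total = (double)(clock() - master) / CLOCKS_PER_SEC;
        printf("load %.3f  sets %.3f  total %.3f\n",
               t_load, t_sets, t_total);
        return 0;
    }
    ```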




    That was Windows; on WSL it was a little slower: 0.4 seconds 'real' time.

    If you have a short challenge of medium difficulty, post it so we can
    learn and improve skillz.

    I tried the same program on an unsorted list 10 times the size. That is, just duplicating everything to get a 3,003,980-line file.

    Generally programs still worked, but took longer, and the list of
    duplicates was a bit bigger!

    The Python version took 4.2s or 5s on PyPy. My Q version got much slower
    at 14s (maybe the interpreted sort is the reason).

    I added more dupes and 10x'd the file to 3004320 words and got 2.5s
    with C, and 2.1s with Python (it's extremely rare that Python runs
    faster than C)

    With no print to screen it was 0.65s C, and 1.9s Python



    Your C version was 4.5s. Mine are 3.x but they cap the duplicates at
    100 so they can't be compared.




    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Scott Lurndal@3:633/10 to All on Thu Mar 26 15:32:08 2026
    DFS <nospam@dfs.com> writes:
    On 3/25/2026 8:32 AM, Richard Harnden wrote:

    In shell ...

    ----
    #!/bin/ksh

    sort words_unsorted.txt >words.srt
    uniq words.srt >words.unq

    cat <<-X
        There are $(wc -l words.srt |\
            cut -d\  -f1) words in words_unsorted.txt
        and $(wc -l words.unq | cut -d\  -f1) are unique.

        Duplicates are: $(diff words.srt words.unq |\
            grep "<" | cut -d\  -f2 | sed "s/^\(.\)/\u\1/g" |\
            sort | tr '\n' ' ')

        Counts:
        $(
            for X in {a..z}
            do
                echo -n "$X " ; grep -c ^$X words.unq
            done
        )

        Samples ...
    X

    for X in {a..z}
    do
        grep ^$X words.unq | shuf -n 100 | sed "s/^\(.\)/\u\1/g"
    done >words.tmp

    head -1000 words.tmp >words.0
    head -2000 words.tmp | tail -1000 >words.1
    tail -600 words.tmp > words.2

    paste words.0 words.1 words.2 | nl -w4 -s". "

    rm words.srt words.unq words.0 words.1 words.2

    return 0


    29 lines. Speed is good. Very nice. Probably took you no more than an
    hour to write.

    How do I run it? I tried this in Windows Subsystem for Linux:

    $ sudo ksh harnden.sh
    : not found2]:
    uniq: words.srt: No such file or directory
    : not found5]:
    : not found6]:
    wc: words.srt: No such file or directory



    Unfortunately...

    * the output doesn't meet the requirements: the words are unique and
    sorted, but they're not each randomly chosen by an RNG.

    So pipe the wordlist through 'shuf' each time you select.


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Thu Mar 26 11:35:28 2026
    On 3/25/2026 8:32 AM, Richard Harnden wrote:
    ----
    #!/bin/ksh

    sort words_unsorted.txt >words.srt
    uniq words.srt >words.unq

    cat <<-X
        There are $(wc -l words.srt |\
            cut -d\  -f1) words in words_unsorted.txt
        and $(wc -l words.unq | cut -d\  -f1) are unique.

        Duplicates are: $(diff words.srt words.unq |\
            grep "<" | cut -d\  -f2 | sed "s/^\(.\)/\u\1/g" |\
            sort | tr '\n' ' ')

        Counts:
        $(
            for X in {a..z}
            do
                echo -n "$X " ; grep -c ^$X words.unq
            done
        )

        Samples ...
    X

    for X in {a..z}
    do
        grep ^$X words.unq | shuf -n 100 | sed "s/^\(.\)/\u\1/g"
    done >words.tmp

    head -1000 words.tmp >words.0
    head -2000 words.tmp | tail -1000 >words.1
    tail -600 words.tmp > words.2

    paste words.0 words.1 words.2 | nl -w4 -s". "

    rm words.srt words.unq words.0 words.1 words.2

    return 0
    ----


    As a learning exercise I rewrote yours as much as I could. It now takes
    row and column inputs, and prints column by row. And I changed the
    words output from proper case to lower case.

    It's consistently a tad slower in bash
    bash: 0.31s
    ksh : 0.29s

    example usage: $ time bash ./dfs.sh words_unsorted.txt 700 4
    =====================================================================
    #!/bin/bash
    unsorted="$1"; rows="$2"; cols="$3"
    sort "$unsorted" | tee words.sort | uniq > words.uniq
    w=$(wc -l < "words.sort"); echo "$w words in $unsorted"
    w=$(wc -l < "words.uniq"); echo -e "$w unique words\n"
    printf "Duplicates: "
    comm -23 words.sort words.uniq | sed 's/\b\(.\)/\u\1/g' | tr '\n' ' '
    echo -e "\n\nWord Counts\n By Letter\n-----------"
    for x in {a..z}
    do
    echo -n "$x " ; grep -c ^$x words.uniq
    done
    echo -e "-----------\n"
    for x in {a..z}
    do
    grep ^$x words.uniq | shuf -n 100 | sed "s/^\(.\)/\L&\1/g"
    done > words.temp
    sort words.temp > words.sort
    words=($(cat "words.sort"))
    if ((1==1)); then
    for ((r = 1; r <= $rows; r++)); do
    if ((r <= 2600)); then
    nbr=$((r))
    printf "%3d. %-25s" "$nbr" "${words[nbr-1]}"
    for ((i = 0; i < $cols-1; i++)); do
    nbr=$(($rows + nbr))
    if ((nbr <= 2600)); then
    printf "%-25s" "${words[nbr-1]}"
    fi
    done
    printf "\n"
    fi
    done
    fi
    rm words.sort words.temp words.uniq
    printf "\n"
    =====================================================================

    I also have the commented version with command line validations if you
    care. (2 numeric inputs for rows and columns, and rows * cols >= 2600)

    I did something wrong - the words output has 2 first letters. Can you
    spot where I messed up?


    Thanks
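    For what it's worth, the doubled first letter most likely comes from the
    replacement \L&\1 in the final sed: & is the whole match (the first
    character) and \1 is the same captured character, so both are emitted.
    A minimal before/after, assuming GNU sed (which supports \L):

    ```shell
    # buggy: & (the whole match) and \1 (the capture) are the same
    # character, so the first letter comes out twice
    echo "Apple" | sed 's/^\(.\)/\L&\1/'    # prints "aapple"

    # fixed: & alone is enough; \L lowercases the replacement
    echo "Apple" | sed 's/^./\L&/'          # prints "apple"
    ```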


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Tristan Wibberley@3:633/10 to All on Thu Mar 26 15:37:11 2026
    On 25/03/2026 11:54, Bart wrote:
    it can be better to not use the fastest machine around, then you can
    spot inefficiencies more easily.

    The computer hardware market looks to me to be about to explode into so
    many variations that efficiency measurements will be transferrable only
    across a very small family of machines and low-level languages like C
    will soon bear little resemblance to hardware.

    The IBM, Knuth, and von Neumann ages are over.


    --
    Tristan Wibberley

    The message body is Copyright (C) 2026 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Bart@3:633/10 to All on Thu Mar 26 16:37:03 2026
    On 26/03/2026 13:41, DFS wrote:
    On 3/25/2026 7:54 AM, Bart wrote:

    You should be set forever.

    I haven't priced out a full system in a long time, and RAM prices have surged the last 6 months, but you can probably still get a smokin' fast tower computer with a low-end-but-plenty-fast-enough video card for
    $2500 to $3000.

    Research, order and build it yourself to save $500+.

    Actually, for the stuff I do, I don't need anything that fast.




    No extra copy of the list is necessary to find duplicates (but for
    one- > pass efficiency, sorting the list is required).

    Look at the first letter of each duplicate.

    "congratulations on the wherewithal youngun"
    cotwy

    Sort the file and the dupes are already sorted.˙ That was intentional.

    If that explanation lets you drop some lines, good.

    I'm now down to 150 sloc for the C version, and 125 sloc for the M
    version.

    I'm at 146 (but if the dupes were unsorted would need a few more)

    Slightly puzzled as the comment in your first C version said it was 125
    lines. But if I put that into github, it says it's 164 sloc.

    Note that my arrays can have arbitrary lower bounds (this is a rare
    feature among HLLS),

    Sounds dangerous.

    It can be safer than C. If my 'char' was signed like C's can be, and I
    wanted to create a histogram from some string, then I can just do this:

    [-128..127]int hist # [char.bounds] would be used in practice
    ++hist[c]

    In C, you'd get an out-of-bounds access if 'c' was not within the ASCII
    range.
    You'd have to either apply offsets or cast to unsigned, but it is
    something you have to remember to do.

    More typically, if you're porting code from some 1-based algorithm, you
    can have off-by-one errors (and UB), if you don't fully account for it.
    It's great to have that choice in the language. (In my projects, about
    1/3 of arrays are 0-based; most are 1-based.)


    I don't understand what's going on there.

    sorted array
    --------------------------------------
                Position in Array
    Letter  WordCnt     Start      End
    --------------------------------------
    a         20485         0    20484
    b         13991     20485    34475
    c         25594     34476    60069
    d         14981     60070    75050
    e         11522     75051    86572
    f          9405     86573    95977
    g          9008     95978   104985
    h         11505    104986   116490
    i         11163    116491   127653
    j          2143    127654   129796
    k          2953    129797   132749
    l          8200    132750   140949
    m         16709    140950   157658
    n          8430    157659   166088
    o          9771    166089   175859
    p         29745    175860   205604
    q          1474    205605   207078
    r         14084    207079   221162
    s         32516    221163   253678
    t         16091    253679   269769
    u         18167    269770   287936
    v          4566    287937   292502
    w          5382    292503   297884
    x           446    297885   298330
    y           919    298331   299249
    z          1148    299250   300397
    --------------------------------------

    So the start-end values become the range of randoms generated for that letter.
    int r = (rand() % (end - start + 1)) + start;
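    That mapping can be wrapped up as a tiny helper (a sketch; `pick` is a
    hypothetical name, and note `rand() %` has a slight modulo bias, which
    hardly matters at these set sizes):

```c
#include <stdlib.h>

/* Return a random index in [start, end] inclusive -- the slice of the
   sorted array holding one letter's words. */
int pick(int start, int end)
{
    return start + rand() % (end - start + 1);
}
```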

    I see, so you don't just have a char*[] array for each letter, you
    store start/end indices in the master set of words, which also gives
    you the word-count.



    If there are N words in total that start with 'c', say, then I just
    generate a random number from 0 to N-1 (C), or 1 to N (M).

    How do you address a word at position 99999 in the sorted list by using
    0 or 1?

    There are 26 lists each containing all the words starting with that
    letter. Each is of a different size (eg. 20485 for 'a').

    The word lists are 0-based in C, and 1-based in M, so I need a
    random number from 0 to 20484 in C, and 1..20485 in M, to select one of
    those 20485 words.




    Altogether my program makes:
    3 passes thru the 300398 words in:
       * 1 to count total words and words by letter
       * 1 to load the words into an array
       * 1 to find duplicates

    2 passes thru the 2600 words out:
       * 1 to verify the 100 words per letter
       * 1 to print all 2600 words

    5 total passes?  Not sure that's Ivy League.  But everything runs in
    1/10th of a second so I can't complain.

    In 0.25 seconds on my machine! This is why it can be better not to use
    the fastest machine around: then you can spot inefficiencies more easily.


    I just added internal timing code to the C program:

    1) loaded 300398 words in                    0.028 seconds
    2) created 26 sets of 100 unique words in    0.067 seconds
    3) printed counts of words by letter in      0.000 seconds
    4) identified and printed duplicate words in 0.003 seconds
    5) printed 2600 words in                     0.002 seconds
    6) total run time is                         0.101 seconds

    So most of your runtime is in getting those 2600 words? That's very
    surprising. In mine that takes the least time:

    * Read words, copy into 26 sets    123 ms
    * Get 26 random sets of 100          2 ms
    * Challenge 1 (display sizes)        0 ms
    * Challenge 2 (duplicates)         147 ms
    * Challenge 3 (display the 2600)    14 ms

    Challenge 2 involves sorting the 300K words.

    Note that my method involves reading the file twice, using fgets. In a
    real program I would read the entire file into memory, and do any
    subsequent processing there.
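    Reading the whole file in one go can look like this. A sketch with error
    handling kept minimal; `slurp` is a made-up name:

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into one NUL-terminated buffer; caller frees.
   Returns NULL on any failure. The words can then be split in place
   on '\n' without a second pass over the file. */
char *slurp(const char *path, long *len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;

    fseek(f, 0, SEEK_END);
    *len = ftell(f);
    rewind(f);

    char *buf = malloc(*len + 1);
    if (buf != NULL && fread(buf, 1, *len, f) == (size_t)*len) {
        buf[*len] = '\0';
    } else {
        free(buf);
        buf = NULL;
    }

    fclose(f);
    return buf;
}
```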


    Your C version was 4.5s. Mine are 3.x but they cap the duplicates at
    100 so they can't be compared.

    (I removed that cap, after realising dupls were already sorted, and
    runtime was then about the same as your C version.)

    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Richard Harnden@3:633/10 to All on Thu Mar 26 16:40:58 2026
    On 26/03/2026 15:35, DFS wrote:
    I did something wrong - the words output has 2 first letters.  Can you
    spot where I messed up?

    for x in {a..z}
    do
        grep ^$x words.uniq | shuf -n 100 | sed "s/^\(.\)/\L&\1/g"
    done > words.temp


    You don't need the '| sed "s/^\(.\)/\L&\1/g"' bit.
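    For what it's worth, the doubled letter comes from the replacement text
    itself: `&` already stands for the whole match (the first character),
    and `\1` then emits that same captured character again. A quick
    demonstration (GNU sed, since `\L` is a GNU extension):

```shell
# '&' is the matched first letter and '\1' is the same letter captured,
# so the replacement emits it twice:
echo "apple" | sed 's/^\(.\)/\L&\1/'   # -> aapple

# To force just the first letter to lower case, '&' alone is enough:
echo "apple" | sed 's/^./\L&/'         # -> apple
```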


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Tim Rentsch@3:633/10 to All on Thu Mar 26 19:59:57 2026
    Michael S <already5chosen@yahoo.com> writes:

    On Sun, 22 Mar 2026 23:21:43 +0000
    Tristan Wibberley <tristan.wibberley+netnews2@alumni.manchester.ac.uk>
    wrote:

    On 22/03/2026 14:38, DFS wrote:

    You must call a RNG 2600+ times to build the list

    ie you can't use the
    random ordering of the input file to your advantage).

    The two are not the same, that is, the use of "ie" is wrong.

    Which do you really require, or do you really require I satisfy the
    conjunction of the two?

    Are you trying to hint that challenges with seemingly arbitrary rules
    and seemingly arbitrary purposes are not very worthy? If yes, then you
    could just as well say it directly.

    I read Tristan's comment as asking for a clarification rather than
    offering any judgments.

    Personally, I think the proposed challenge has some interesting
    parts. Unfortunately, other parts are dumb or pointless or
    needlessly tedious, which is disappointing.

    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Fri Mar 27 12:16:32 2026
    On 3/26/2026 10:59 PM, Tim Rentsch wrote:


    Personally, I think the proposed challenge has some interesting
    parts. Unfortunately, other parts are dumb or pointless or
    needlessly tedious, which is disappointing.


    Give us a small Rentsch challenge that won't explode our brains.


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Richard Harnden@3:633/10 to All on Fri Mar 27 17:24:15 2026
    On 25/03/2026 12:32, Richard Harnden wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large
    list of unsorted words possibly containing duplicates - extracts 26
    sets of 100 random and unique words that each begin with a letter of
    the English alphabet.

    Here's my C attempt.

    146 lines, but I like my vertical whitespace.

    -----
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define COLS 3
    #define MAX 32

    static int qsort_strcmp(const void *p1, const void *p2)
    {
        return strcmp(*(const char **) p1, *(const char **) p2);
    }

    int main(void)
    {
        FILE *in = fopen("./words_unsorted.txt", "r");
        char s[MAX];
        int n;
        int l = 0;
        int total = 0;
        int count[26] = {0};
        int pos[26] = {0};
        char **words[26] = {0};
        char *out[2600] = {0};
        int d = 0;
        int o = 1 + 2600 / COLS;

        if ( in == NULL )
            return 1;

        /* pass 1: count the words per starting letter (input is lower case) */
        while ( fgets(s, MAX, in) != NULL )
        {
            if ( (n = strlen(s)) > l ) l = n;

            total++;
            count[s[0]-'a']++;
        }

        rewind(in);

        printf("Total words: %d\n\n", total);

        for (int i=0; i<26; i++)
            words[i] = malloc(sizeof *words[i] * count[i]);

        /* pass 2: store each word, upper-casing its first letter */
        while ( fgets(s, MAX, in) != NULL )
        {
            int a;

            n = strlen(s) - 1;
            s[n] = '\0';        /* strip the newline */

            s[0] &= 0xdf;       /* ASCII lower -> upper */

            a = s[0] - 'A';

            words[a][pos[a]] = malloc(n+1);
            strcpy(words[a][pos[a]], s);

            pos[a]++;
        }

        fclose(in);

        for (int i=0; i<26; i++)
            qsort(words[i], count[i], sizeof *words[i], qsort_strcmp);

        /* adjacent equal entries in the sorted lists are duplicates */
        for (int i=0; i<26; i++)
        {
            char *prior = NULL;

            for (int j=0; j<count[i]; j++)
            {
                if ( prior == NULL )
                {
                    prior = words[i][j];
                    continue;
                }

                if ( strcmp(prior, words[i][j]) == 0 )
                {
                    free(words[i][j]);
                    words[i][j] = NULL;

                    out[d] = prior;
                    d++;
                }

                prior = words[i][j];
            }
        }

        printf("There are %d duplicates:", d);
        for (int i=0; i<d; i++)
            printf(" %s", out[i]);
        printf("\n\n");

        srand(time(NULL));

        for (int i=0; i<26; i++)
        {
            printf("Selecting 100 words out of %d from set '%c'\n",
                   count[i], i+'A');

            int j = 0;

            while ( j < 100 )
            {
                /* dividing by RAND_MAX+1 keeps r strictly below count[i] */
                int r = (rand() / ((double) RAND_MAX + 1)) * count[i];

                if ( words[i][r] == NULL ) continue;

                out[i*100 + j] = words[i][r];
                words[i][r] = NULL;

                j++;
            }
        }

        printf("\n");

        qsort(out, 2600, sizeof *out, qsort_strcmp);

        for (int i=0; i<o; i++)
        {
            printf("%4d: ", i + 1);

            for (int j=0; j<COLS; j++)
            {
                if ( i + j*o < 2600 )
                    printf("%-*s", l, out[i + j*o]);
            }

            printf("\n");
        }

        for (int i=0; i<26; i++)
        {
            for (int j=0; j<count[i]; j++)
                free(words[i][j]);

            free(words[i]);
        }

        for (int i=0; i<2600; i++)
            free(out[i]);

        return 0;
    }
    -----

    Total words: 300398

    There are 5 duplicates: Congratulations On The Wherewithal Youngun

    Selecting 100 words out of 20485 from set 'A'
    Selecting 100 words out of 13991 from set 'B'
    [...]
    Selecting 100 words out of 919 from set 'Y'
    Selecting 100 words out of 1148 from set 'Z'

       1: Abastardize       Intercurrence     Reinstitution
       2: Abetter           Interepidemic     Reiterating
    [...]
     866: Intentiveness     Rehonour          Zythum
     867: Interassociation  Reinforcing



    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Sat Mar 28 00:52:24 2026
    On 3/27/2026 1:24 PM, Richard Harnden wrote:
    On 25/03/2026 12:32, Richard Harnden wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large
    list of unsorted words possibly containing duplicates - extracts 26
    sets of 100 random and unique words that each begin with a letter of
    the English alphabet.

    Here's my C attempt.

    146 lines, but I like my vertical whitespace.


    Thanks for the submission.

    It's 106 lines of code, so it's the shortest yet.

    The only part you didn't get quite right was:

    "print the 2600 words you identify in column x row order in a grid of
    size (200rows x 13cols or 300x9 or 400x7 or 500x6 or 600x4 etc) "
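    Column-then-row order just means the item at cell (row, col) has index
    col * rows + row. A sketch of that layout (hypothetical names, not
    anyone's submission):

```c
#include <stdio.h>

/* For a grid printed row by row but filled column-first,
   cell (row, col) holds item number col*rows + row. */
int grid_index(int row, int col, int rows)
{
    return col * rows + row;
}

/* Print n items as a rows x cols grid in column-major order. */
void print_grid(char *items[], int n, int rows, int cols, int width)
{
    for (int row = 0; row < rows; row++) {
        for (int col = 0; col < cols; col++) {
            int i = grid_index(row, col, rows);
            if (i < n)
                printf("%-*s", width, items[i]);
        }
        printf("\n");
    }
}
```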


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Janis Papanagnou@3:633/10 to All on Sat Mar 28 06:59:14 2026
    Subject: Ruminations about the recent challenges (was Re: I think this could be an interesting challenge!)

    On 2026-03-28 05:52, DFS wrote:
    On 3/27/2026 1:24 PM, Richard Harnden wrote:
    On 25/03/2026 12:32, Richard Harnden wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large
    list of unsorted words possibly containing duplicates - extracts 26
    sets of 100 random and unique words that each begin with a letter of
    the English alphabet.

    Here's my C attempt.

    146 lines, but I like my vertical whitespace.


    Thanks for the submission.

    It's 106 lines of code, so it's the shortest yet.

    The only part you didn't get quite right was:

    "print the 2600 words you identify in column x row order in a grid of
    size (200rows x 13cols or 300x9 or 400x7 or 500x6 or 600x4 etc)"

    Ruminations on the recent "C" challenges...

    Some requirements appear to be quite arbitrary. But okay. When
    I read about the tasks to implement the first thought that came
    up was to use an appropriate language or tool-set, one that fits
    better for the task, tasks that at least I consider annoying to
    implement them in "C" because that language doesn't support it
    well, because of C's primitivity (its low-level'ness). But okay;
    we're in a C-group and the residents need feeding. - Why is it
    that I consider it annoying in "C"? - Because I'd have liked to
    implement such tasks based on existing _building blocks_; like
    associative arrays, sensible array data types, and what not.
    Instead of constructing and building a car with tools like an
    axe and a stone, wouldn't it be a more sensible to create useful
    tools in "C" to make such challenges concentrate more on the
    actual problem than on how to reinvent the simplest tasks again
    and again? - I'd certainly consider it worthwhile to challenge
    implementations of building blocks that alleviate C-programmers
    from all the boring error-prone and low-level tasks that are
    celebrated ad nauseam. - The question I'd ask myself if faced
    with (arbitrary or useful) requirements would be what elementary
    functions I'd need to construct the solution. Such identified
    and isolated features, i.e. their implementation, would have a
    persistent value for more than a single arbitrary "C" challenge.

    As said; just my upcoming ruminations about that. So, YMMV.
    (Disclaimer: angering folks with other mindsets not intended,
    but I'd also not be surprised if it's considered offensive.)

    Janis


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Sat Mar 28 09:05:28 2026
    Subject: Re: Ruminations about the recent challenges (was Re: I think this could be an interesting challenge!)

    On 3/28/2026 1:59 AM, Janis Papanagnou wrote:
    On 2026-03-28 05:52, DFS wrote:
    On 3/27/2026 1:24 PM, Richard Harnden wrote:
    On 25/03/2026 12:32, Richard Harnden wrote:
    On 22/03/2026 14:38, DFS wrote:
    ---------------------
    Objective
    ---------------------
    deliver a C (and optional 2nd language) program that - from a large
    list of unsorted words possibly containing duplicates - extracts 26
    sets of 100 random and unique words that each begin with a letter
    of the English alphabet.

    Here's my C attempt.

    146 lines, but I like my vertical whitespace.


    Thanks for the submission.

    It's 106 lines of code, so it's the shortest yet.

    The only part you didn't get quite right was:

    "print the 2600 words you identify in column x row order in a grid of
    size (200rows x 13cols or 300x9 or 400x7 or 500x6 or 600x4 etc)"

    Ruminations on the recent "C" challenges...

    Some requirements appear to be quite arbitrary.


    Every challenge/competition/sport is arbitrary: the rules, the
    constraints, the scoring, the technology allowed, the type of ball used,
    the size of the field of play, the surface, the measures of performance,
    the judging, the number of participants, etc.

    It's all made up on a whim.



    But okay. When
    I read about the tasks to implement the first thought that came
    up was to use an appropriate language or tool-set, one that fits
    better for the task, tasks that at least I consider annoying to
    implement them in "C" because that language doesn't support it
    well, because of C's primitivity (its low-level'ness). But okay;
    we're in a C-group and the residents need feeding. - Why is it
    that I consider it annoying in "C"? - Because I'd have liked to
    implement such tasks based on existing _building blocks_; like
    associative arrays, sensible array data types, and what not.
    Instead of constructing and building a car with tools like an
    axe and a stone, wouldn't it be a more sensible to create useful
    tools in "C" to make such challenges concentrate more on the
    actual problem than on how to reinvent the simplest tasks again
    and again? - I'd certainly consider it worthwhile to challenge implementations of building blocks that alleviate C-programmers
    from all the boring error-prone and low-level tasks that are
    celebrated ad nauseam. - The question I'd ask myself if faced
    with (arbitrary or useful) requirements would be what elementary
    functions I'd need to construct the solution. Such identified
    and isolated features, i.e. their implementation, would have a
    persistent value for more than a single arbitrary "C" challenge.


    I think one word would've sufficed where you used 235: python

    I just like to see how the good programmers of clc use C to approach
    different tasks. It's always educational to read clc.

    One "implementation of persistent value" you might've enjoyed if you participated is how to print data by column then row (it was in both of
    my recent challenges). There's a powerful Linux command (column) that
    does it, but it gives you no control over the number of rows, and no
    line numbering option.

    Combine the amt of data you have to print, the rows x columns you want
    to use, and a simple function to know how wide your terminal window is,
    and you have a lot of control over the presentation of data.

    int gettermcols() {
    struct winsize w = {0};
    ioctl(STDOUT_FILENO, TIOCGWINSZ, &w);
    return w.ws_col;
    }
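    Combining the terminal width with the longest word length then gives the
    column count directly. A sketch; `fit_cols` and the 2-space gutter are my
    own assumptions, not part of the post:

```c
/* How many columns of (wordlen + 2)-wide cells fit in the terminal?
   Returns at least 1 so output still works in very narrow windows. */
int fit_cols(int termwidth, int wordlen)
{
    int cols = termwidth / (wordlen + 2);
    return cols > 0 ? cols : 1;
}
```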



    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Sat Mar 28 09:20:39 2026
    On 3/26/2026 10:59 PM, Tim Rentsch wrote:

    Personally, I think the proposed challenge has some interesting
    parts. Unfortunately, other parts are dumb or pointless or
    needlessly tedious, which is disappointing.

    Sounds like human life.


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Janis Papanagnou@3:633/10 to All on Mon Mar 30 11:26:07 2026
    Subject: Re: Ruminations about the recent challenges (was Re: I think this could be an interesting challenge!)

    On 2026-03-28 14:05, DFS wrote:
    On 3/28/2026 1:59 AM, Janis Papanagnou wrote:
    [...]

    But okay. When
    I read about the tasks to implement the first thought that came
    up was to use an appropriate language or tool-set, one that fits
    better for the task, tasks that at least I consider annoying to
    implement them in "C" because that language doesn't support it
    well, because of C's primitivity (its low-level'ness). But okay;
    we're in a C-group and the residents need feeding. - Why is it
    that I consider it annoying in "C"? - Because I'd have liked to
    implement such tasks based on existing _building blocks_; like
    associative arrays, sensible array data types, and what not.
    Instead of constructing and building a car with tools like an
    axe and a stone, wouldn't it be a more sensible to create useful
    tools in "C" to make such challenges concentrate more on the
    actual problem than on how to reinvent the simplest tasks again
    and again? - I'd certainly consider it worthwhile to challenge
    implementations of building blocks that alleviate C-programmers
    from all the boring error-prone and low-level tasks that are
    celebrated ad nauseam. - The question I'd ask myself if faced
    with (arbitrary or useful) requirements would be what elementary
    functions I'd need to construct the solution. Such identified
    and isolated features, i.e. their implementation, would have a
    persistent value for more than a single arbitrary "C" challenge.


    I think one word would've sufficed where you used 235: python

    Sorry, I cannot associate that statement with anything I said. -
    What is that "235: python" referring to? - Mind to elaborate?

    Janis

    [...]


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Bart@3:633/10 to All on Mon Mar 30 12:10:46 2026
    Subject: Re: Ruminations about the recent challenges (was Re: I think this could be an interesting challenge!)

    On 30/03/2026 10:26, Janis Papanagnou wrote:
    On 2026-03-28 14:05, DFS wrote:
    On 3/28/2026 1:59 AM, Janis Papanagnou wrote:
    [...]

    But okay. When
    I read about the tasks to implement the first thought that came
    up was to use an appropriate language or tool-set, one that fits
    better for the task, tasks that at least I consider annoying to
    implement them in "C" because that language doesn't support it
    well, because of C's primitivity (its low-level'ness). But okay;
    we're in a C-group and the residents need feeding. - Why is it
    that I consider it annoying in "C"? - Because I'd have liked to
    implement such tasks based on existing _building blocks_; like
    associative arrays, sensible array data types, and what not.
    Instead of constructing and building a car with tools like an
    axe and a stone, wouldn't it be a more sensible to create useful
    tools in "C" to make such challenges concentrate more on the
    actual problem than on how to reinvent the simplest tasks again
    and again? - I'd certainly consider it worthwhile to challenge
    implementations of building blocks that alleviate C-programmers
    from all the boring error-prone and low-level tasks that are
    celebrated ad nauseam. - The question I'd ask myself if faced
    with (arbitrary or useful) requirements would be what elementary
    functions I'd need to construct the solution. Such identified
    and isolated features, i.e. their implementation, would have a
    persistent value for more than a single arbitrary "C" challenge.


    I think one word would've sufficed where you used 235: python

    Sorry, I cannot associate that statement with anything I said. -
    What is that "235: python" referring to? - Mind to elaborate?

    The 235 refers to the number of words in your paragraph (I haven't checked).

    And 'python' is a summary of what they think you're trying to say.
    That language already has those ready-made building blocks.

    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From James Kuyper@3:633/10 to All on Mon Mar 30 20:33:18 2026
    Subject: Re: Ruminations about the recent challenges (was Re: I think this could be an interesting challenge!)

    On 2026-03-30 05:26, Janis Papanagnou wrote:
    On 2026-03-28 14:05, DFS wrote:
    ...
    I think one word would've sufficed where you used 235: python

    Sorry, I cannot associate that statement with anything I said. -
    What is that "235: python" referring to? - Mind to elaborate?

    I cannot answer your question, but the way you worded it suggests to me
    that you may have parsed his comment incorrectly. It should be parsed as

    "I think one word would've sufficed where you used 235. That word is
    python."

    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Janis Papanagnou@3:633/10 to All on Tue Mar 31 09:00:06 2026
    Subject: Re: Ruminations about the recent challenges (was Re: I think this could be an interesting challenge!)

    On 2026-03-31 02:33, James Kuyper wrote:
    On 2026-03-30 05:26, Janis Papanagnou wrote:
    On 2026-03-28 14:05, DFS wrote:
    ...
    I think one word would've sufficed where you used 235: python

    Sorry, I cannot associate that statement with anything I said. -
    What is that "235: python" referring to? - Mind to elaborate?

    I cannot answer your question, but the way you worded it suggests to me
    that you may have parsed his comment incorrectly.

    Indeed. - Thanks.

    It should be parsed as

    "I think one word would've sufficed where you used 235. That word is
    python."

    I see.

    Obviously any language with decent building blocks would qualify.
    It wouldn't have come to my mind that Python is the (only) answer
    to explain the characteristics in a generic way. It may serve as
    an example, but I cannot expect everyone knowing that language.

    But the post was about this and other useful contests in "C".
    (Certainly not about "<language> is better than C".)

    Janis


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Tue Mar 31 12:11:22 2026
    Subject: Re: Ruminations about the recent challenges (was Re: I think this could be an interesting challenge!)

    On Mon, 30 Mar 2026 12:10:46 +0100
    Bart <bc@freeuk.com> wrote:

    On 30/03/2026 10:26, Janis Papanagnou wrote:
    On 2026-03-28 14:05, DFS wrote:
    On 3/28/2026 1:59 AM, Janis Papanagnou wrote:
    [...]

    But okay. When
    I read about the tasks to implement the first thought that came
    up was to use an appropriate language or tool-set, one that fits
    better for the task, tasks that at least I consider annoying to
    implement them in "C" because that language doesn't support it
    well, because of C's primitivity (its low-level'ness). But okay;
    we're in a C-group and the residents need feeding. - Why is it
    that I consider it annoying in "C"? - Because I'd have liked to
    implement such tasks based on existing _building blocks_; like
    associative arrays, sensible array data types, and what not.
    Instead of constructing and building a car with tools like an
    axe and a stone, wouldn't it be a more sensible to create useful
    tools in "C" to make such challenges concentrate more on the
    actual problem than on how to reinvent the simplest tasks again
    and again? - I'd certainly consider it worthwhile to challenge
    implementations of building blocks that alleviate C-programmers
    from all the boring error-prone and low-level tasks that are
    celebrated ad nauseam. - The question I'd ask myself if faced
    with (arbitrary or useful) requirements would be what elementary
    functions I'd need to construct the solution. Such identified
    and isolated features, i.e. their implementation, would have a
    persistent value for more than a single arbitrary "C" challenge.


    I think one word would've sufficed where you used 235: python

    Sorry, I cannot associate that statement with anything I said. -
    What is that "235: python" referring to? - Mind to elaborate?

    The 235 refers to the number of words in your paragraph (I haven't
    checked).


    I did. There are 223 words.
    So, now I have more interesting question - how did DFS come to number
    235? If by eye sight - it's impressively precise. If by use of word
    count utility - it's too imprecise.



    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Bart@3:633/10 to All on Tue Mar 31 11:53:29 2026
    Subject: Re: Ruminations about the recent challenges (was Re: I think this could be an interesting challenge!)

    On 31/03/2026 10:11, Michael S wrote:
    On Mon, 30 Mar 2026 12:10:46 +0100
    Bart <bc@freeuk.com> wrote:

    On 30/03/2026 10:26, Janis Papanagnou wrote:
    On 2026-03-28 14:05, DFS wrote:
    On 3/28/2026 1:59 AM, Janis Papanagnou wrote:
    [...]

    But okay. When
    I read about the tasks to implement the first thought that came
    up was to use an appropriate language or tool-set, one that fits
    better for the task, tasks that at least I consider annoying to
    implement them in "C" because that language doesn't support it
    well, because of C's primitivity (its low-level'ness). But okay;
    we're in a C-group and the residents need feeding. - Why is it
    that I consider it annoying in "C"? - Because I'd have liked to
    implement such tasks based on existing _building blocks_; like
    associative arrays, sensible array data types, and what not.
    Instead of constructing and building a car with tools like an
    axe and a stone, wouldn't it be a more sensible to create useful
    tools in "C" to make such challenges concentrate more on the
    actual problem than on how to reinvent the simplest tasks again
    and again? - I'd certainly consider it worthwhile to challenge
    implementations of building blocks that alleviate C-programmers
    from all the boring error-prone and low-level tasks that are
    celebrated ad nauseam. - The question I'd ask myself if faced
    with (arbitrary or useful) requirements would be what elementary
    functions I'd need to construct the solution. Such identified
    and isolated features, i.e. their implementation, would have a
    persistent value for more than a single arbitrary "C" challenge.


    I think one word would've sufficed where you used 235: python

    Sorry, I cannot associate that statement with anything I said. -
    What is that "235: python" referring to? - Mind to elaborate?

    The 235 refers to the number of words in your paragraph (I haven't
    checked).


    I did. There are 223 words.
    So, now I have more interesting question - how did DFS come to number
    235? If by eye sight - it's impressively precise. If by use of word
    count utility - it's too imprecise.

    OK, now I have to count them! If I use 'wc' on the original paragraph
    that JP wrote, which starts like this:

    Some requirements appear to be quite arbitrary. But okay. ...

    Then it says 230 words. But there was also another line before that
    paragraph which was this:

    Ruminations on the recent "C" challenges...

    If that is included, then 'wc' reports 236 words. (It's also possible
    that DFS mistyped the value.)

    Presumably your count starts from 'But okay;'; then I get 223 words too.



    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Tue Mar 31 14:22:10 2026
    Subject: Re: Ruminations about the recent challenges (was Re: I think this could be an interesting challenge!)

    On Tue, 31 Mar 2026 11:53:29 +0100
    Bart <bc@freeuk.com> wrote:

    On 31/03/2026 10:11, Michael S wrote:
    On Mon, 30 Mar 2026 12:10:46 +0100
    Bart <bc@freeuk.com> wrote:

    On 30/03/2026 10:26, Janis Papanagnou wrote:
    On 2026-03-28 14:05, DFS wrote:
    On 3/28/2026 1:59 AM, Janis Papanagnou wrote:
    [...]

    But okay. When
    I read about the tasks to implement the first thought that came
    up was to use an appropriate language or tool-set, one that fits
    better for the task, tasks that at least I consider annoying to
    implement them in "C" because that language doesn't support it
    well, because of C's primitivity (its low-level'ness). But okay;
    we're in a C-group and the residents need feeding. - Why is it
    that I consider it annoying in "C"? - Because I'd have liked to
    implement such tasks based on existing _building blocks_; like
    associative arrays, sensible array data types, and what not.
    Instead of constructing and building a car with tools like an
    axe and a stone, wouldn't it be a more sensible to create useful
    tools in "C" to make such challenges concentrate more on the
    actual problem than on how to reinvent the simplest tasks again
    and again? - I'd certainly consider it worthwhile to challenge
    implementations of building blocks that alleviate C-programmers
    from all the boring error-prone and low-level tasks that are
    celebrated ad nauseam. - The question I'd ask myself if faced
    with (arbitrary or useful) requirements would be what elementary
    functions I'd need to construct the solution. Such identified
    and isolated features, i.e. their implementation, would have a
    persistent value for more than a single arbitrary "C"
    challenge.


    I think one word would've sufficed where you used 235: python

    Sorry, I cannot associate that statement with anything I said. -
    What is that "235: python" referring to? - Mind to elaborate?

    The 235 refers to the number of words in your paragraph (I haven't
    checked).


    I did. There are 223 words.
    So, now I have more interesting question - how did DFS come to
    number 235? If by eye sight - it's impressively precise. If by use
    of word count utility - it's too imprecise.

    OK, now I have to count them! If I use 'wc' on the original paragraph
    that JP wrote, which starts like this:

    Some requirements appear to be quite arbitrary. But okay. ...

    Then it says 230 words. But there was also another line before that
    paragraph which was this:

    Ruminations on the recent "C" challenges...

    If that is included, then 'wc' reports 236 words. (It's also possible
    that DFS mistyped the value.)

    Presumably your count starts from 'But okay;'; then I get 223 words
    too.



    Yes, I took paragraph as quoted by DFS. Now I see that in original post
    the paragraph was longer.


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Tue Mar 31 13:07:48 2026
    Subject: Re: Ruminations about the recent challenges (was Re: I think this could be an interesting challenge!)

    On 3/31/2026 5:11 AM, Michael S wrote:

    I did. There are 223 words.
    So now I have a more interesting question - how did DFS come to
    the number 235? If by eyesight - it's impressively precise. If by
    use of a word-count utility - it's too imprecise.


    Starting with "But okay. When", I counted on my fingers while moving my
    lips. Lost count several times before I dropped it into Notepad++ and
    did View | Summary.





    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Tue Mar 31 21:15:45 2026
    Subject: Re: Ruminations about the recent challenges (was Re: I think this could be an interesting challenge!)

    On Tue, 31 Mar 2026 13:07:48 -0400
    DFS <nospam@dfs.com> wrote:

    On 3/31/2026 5:11 AM, Michael S wrote:

    I did. There are 223 words.
    So now I have a more interesting question - how did DFS come to
    the number 235? If by eyesight - it's impressively precise. If by
    use of a word-count utility - it's too imprecise.


    Starting with "But okay. When", I counted on my fingers while moving
    my lips. Lost count several times before I dropped it into Notepad++
    and did View | Summary.





    Now I know that Notepad++ has View | Summary. Thank you.



    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Tue Mar 31 16:59:41 2026
    Subject: Re: Ruminations about the recent challenges (was Re: I think this could be an interesting challenge!)

    On 3/31/2026 2:15 PM, Michael S wrote:
    On Tue, 31 Mar 2026 13:07:48 -0400
    DFS <nospam@dfs.com> wrote:

    On 3/31/2026 5:11 AM, Michael S wrote:

    I did. There are 223 words.
    So now I have a more interesting question - how did DFS come to
    the number 235? If by eyesight - it's impressively precise. If by
    use of a word-count utility - it's too imprecise.


    Starting with "But okay. When", I counted on my fingers while moving
    my lips. Lost count several times before I dropped it into Notepad++
    and did View | Summary.





    Now I know that Notepad++ has View | Summary. Thank you.


    But if I use Notepad++ and replace every space with a \n, I get 223
    words. A difference of 12. Strange.

    Google AI Mode says:

    "Notepad++'s View | Summary (or double-clicking the status bar) is known
    to produce inaccurate word counts because it uses a simplified algorithm
    that often misinterprets punctuation, special characters, and encodings
    as word boundaries. It is widely considered "totally wrong" for precise
    work.

    Recommended Workarounds
    For an accurate word count, use these more reliable methods:

    Regex Count (Most Accurate):
    Press Ctrl + F and go to the Mark or Find tab.
    In Find what, type: \w+ (this matches alphanumeric word characters).
    Set the Search Mode to Regular expression.
    Click Count (or Mark All). The accurate word count will appear in the
    status bar of that window.

    Counting Selected Text Only:
    To count a specific section, highlight the text and follow the Regex
    Count steps above, making sure to check the In selection box.

    Plugins:
    NppTextFX2: This updated plugin provides a dedicated "Word Count" tool
    under TextFX > TextFX Tools.

    PythonScript: Advanced users can use a script (like StatusBarWordCount)
    to display a live, accurate count in the status bar.

    Why "Summary" is Inaccurate
    Encoding Issues: It may miscount characters in specific encodings like
    UCS-2.

    Word Definition: Unlike a full word processor, the Summary feature's
    basic definition of a "word" often fails to handle contractions (like
    "don't") or hyphenated words correctly.

    Hidden Spaces: It sometimes overcounts by treating multiple spaces or
    line returns as extra word breaks."



    Overcounting by 12 from a 223-word paragraph is ridiculously wrong.
    I'm surprised, since Notepad++ is otherwise a great editor.

    Note: if I use the AI suggestion of "Regex Count", it also says 235 words.

    223 it is.
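
    The disagreement between the whitespace-based count ('wc', or
    replacing spaces with \n) and the \w+ regex count can be reproduced
    with a short sketch. The sample sentence below is invented purely
    for illustration: whitespace splitting treats "Don't" and
    "over-think" as one word each, while \w+ matches each of their
    halves separately, so the regex method counts higher.

```python
import re

# Invented sample sentence, just to illustrate the discrepancy.
text = """Some requirements appear to be quite arbitrary. But okay. Don't "over-think" it - just count."""

# 'wc -w' style: a word is any maximal run of non-whitespace
# characters, so "Don't", "over-think", and the bare "-" each
# count as one word.
ws_count = len(text.split())

# "Regex Count" style: a word is any maximal run of \w characters,
# so "Don't" yields two matches ("Don", "t"), "over-think" yields
# two ("over", "think"), and the bare "-" yields none.
re_count = len(re.findall(r"\w+", text))

print(ws_count, re_count)
```

    On real prose with many contractions and hyphenated words, the two
    definitions can easily drift apart by a dozen words, which would
    account for a 223 vs. 235 split without either tool miscounting by
    its own definition.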


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michael S@3:633/10 to All on Wed Apr 1 01:02:11 2026
    Subject: Re: Ruminations about the recent challenges (was Re: I think this could be an interesting challenge!)

    On Tue, 31 Mar 2026 16:59:41 -0400
    DFS <nospam@dfs.com> wrote:

    On 3/31/2026 2:15 PM, Michael S wrote:
    On Tue, 31 Mar 2026 13:07:48 -0400
    DFS <nospam@dfs.com> wrote:

    On 3/31/2026 5:11 AM, Michael S wrote:

    I did. There are 223 words.
    So, now I have more interesting question - how did DFS come to
    number 235? If by eye sight - it's impressively precise. If by use
    of word count utility - it's too imprecise.


    Starting with "But okay. When", I counted on my fingers while
    moving my lips. Lost count several times before I dropped it into
    Notepad++ and did View | Summary.





    Now I know that Notepad++ has View | Summary. Thank you.


    But if I use Notepad++ and replace every space with a \n, I get 223
    words. A difference of 12. Strange.

    Google AI Mode says:

    "Notepad++'s View | Summary (or double-clicking the status bar) is
    known to produce inaccurate word counts because it uses a simplified
    algorithm that often misinterprets punctuation, special characters,
    and encodings as word boundaries. It is widely considered "totally
    wrong" for precise work.

    Recommended Workarounds
    For an accurate word count, use these more reliable methods:

    Regex Count (Most Accurate):
    Press Ctrl + F and go to the Mark or Find tab.
    In Find what, type: \w+ (this matches alphanumeric word characters).
    Set the Search Mode to Regular expression.
    Click Count (or Mark All). The accurate word count will appear in the
    status bar of that window.

    Counting Selected Text Only:
    To count a specific section, highlight the text and follow the Regex
    Count steps above, making sure to check the In selection box.

    Plugins:
    NppTextFX2: This updated plugin provides a dedicated "Word Count"
    tool under TextFX > TextFX Tools.

    PythonScript: Advanced users can use a script (like
    StatusBarWordCount) to display a live, accurate count in the status
    bar.

    Why "Summary" is Inaccurate
    Encoding Issues: It may miscount characters in specific encodings
    like UCS-2.

    Word Definition: Unlike a full word processor, the Summary feature's
    basic definition of a "word" often fails to handle contractions (like
    "don't") or hyphenated words correctly.

    Hidden Spaces: It sometimes overcounts by treating multiple spaces or
    line returns as extra word breaks."



    Overcounting by 12 from a 223-word paragraph is ridiculously wrong.
    I'm surprised, since Notepad++ is otherwise a great editor.

    Note: if I use the AI suggestion of "Regex Count", it also says 235
    words.

    223 it is.


    Until now I had not known that Notepad++ has View | Summary. Thank you.


    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From DFS@3:633/10 to All on Tue Mar 31 18:24:31 2026
    Subject: Re: Ruminations about the recent challenges (was Re: I think this could be an interesting challenge!)

    On 3/31/2026 6:02 PM, Michael S wrote:


    Until now I had not known that Notepad++ has View | Summary. Thank you.


    You're welcome.

    But it appears unreliable.





    --- PyGate Linux v1.5.13
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)