---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large list
of unsorted words possibly containing duplicates - extracts 26 sets of
100 random and unique words that each begin with a letter of the English alphabet.
On 22/03/2026 14:38, DFS wrote:
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large list
of unsorted words possibly containing duplicates - extracts 26 sets of
100 random and unique words that each begin with a letter of the English
alphabet.
What random distribution, uniform?
Said distribution over the unique words or said distribution over the original list?
pseudorandom?
On 22/03/2026 14:38, DFS wrote:
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large list
of unsorted words possibly containing duplicates - extracts 26 sets of
100 random and unique words that each begin with a letter of the English
alphabet.
By "extracts" do you mean to imply that instances of words are selected
with removal from the population rather than being returned for the
following selection events?
3) print the 2600 words you identify in column x row order in a grid of
You must call a RNG 2600+ times to build the list
ie you can't use the
random ordering of the input file to your advantage).
On 22/03/2026 14:38, DFS wrote:
You must call a RNG 2600+ times to build the list
ie you can't use the
random ordering of the input file to your advantage).
The two are not the same, that is, the use of "ie" is wrong.
Which do you really require, or do you really require I satisfy the conjunction of the two?
On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
On 22/03/2026 14:38, DFS wrote:
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large list
of unsorted words possibly containing duplicates - extracts 26 sets of
100 random and unique words that each begin with a letter of the English
alphabet.
What random distribution, uniform?
Said distribution over the unique words or said distribution over the
original list?
pseudorandom?
I don't care about the uniformity of the distribution, as long as the
output is unique words, and you generate and use 2600+ random values
from a RNG.
On 3/22/2026 7:21 PM, Tristan Wibberley wrote:
On 22/03/2026 14:38, DFS wrote:
You must call a RNG 2600+ times to build the list
ie you can't use the
random ordering of the input file to your advantage).
The two are not the same, that is, the use of "ie" is wrong.
I never said they were the same.
On 3/22/2026 1:29 PM, John McCue wrote:
DFS <nospam@dfs.com> wrote:
<snip>
---------------------
Word Source
---------------------
There's a huge unsorted word list here:
https://limewire.com/?referrer=pq7i8xx7p2
...which you can develop against.
Do I need to create an ID to get the list ?
I don't think so.
It didn't give me an ID or login when I uploaded them.
I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f
On 22/03/2026 14:38, DFS wrote:
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large
list of unsorted words possibly containing duplicates - extracts 26
sets of 100 random and unique words that each begin with a letter of
the English alphabet.
I've had a first go at this challenge, using a scripting language to get something working as a reference.
Not C, so that code is here: https://github.com/sal55/langs/blob/master/dfs.q
The output it produced is this: https://github.com/sal55/langs/blob/master/output. (I think it is
missing the heading for challenge 3.) It took 0.35 seconds to write that file.
I haven't looked at your version in detail but did notice the
line-counts (as I had to delete those lines for a previous reply).
Any solution I come up with in C (which may take a while!) will
use entirely different methods. I'm not interested in writing
hash-tables etc in C, I'm far too lazy. Probably it will be much longer
than yours.
One thing which is still not clear is how to choose the layout of the
final challenge. I assume the number of rows has to be a multiple of
100, but how to decide the columns?
I went with 3 columns max as the most practical.
(Probably my version will go wrong if there aren't at least 100 words
per letter in the input.)
On 22/03/2026 23:14, DFS wrote:
On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
On 22/03/2026 14:38, DFS wrote:
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large list
of unsorted words possibly containing duplicates - extracts 26 sets of
100 random and unique words that each begin with a letter of the English
alphabet.
What random distribution, uniform?
Said distribution over the unique words or said distribution over the
original list?
pseudorandom?
I don't care about the uniformity of the distribution, as long as the
output is unique words, and you generate and use 2600+ random values
from a RNG.
I think you're unaware that I can predictably generate a sequence of identical values when the distribution is free and your specification is satisfied by selecting with a distribution that prefers just one
indicatory value for a choice of word to the exclusion of all others.
You mention an RNG, I suppose then that you exclude pseudo-random
numbers because those are normally referred to as PRNGs and I understand
that RNG excludes them.
On 22/03/2026 14:38, DFS wrote:
You must call a RNG 2600+ times to build the list
ie you can't use the
random ordering of the input file to your advantage).
The two are not the same, that is, the use of "ie" is wrong.
Which do you really require, or do you really require I satisfy the conjunction of the two?
Do you try to hint that challenges with seemingly arbitrary rules and seemingly arbitrary purposes are not very worthy?
On 3/22/2026 7:53 PM, Bart wrote:
Not C, so that code is here: https://github.com/sal55/langs/blob/
master/dfs.q
Slick. It's a powerful scripting language. Reading a text file in with
one line is nice. It's about 10 lines of C.
Did you look to python for inspiration when creating it?
Looks like line 16 is where you call a randomizer. If you put a counter
at line 17 what does it say after the program is run?
Is bounds a property of your list objects?
Is bounds a pair of numbers 0..length of list-1?
What generates your random values?
Any solution I come up with in C (which may take a while!) will
use entirely different methods. I'm not interested in writing hash-
tables etc in C, I'm far too lazy. Probably it will be much longer
than yours.
You have to deliver C to get a chance at the prize.
On 3/22/2026 7:53 PM, Bart wrote:
I haven't looked at your version in detail but did notice the
line-counts (as I had to delete those lines for a previous reply).
Any solution I come up with in C (which may take a while!) will
use entirely different methods. I'm not interested in writing hash-
tables etc in C, I'm far too lazy. Probably it will be much longer
than yours.
You have to deliver C to get a chance at the prize.
And I like to see different approaches. The way I did it in C and
Python is similar, but Python makes it SO easy (one-line) to segregate
words by letter that I took the easy way out there.
DFS <nospam@dfs.com> writes:
On 3/22/2026 1:29 PM, John McCue wrote:
DFS <nospam@dfs.com> wrote:
<snip>
---------------------
Word Source
---------------------
There's a huge unsorted word list here:
https://limewire.com/?referrer=pq7i8xx7p2
...which you can develop against.
Do I need to create an ID to get the list ?
I don't think so.
It didn't give me an ID or login when I uploaded them.
I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f
A fucking web page. How about a link to a plain text file
that has just the words?
On 3/22/2026 8:05 PM, Tristan Wibberley wrote:
On 22/03/2026 23:14, DFS wrote:
On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
On 22/03/2026 14:38, DFS wrote:
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large list
of unsorted words possibly containing duplicates - extracts 26 sets of
100 random and unique words that each begin with a letter of the English
alphabet.
What random distribution, uniform?
Said distribution over the unique words or said distribution over the
original list?
pseudorandom?
I don't care about the uniformity of the distribution, as long as the
output is unique words, and you generate and use 2600+ random values
from a RNG.
I think you're unaware that I can predictably generate a sequence of
identical values when the distribution is free and your specification is
satisfied by selecting with a distribution that prefers just one
indicatory value for a choice of word to the exclusion of all others.
Yeah, I don't really know what any of that means. But it sounds like
your 3rd attempt to sidestep the generation and use of 2600+ random values.
I think you could show some interesting techniques, but you have to
adhere to the requirements of the challenge.
On 3/23/2026 4:53 AM, Michael S wrote:
Do you try to hint that challenges with seemingly arbitrary rules and
seemingly arbitrary purposes are not very worthy?
Arbitrary and worth are in the eyes of the beholder.
So keep your arbitrary, worthless opinions to yourself.
On 23/03/2026 04:06, DFS wrote:
On 3/22/2026 8:05 PM, Tristan Wibberley wrote:
On 22/03/2026 23:14, DFS wrote:
On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
On 22/03/2026 14:38, DFS wrote:
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large list
of unsorted words possibly containing duplicates - extracts 26 sets of
100 random and unique words that each begin with a letter of the English
alphabet.
What random distribution, uniform?
Said distribution over the unique words or said distribution over the
original list?
pseudorandom?
I don't care about the uniformity of the distribution, as long as the
output is unique words, and you generate and use 2600+ random values
from a RNG.
I think you're unaware that I can predictably generate a sequence of
identical values when the distribution is free and your specification is
satisfied by selecting with a distribution that prefers just one
indicatory value for a choice of word to the exclusion of all others.
Yeah, I don't really know what any of that means. But it sounds like
your 3rd attempt to sidestep the generation and use of 2600+ random values.
I think you could show some interesting techniques, but you have to
adhere to the requirements of the challenge.
It's because of my deeper understanding of the meaning (or barely meaningfulness) of the word "random" and my awareness of how critical it
is to many applications of randomness.
That is:
- If it's a game I could go ahead with a PRNG and satisfy you easily -
but it's not interesting to me these days, at this point I think it's a
game,
- If it's a secure application of choice that leaks /no/ information
about the input list beyond the fact of the achievement of the
lower-bound, respectively, on the number of words having each initial, I
can tighten your specification in some of the ways I've queried,
- If it's a secure application of choice that may leak some information about the input list but may not leak any of the information of the
original ordering of the words (which was implied to be the reason for
the minimum number of queries for random numbers) then I can use fewer
bits of entropy, saving runtime costs. This 3rd possible endeavour is
less relevant now that you allow a PRNG because I don't have to care
about the cost of bits of entropy or their turnaround time. It's still
an interesting one when considering the nature of the task of
requirements engineering and agreeing requirements. Programmes can fail
due to uncompetitiveness induced by individual member projects with unnecessary or insufficient requirements.
More than that, though, queries for random numbers may come in
individual bits and an implementation might query 16 numbers, for
example, for each word choice, rather than one. And it goes deeper than
that. That means the request to query for 2600 numbers is sort of
meaningless and can lead to programme failure by being in the class of unnecessary itself and its presence leading to the other requirements
being insufficient.
More still, people get gambling games wrong and go to jail because they
learn and practice C programming without any awareness of the difficulty
of "random" and they might read this newsgroup to shape their skills.
So you see, each of my questions was properly important, for many reasons.
I really did think about it carefully.
On 3/24/2026 3:43 AM, Tim Rentsch wrote:
DFS <nospam@dfs.com> writes:
On 3/22/2026 1:29 PM, John McCue wrote:
DFS <nospam@dfs.com> wrote:
<snip>
---------------------
Word Source
---------------------
There's a huge unsorted word list here:
https://limewire.com/?referrer=pq7i8xx7p2
...which you can develop against.
Do I need to create an ID to get the list ?
I don't think so.
It didn't give me an ID or login when I uploaded them.
I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f
A fucking web page. How about a link to a plain text file
that has just the words?
Just fucking click on the fucking file name.
DFS <nospam@dfs.com> writes:
On 3/24/2026 3:43 AM, Tim Rentsch wrote:
DFS <nospam@dfs.com> writes:
On 3/22/2026 1:29 PM, John McCue wrote:
DFS <nospam@dfs.com> wrote:
<snip>
---------------------
Word Source
---------------------
There's a huge unsorted word list here:
https://limewire.com/?referrer=pq7i8xx7p2
...which you can develop against.
Do I need to create an ID to get the list ?
I don't think so.
It didn't give me an ID or login when I uploaded them.
I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f
A fucking web page. How about a link to a plain text file
that has just the words?
Just fucking click on the fucking file name.
Not everyone reads usenet with a browser or a news client
that understands hypertext or the hypertext transfer protocol.
I would generally have used 'wget' to fetch, so if you'd
specified:
https://filebin.net/kkkyqw1ritefnw0f/words_unsorted.txt
That may have been slightly better, but it appears that
filebin interposes a warning screen and forces a second
click, so wget may also have failed.
On 23/03/2026 03:40, DFS wrote:
On 3/22/2026 7:53 PM, Bart wrote:
I haven't looked at your version in detail but did notice the
line-counts (as I had to delete those lines for a previous reply).
Any solution I come up with in C (which may take a while!) will
use entirely different methods. I'm not interested in writing hash-
tables etc in C, I'm far too lazy. Probably it will be much longer
than yours.
You have to deliver C to get a chance at the prize.
And I like to see different approaches.˙ The way I did it in C and
Python is similar, but Python makes it SO easy (one-line) to segregate
words by letter that I took the easy way out there.
I now have a C version, a bit long to post, so it's at this link:
https://github.com/sal55/langs/blob/master/dfs.c
It looks very clunky but seems to do the job, and not too slowly either
(see below).
I then tried yours, which is somewhat shorter (160 lines vs my 205
lines, which includes blanks etc).
However, that doesn't seem to do part (2) of the challenge. While that
doesn't explicitly say the unsorted duplicates must be shown, that's what
the example does:
  found:  eventually dupes you get
  output: Dupes Eventually Get You
Your C program (I see the Python does it too) shows the equivalent of this:
  Duplicate words in proper case
  Dupes Eventually Get You
Now, I noticed that my original M version displayed that first 'found'
line, but the words were sorted, not unsorted! Displaying the original
order involved quite a bit of extra work, and an extra copy of the
word-list. The method is also inefficient.
So, is that necessary, or not? If not then I can simplify my versions.
Anyway, my C version does absolutely nothing clever. Everything is a
linear search.
The only hi-tech bit is the quicksort routine.
Timing, all run under Windows:
  My C:         0.30 seconds
  Your C:       0.25 seconds
  My Q:         0.34 seconds
  Your Python:  1.77 seconds (CPython)
                0.88 seconds (PyPy)
The C timings are unoptimised; optimising might knock off 0.01 or 0.02 seconds.
I don't know why the Python timing is slow, especially given that its
sort() routine will be an internal native-code function, and mine runs
as bytecode.
My interpreters generally are faster than CPython at executing bytecode,
but with tasks like this, most time is usually spent within internal
native code libraries.
On 23/03/2026 03:40, DFS wrote:
On 3/22/2026 7:53 PM, Bart wrote:
Not C, so that code is here: https://github.com/sal55/langs/blob/
master/dfs.q
Slick. It's a powerful scripting language. Reading a text file in
with one line is nice. It's about 10 lines of C.
Well, it can be one line in C too, once you create a function for it!
Did you look to python for inspiration when creating it?
No. I glanced at it but all I remember is that it was 58 lines.
Looks like line 16 is where you call a randomizer. If you put a
counter at line 17 what does it say after the program is run?
It's called 2631 times. With a different seed, it will vary.
Is bounds a property of your list objects?
Is bounds a pair of numbers 0..length of list-1?
Yes, but the bounds usually start from 1. And here, the 'long' and
'short' lists have bounds of 'a' to 'z' (97 to 122).
What generates your random values?
I use the PRNG shown below (not C code, and not mine).
There are a couple of levels of functions on top. The range-based
'random()' in the scripting language probably gives slightly biased
results, but none of my stuff including this is critical.
Any solution I come up with in C (which may take a while!) will
use entirely different methods. I'm not interested in writing hash-
tables etc in C, I'm far too lazy. Probably it will be much longer
than yours.
You have to deliver C to get a chance at the prize.
I decided to do it in my 'M' language first as there are fewer i's and
t's to dot and cross when developing an algorithm.
That part's been done, now all that remains is manual porting to C. I
will do that later. (Auto-transpiling to C works, but I guess that's not
the kind of C you want.)
(If interested, my version is here; it's about 160 lines: https://github.com/sal55/langs/blob/master/dfs.m. I had planned to use
C's qsort(), but that didn't seem to work, so it includes a sort routine.)
This version produces the output in 0.30 seconds.
BTW the challenge has proved useful as it showed up bugs in both my scripting language and the compiled one. The first has been fixed, the second will be; I used the previous compiler version to test the code.
---------------------
[2]int seed = (0x2989'8811'1111'1272',0x1673'2673'7335'8264)
export func mrandom:u64 =
    int x, y
    x := seed[1]
    y := seed[2]
    seed[1] := y
    x ixor:= x<<23
    seed[2] := x ixor y ixor x>>17 ixor y>>26
    return seed[2] + y
end
On 3/23/2026 12:03 PM, Bart wrote:
On 23/03/2026 03:40, DFS wrote:
On 3/22/2026 7:53 PM, Bart wrote:
Not C, so that code is here: https://github.com/sal55/langs/blob/
master/dfs.q
Slick. It's a powerful scripting language. Reading a text file in
with one line is nice. It's about 10 lines of C.
Well, it can be one line in C too, once you create a function for it!
Did you look to python for inspiration when creating it?
No. I glanced at it but all I remember is that it was 58 lines.
I don't mean my little bit of code. I mean: did you look to the Python
language for inspiration when you were developing your scripting language?
I decided to do it in my 'M' language first as there are fewer i's and
t's to dot and cross when developing an algorithm.
You have separate M and Q languages?
(If interested, my version is here; it's about 160 lines: https://
github.com/sal55/langs/blob/master/dfs.m. I had planned to use C's
qsort(), but that didn't seem to work, so it includes a sort routine.)
This version produces the output in 0.30 seconds.
Why wouldn't qsort() work?
---------------------
[2]int seed = (0x2989'8811'1111'1272',0x1673'2673'7335'8264)
export func mrandom:u64 =
    int x, y
    x := seed[1]
    y := seed[2]
    seed[1] := y
    x ixor:= x<<23
    seed[2] := x ixor y ixor x>>17 ixor y>>26
    return seed[2] + y
end
Do you have a C version of that?
If so I'll run it against a RNG comparison program I wrote.
On 3/24/2026 10:02 AM, Scott Lurndal wrote:
DFS <nospam@dfs.com> writes:
On 3/24/2026 3:43 AM, Tim Rentsch wrote:
DFS <nospam@dfs.com> writes:
On 3/22/2026 1:29 PM, John McCue wrote:
DFS <nospam@dfs.com> wrote:
<snip>
---------------------
Word Source
---------------------
There's a huge unsorted word list here:
https://limewire.com/?referrer=pq7i8xx7p2
...which you can develop against.
Do I need to create an ID to get the list ?
I don't think so.
It didn't give me an ID or login when I uploaded them.
I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f
A fucking web page. How about a link to a plain text file
that has just the words?
Just fucking click on the fucking file name.
Not everyone reads usenet with a browser or a news client
that understands hypertext or the hypertext transfer protocol.
Sucks for them.
I would generally have used 'wget' to fetch, so if you'd
specified:
https://filebin.net/kkkyqw1ritefnw0f/words_unsorted.txt
That may have been slightly better, but it appears that
filebin interposes a warning screen and forces a second
click, so wget may also have failed.
$ wget -r -np https://filebin.net/kkkyqw1ritefnw0f/words_unsorted.txt
On 3/24/2026 7:03 AM, Tristan Wibberley wrote:
On 23/03/2026 04:06, DFS wrote:
On 3/22/2026 8:05 PM, Tristan Wibberley wrote:
On 22/03/2026 23:14, DFS wrote:
On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
On 22/03/2026 14:38, DFS wrote:
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large list
of unsorted words possibly containing duplicates - extracts 26 sets of
100 random and unique words that each begin with a letter of the English
alphabet.
What random distribution, uniform?
Said distribution over the unique words or said distribution over the
original list?
pseudorandom?
I don't care about the uniformity of the distribution, as long as the
output is unique words, and you generate and use 2600+ random values
from a RNG.
I think you're unaware that I can predictably generate a sequence of
identical values when the distribution is free and your
specification is
satisfied by selecting with a distribution that prefers just one
indicatory value for a choice of word to the exclusion of all others.
Yeah, I don't really know what any of that means. But it sounds like
your 3rd attempt to sidestep the generation and use of 2600+ random
values.
I think you could show some interesting techniques, but you have to
adhere to the requirements of the challenge.
It's because of my deeper understanding of the meaning (or barely
meaningfulness) of the word "random" and my awareness of how critical it
is to many applications of randomness.
That is:
- If it's a game I could go ahead with a PRNG and satisfy you easily -
but it's not interesting to me these days, at this point I think it's a
game,
A game... now you're onto me.
- If it's a secure application of choice that leaks /no/ information
about the input list beyond the fact of the achievement of the
lower-bound, respectively, on the number of words having each initial, I
can tighten your specification in some of the ways I've queried,
I sense your "tightening" will result in an unreadable spec, but it
would be fun to try. So let's have it.
I would agree that on a scale of necessary to unnecessary, this
challenge lies very close to unnecessary.
But it lies closer to the middle of the scale interesting..uninteresting.
I have a few more up my sleeve. One in particular I've been thinking
about, that explicitly disallows the use of a RNG.
More still, people get gambling games wrong and go to jail because they
learn and practice C programming without any awareness of the difficulty
of "random" and they might read this newsgroup to shape their skills.
You should consult an attorney - I wouldn't want you to do (rand() % 26)
+ 1 days in jail for reading clc and attempting my challenge.
ps you're nuts
In comp.lang.c, DFS <nospam@dfs.com> wrote:
I just now uploaded it here: https://filebin.net/kkkyqw1ritefnw0f
"word" list
$ grep ^.$ ~/tmp/words-unsorted |grep -v [aeiouy] |wc
19 19 38
$ grep ^..$ ~/tmp/words-unsorted |grep -v [aeiouy] |wc
54 54 162
$ grep ^...$ ~/tmp/words-unsorted |grep -v [aeiouy] |wc
74 74 296
$ grep ^....$ ~/tmp/words-unsorted |grep -v [aeiouy] |wc
13 13 65
Elijah
------
not going to use that for Scrabble
On 3/23/2026 6:26 PM, Bart wrote:
It looks very clunky but seems to do the job, and not too slowly
either (see below).
Years ago I was shocked how fast C chewed thru text data (and it's even faster dealing with numbers).
Actually, I'm still shocked. I wrote an anagram program in C that used
prime factors to do searches, and it found 5 anagrams from a list of
370K words in 0.0055s (5.5/1000ths of a second).
And it would be even faster with the use of a hash table. Incredible.
And that's on my low-end AMD Ryzen 5600G (16GB DDR4-3200 RAM)
No extra copy of the list is necessary to find duplicates (but for
one-pass efficiency, sorting the list is required).
Look at the first letter of each duplicate.
"congratulations on the wherewithal youngun"
cotwy
Sort the file and the dupes are already sorted. That was intentional.
If that explanation lets you drop some lines, good.
My method of finding the 26 sets was to:
1) count words by letter as the file is read in
   lettercnt[wordsin[i][0]-'a']++;
(I saw something similar in your scripting, but couldn't spot it in
your .c)
2) sort the data just read in
qsort(wordsin, wordcnt, sizeof(char*), comparechar);
3) using the lettercnt[] array from step 1, determine the start-end positions of each set of words beginning with a..z
Letter   Start     End
a            0   20484
b        20485   34475
c        34476   60069
d        60070   75050
e        75051   86572
f        86573   95977
g        95978  104985
h       104986  116490
i       116491  127653
j       127654  129796
k       129797  132749
l       132750  140949
m       140950  157658
n       157659  166088
o       166089  175859
p       175860  205604
q       205605  207078
r       207079  221162
s       221163  253678
t       253679  269769
u       269770  287936
v       287937  292502
w       292503  297884
x       297885  298330
y       298331  299249
z       299250  300397
4) generate 100+ random numbers between start and end of each letter
int r = (rand() % (end - start + 1)) + start;
This 'calculation of start and end' for each letter is what I thought to
be a novel approach.
I'm curious how others will approach it (if anyone else tries).
Altogether my program makes:
3 passes thru the 300398 words in:
  * 1 to count total words and words by letter
  * 1 to load the words into an array
  * 1 to find duplicates
2 passes thru the 2600 words out:
  * 1 to verify the 100 words per letter
  * 1 to print all 2600 words
5 total passes?  Not sure that's Ivy League.  But everything runs in
1/10th of a second so I can't complain.
If you have a short challenge of medium difficulty, post it so we can
learn and improve skillz.
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large list
of unsorted words possibly containing duplicates - extracts 26 sets of
100 random and unique words that each begin with a letter of the English alphabet.
---------------------
Outputs
---------------------
1) count of words by letter
Letter   Words In   Words Out
a            2345        100
b            4399        100
c             844        100
...
z            1011        100
2) identify duplicate words in the input file (if any) and print
   them sorted and using proper case on one line.
   found:  eventually dupes you get
   output: Dupes Eventually Get You
3) print the 2600 words you identify in column x row order in a grid of
   size (200rows x 13cols or 300x9 or 400x7 or 500x6 or 600x4 etc)
   without hard-coding each column in a long printf.  They must be in
   alpha order.  If you participated in the 'sort of trivial challenge'
   a few weeks ago, you'll recognize this requirement.
2600 unique random words (1000 rows x 3 columns)
   1.  aardwolves          kafirin             uberous
   2.  abaze               kafiz               ulnae
   3.  abitibi             kala                ulnare
  ...
 599.  funned              pyrone              zymophosphate
 600.  fusan               pythiacystis        zymotic
 601.  gable               qanat
 602.  gade                qere
  ...
 998.  juvia               typedefs
 999.  juxtaposition       tyrannizings
1000.  jynx                tzaddikim
---------------------
Requirement
---------------------
You must call a RNG 2600+ times to build the list (i.e., you can't use the
random ordering of the input file to your advantage).  In repeated runs,
my C and python solutions called the RNG 2635x to 2675x (because of
duplicate randoms).
---------------------
Word Source
---------------------
There's a huge unsorted word list here:
https://limewire.com/?referrer=pq7i8xx7p2
...which you can develop against.
My C and python solutions are shown below, and at the same link.
No code perusal until you submit yours!
Enjoy!
========================================================================
C  125 LOC
On my WSL system this C runs in 0.095 seconds using the unsorted words file
========================================================================
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <ctype.h>  //for tolower and toupper

//example usage = $./2600words words_unsorted.txt 500 6

//string compare function for qsort
int comparechar(const void *a, const void *b) {
    const char **chara = (const char **)a;
    const char **charb = (const char **)b;
    return strcmp(*chara, *charb);
}

int main(int argc, char *argv[]) {
    //validations
    if (argc < 4) {
        printf("Invalid input \nEnter program-name  word-file  rows  columns\n");
        printf("example: $./2600words words.txt 400 7\n\n");
        exit(0);
    }
    if (atoi(argv[2]) * atoi(argv[3]) < 2600) {
        printf("Invalid input: enter rows * columns that total 2600+ \n\n");
        exit(0);
    }

    int  i = 0, t = 0, wout = 0;        //counters
    int  lettercnt[26] = {0};           //hold count of words by first letter
    int  maxwordlen = 0;                //length of longest word in list
    int  start = 0, end = 0;            //used to extract 100 words per letter
    int  temp[100] = {0};               //holds the 100 random indexes for the letter
    int  wordcnt = 0, totwords = 0;     //used to extract 100 words per letter
    char line[35] = "";                 //buffer to hold line when reading file
    char therand[12];                   //the current random value, as text
    char usedlist[2000] = "";           //stores the random numbers already used
                                        //(was 1000: 200 entries of up to 8 chars can overflow that)

    // ===========================================================================
    //nitty gritty - read in the unsorted words
    // ===========================================================================
    FILE *fin = fopen(argv[1], "r");                        //open file
    while (fgets(line, sizeof line, fin) != NULL) {         //count lines = words, get max word length
        wordcnt++;
        if ((int)strlen(line) > maxwordlen) {
            maxwordlen = strlen(line);
        }
    }
    char theword[maxwordlen + 1];
    rewind(fin);                                            //pointer back to beginning
    char **wordsin = malloc(sizeof(char*) * wordcnt);       //allocate memory
    while (fgets(theword, sizeof theword, fin) != NULL) {   //read line into buffer
        int wordlen = strlen(theword);                      //get length of word
        wordsin[i] = malloc(wordlen + 1);                   //allocate memory for the word
        strncpy(wordsin[i], theword, wordlen);              //copy word into array
        wordsin[i][wordlen-1] = '\0';                       //add terminator - overwrites the \n in the file
        lettercnt[wordsin[i][0]-'a']++;                     //update count of words by first letter
        i++;                                                //increment counter
    }
    fclose(fin);                                            //close handle to file

    // ===========================================================================
    //fun begins
    // ===========================================================================
    //sort master list of words
    //for each letter, determine the start and end positions of words beginning with that letter
    //generate random numbers between that start and end
    //check if that random number is in the usedlist array.  If not, add it to usedlist and temp arrays
    //when temp array has 100 unique randoms in it, add them to the master array, break and go to next letter
    //do one sort at the end
    qsort(wordsin, wordcnt, sizeof(char*), comparechar);    //sort the master
    char **wordsout = malloc(sizeof(char*) * 2600);         //final output goes into this array
    srand(time(NULL));
    for (i = 0; i < 26; i++) {                              //find start-end for each letter set
        start = (totwords += lettercnt[i]) - lettercnt[i];
        end = start + lettercnt[i] - 1;
        memset(usedlist, 0, sizeof(usedlist));
        memset(temp,     0, sizeof(temp));                  //was 100: cleared only a quarter of the int array
        t = 0;
        for (int j = 0; j < 200; j++) {
            int r = (rand() % (end - start + 1)) + start;
            sprintf(therand, " %d ", r);
            if (strstr(usedlist, therand) == NULL) {
                strncat(usedlist, therand, strlen(therand));
                temp[t++] = r;
                if (t == 100) {                             //was t > 100: that wrote temp[100] out of bounds
                    for (int k = 0; k < 100; k++) {
                        sprintf(theword, "%s", wordsin[temp[k]]);
                        int wordlen = strlen(theword);
                        wordsout[wout] = malloc(wordlen + 1);
                        strncpy(wordsout[wout], theword, wordlen);
                        wordsout[wout++][wordlen] = '\0';
                    }
                    break;
                }
            }
        }
    }
    qsort(wordsout, wout, sizeof(char*), comparechar);      //final sort of 2600 words

    // ===================================================================================================
    //final output: print word counts by letter, print dupes, print random words by column then row
    // ===================================================================================================
    printf("%d words loaded\n", wordcnt);
    if (wout == 2600) {
        printf("list of 2600 unique random words created\n");
        printf("\nLetter   Words In   Words Out\n");
        for (i = 0; i < 26; i++) {
            t = 0;
            for (int j = 0; j < wout; j++) {
                if (wordsout[j][0] == (i + 'a')) {t++;}
            }
            printf("  %2c    %6d       %d\n", i + 'a', lettercnt[i], t);
        }
    } else {
        printf("Errors occurred.  2600 words not produced.\n");
        exit(0);
    }

    //duplicate words
    printf("\nDuplicate words in proper case\n");
    for (i = 0; i < wordcnt-1; i++) {
        if (strcmp(wordsin[i], wordsin[i+1]) == 0) {
            sprintf(theword, "%s", wordsin[i]);
            for (int k = 0; theword[k] != '\0'; k++) {
                theword[k] = (k == 0) ? toupper(theword[k]) : tolower(theword[k]);
            }
            printf("%s ", theword);
        }
    }

    //print random words in column then row order
    int rows = atoi(argv[2]), cols = atoi(argv[3]), colwidth = 20;
    printf("\n\n2600 unique random words (%d rows x %d columns)\n", rows, cols);
    for (int r = 1; r <= rows; r++) {
        if (r <= wout) {
            int nbr = r;
            printf("%3d. %-*s", r, colwidth, wordsout[nbr-1]);
            for (int c = 0; c < cols-1; c++) {
                nbr += rows;
                if (nbr <= wout) {
                    printf("%-*s", colwidth, wordsout[nbr-1]);
                }
            }
            printf("\n");
        }
    }

    //finito - free the individual words, then the pointer arrays
    for (i = 0; i < wordcnt; i++) free(wordsin[i]);
    for (i = 0; i < wout; i++)    free(wordsout[i]);
    free(wordsin);
    free(wordsout);
    return 0;
}
========================================================================
========================================================================
python  58 LOC
On my WSL system this python runs in 1.05 seconds using the unsorted words file
========================================================================
import sys, random

if len(sys.argv) < 4:
    print("Invalid input \nEnter program name word-file rows columns")
    print("example: $ python3  2600words.py  words.txt  400  7")
    exit()
if (int(sys.argv[2]) * int(sys.argv[3])) < 2600:
    print("Invalid input: enter rows * columns that total 2600+")
    exit()

#read unsorted words file, generate 100 randoms per letter
from collections import Counter
wordsout, used, temp = [], [], []
lettercnt = [0]*26
with open(sys.argv[1], 'r') as f:
    wordsin = f.readlines()
    for line in wordsin:
        lettercnt[ord(line[0]) - 97] += 1
    print("%d words loaded" % (len(wordsin)))
    wordsuni = sorted(set(wordsin))
    for letter in 'abcdefghijklmnopqrstuvwxyz':
        used.clear()
        temp.clear()
        lwords = [line for line in wordsuni if line[0] == letter]
        lenwordset = len(lwords)
        for i in range(200):
            randword = lwords[random.randint(0, lenwordset - 1)]
            if randword not in used:
                temp.append(randword.rstrip())
                used.append(randword)
                if len(temp) == 100:
                    wordsout += sorted(temp)
                    break
print('list of ' + str(len(wordsout)) + ' unique random words created')

#words out should always be 100 per letter
print("\nLetter   Words In   Words Out")
for i in range(26):
    wout = 0
    for j in range(len(wordsout)):
        if ord(wordsout[j][0]) == (i + 97):
            wout += 1
    print("  %2c    %6d       %d" % (i + 97, lettercnt[i], wout))

#find duplicate words
print("\nDuplicate words in proper case")
counts = Counter(wordsin)
dupes = [item for item, count in counts.items() if count > 1]
if len(dupes) > 0:
    for dupe in sorted(dupes):
        print(dupe.strip().title(), end=' ')
else:
    print("no duplicate words")

#print randoms by col then row
rows, cols = int(sys.argv[2]), int(sys.argv[3])
colwidth, words = 20, len(wordsout)
print("\n\n2600 unique random words (%d rows x %d columns)" % (rows, cols))
for r in range(1, rows + 1):
    if r <= words:
        nbr = r
        print("%3d. %-*s" % (nbr, colwidth, wordsout[nbr-1]), end=' ')
        for i in range(cols - 1):
            nbr += rows
            if nbr <= words:
                print("%-*s" % (colwidth, wordsout[nbr-1]), end=' ')
        print()
====================================================================
On 24/03/2026 17:06, DFS wrote:
On 3/23/2026 6:26 PM, Bart wrote:
1) count words by letter as the file is read in
lettercnt[wordsin[i][0]-'a']++;
(I saw something similar in your scripting, but couldn't spot it in
your .c)
It's probably this line:
˙˙˙˙˙˙˙˙˙ ++nbig[(unsigned char)buffer[0]]
The cast is because 'char' is signed and could be negative.
Note that my arrays can have arbitrary lower bounds (this is a rare
feature among HLLs), and here start from 'a'.
In shell ...
----
#!/bin/ksh
sort words_unsorted.txt >words.srt
uniq words.srt >words.unq
cat <<-X
    There are $(wc -l words.srt |\
        cut -d\  -f1) words in words_unsorted.txt
    and $(wc -l words.unq | cut -d\  -f1) are unique.
    Duplicates are: $(diff words.srt words.unq |\
        grep "<" | cut -d\  -f2 | sed "s/^\(.\)/\u\1/g" |\
        sort | tr '\n' ' ')
    Counts:
    $(
        for X in {a..z}
        do
            echo -n "$X " ; grep -c ^$X words.unq
        done
    )
    Samples ...
X
for X in {a..z}
do
    grep ^$X words.unq | shuf -n 100 | sed "s/^\(.\)/\u\1/g"
done >words.tmp
head -1000 words.tmp >words.0
head -2000 words.tmp | tail -1000 >words.1
tail -600 words.tmp > words.2
paste words.0 words.1 words.2 | nl -w4 -s". "
rm words.srt words.unq words.0 words.1 words.2
return 0
On 24/03/2026 17:17, DFS wrote:
On 3/24/2026 7:03 AM, Tristan Wibberley wrote:
On 23/03/2026 04:06, DFS wrote:
On 3/22/2026 8:05 PM, Tristan Wibberley wrote:
On 22/03/2026 23:14, DFS wrote:
On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
On 22/03/2026 14:38, DFS wrote:
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large list
of unsorted words possibly containing duplicates - extracts 26 sets of
100 random and unique words that each begin with a letter of the English
alphabet.
What random distribution, uniform?
Said distribution over the unique words or said distribution over the
original list?
pseudorandom?
I don't care about the uniformity of the distribution, as long as the
output is unique words, and you generate and use 2600+ random values
from a RNG.
I think you're unaware that I can predictably generate a sequence of
identical values when the distribution is free and your specification is
satisfied by selecting with a distribution that prefers just one
indicatory value for a choice of word to the exclusion of all others.
Yeah, I don't really know what any of that means.  But it sounds like
your 3rd attempt to sidestep the generation and use of 2600+ random
values.
I think you could show some interesting techniques, but you have to
adhere to the requirements of the challenge.
It's because of my deeper understanding of the meaning (or barely
meaningfulness) of the word "random" and my awareness of how critical it
is to many applications of randomness.
That is:
 - If it's a game I could go ahead with a PRNG and satisfy you easily -
but it's not interesting to me these days, at this point I think it's a
game,
A game... now you're onto me.
 - If it's a secure application of choice that leaks /no/ information
about the input list beyond the fact of the achievement of the
lower-bound, respectively, on the number of words having each initial, I
can tighten your specification in some of the ways I've queried,
I sense your "tightening" will result in an unreadable spec, but it
would be fun to try.  So let's have it.
I don't think it would be unreadable. It might require some thought to synthesise a program that satisfies it.
But you've told me it's just a game (I suppose "toy", rather than
gambling game). So the interesting bit could be just like:
"produce the output so that, within each of the groupings based on the initial letter, the words are superficially shuffled around even when
they're not shuffled around in the input."
challenge lies very close to unnecessary.
I didn't mean to suggest the challenge is unnecessary, but mean to
discuss the problems of writing specifications and requirements such
that they're not necessary to fulfil the goal. Requirements involving randomness and secrecy are particularly interesting in that respect.
But it lies closer to the middle of the scale interesting..uninteresting.
I have a few more up my sleeve.  One in particular I've been thinking
about, that explicitly disallows the use of a RNG.
More still, people get gambling games wrong and go to jail because they
learn and practice C programming without any awareness of the difficulty
of "random" and they might read this newsgroup to shape their skills.
You should consult an attorney - I wouldn't want you to do (rand() % 26)
+ 1 days in jail for reading clc and attempting my challenge.
You seem to be assuming every reader is just fulfilling a need for a pastime. I don't suppose that, and you seem to be mocking me for it; I
think that's awful. You'd got me really excited about the breadth of
nuance in requirements and the effects of that and then turned it into
an opportunity for mockery.
ps you're nuts
That may be, but how did you know?
On 3/25/2026 8:32 AM, Richard Harnden wrote:
In shell ...
----
#!/bin/ksh
[shell script snipped]
29 lines.  Speed is good.  Very nice.  Probably took you no more than an hour to write.
How do I run it?  I tried this in Windows Subsystem for Linux:
$ sudo ksh harnden.sh
: not found2]:
uniq: words.srt: No such file or directory
: not found5]:
: not found6]:
wc: words.srt: No such file or directory
Unfortunately...
* the output doesn't meet the requirements: the words are unique and
  sorted, but they're not each randomly chosen by a RNG().
* you hard-coded your output logic: 3 groups of words in 3 columns.
* you didn't offer a C solution.
On 26/03/2026 03:21, DFS wrote:
On 3/25/2026 8:32 AM, Richard Harnden wrote:
In shell ...
----
#!/bin/ksh
[shell script snipped]
29 lines.  Speed is good.  Very nice.  Probably took you no more than
an hour to write.
How do I run it?  I tried this in Windows Subsystem for Linux:
$ sudo ksh harnden.sh
: not found2]:
uniq: words.srt: No such file or directory
: not found5]:
: not found6]:
wc: words.srt: No such file or directory
Make sure that words_unsorted.txt is in the same directory,
that harnden.sh is executable,
then: ./harnden.sh
No need for sudo.
Unfortunately...
* the output doesn't meet the requirements: the words are unique and
  sorted, but they're not each randomly chosen by a RNG().
shuf(1) will call rand(3)
* you hard-coded your output logic: 3 groups of words in 3 columns.
True, but it satisfies your "2600 unique random words (1000 rows x 3 columns)"
* you didn't offer a C solution.
No, I wanted to see if a shell solution was 'good enough'.
On 3/26/2026 12:02 AM, Richard Harnden wrote:
On 26/03/2026 03:21, DFS wrote:
On 3/25/2026 8:32 AM, Richard Harnden wrote:
In shell ...
----
#!/bin/ksh
[shell script snipped]
29 lines.  Speed is good.  Very nice.  Probably took you no more than
an hour to write.
How do I run it?  I tried this in Windows Subsystem for Linux:
$ sudo ksh harnden.sh
: not found2]:
uniq: words.srt: No such file or directory
: not found5]:
: not found6]:
wc: words.srt: No such file or directory
Make sure that words_unsorted.txt is in the same directory,
that harnden.sh is executable,
then: ./harnden.sh
No need for sudo.
$ ksh ./harnden.sh: not foundh[2]:
: cannot create [Permission denied]
: cannot create [Permission denied]
: not foundh[5]:
wc: words.srt: No such file or directory
: not foundh[6]:
words_unsorted.txt is in the directory
After that runs:
 words.srt is created (and the words are sorted)
 words.unq is empty
On 3/24/2026 8:29 PM, Tristan Wibberley wrote:
On 24/03/2026 17:17, DFS wrote:
On 3/24/2026 7:03 AM, Tristan Wibberley wrote:
On 23/03/2026 04:06, DFS wrote:
On 3/22/2026 8:05 PM, Tristan Wibberley wrote:
On 22/03/2026 23:14, DFS wrote:
On 3/22/2026 7:02 PM, Tristan Wibberley wrote:
On 22/03/2026 14:38, DFS wrote:
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large list
of unsorted words possibly containing duplicates - extracts 26 sets of
100 random and unique words that each begin with a letter of the English
alphabet.
What random distribution, uniform?
Said distribution over the unique words or said distribution over the
original list?
pseudorandom?
I don't care about the uniformity of the distribution, as long as the
output is unique words, and you generate and use 2600+ random values
from a RNG.
I think you're unaware that I can predictably generate a sequence of
identical values when the distribution is free and your specification is
satisfied by selecting with a distribution that prefers just one
indicatory value for a choice of word to the exclusion of all others.
Yeah, I don't really know what any of that means.  But it sounds like
your 3rd attempt to sidestep the generation and use of 2600+ random
values.
I think you could show some interesting techniques, but you have to
adhere to the requirements of the challenge.
It's because of my deeper understanding of the meaning (or barely
meaningfulness) of the word "random" and my awareness of how critical it
is to many applications of randomness.
That is:
 - If it's a game I could go ahead with a PRNG and satisfy you easily -
but it's not interesting to me these days, at this point I think it's a
game,
A game... now you're onto me.
 - If it's a secure application of choice that leaks /no/ information
about the input list beyond the fact of the achievement of the
lower-bound, respectively, on the number of words having each initial, I
can tighten your specification in some of the ways I've queried,
I sense your "tightening" will result in an unreadable spec, but it
would be fun to try.  So let's have it.
I don't think it would be unreadable. It might require some thought to
synthesise a program that satisfies it.
But you've told me it's just a game (I suppose "toy", rather than
gambling game). So the interesting bit could be just like:
"produce the output so that, within each of the groupings based on the
initial letter, the words are superficially shuffled around even when
they're not shuffled around in the input."
So a sorted input list is loaded, then the groupings by letter are
shuffled "superficially".
What constitutes a superficial shuffle?
I don't know, it's you that said the random distribution wasn't important.
On 24/03/2026 17:06, DFS wrote:
On 3/23/2026 6:26 PM, Bart wrote:
It looks very clunky but seems to do the job, and not too slowly
either (see below).
Years ago I was shocked how fast C chewed thru text data (and it's
even faster dealing with numbers).
Actually, I'm still shocked.  I wrote an anagram program in C that
used prime factors to do searches, and it found 5 anagrams from a list
of 370K words in 0.0055s (5.5/1000ths of a second).
You're attributing too much to C. Or maybe comparing it too much to
Python which is very slow.
There are other factors: hardware today is incredibly fast (like 4 orders of magnitude or more faster than the 8-bit machines I started off on).
And a lot of it is due to the optimising compilers now available.
My own systems language is also quite low-level. And can be just as fast
if someone were to write an optimising compiler for it too!
(As it is, it's not far off. Its self-hosted compiler can build over 20
new generations of itself per second, on a machine slower than yours.)
And it would be even faster with the use of a hash table.  Incredible.
And that's on my low-end AMD Ryzen 5600G (16GB DDR4-3200 RAM)
If that's low-end, what would be high-end? I mean in desktop computer
terms not some supercomputer.
No extra copy of the list is necessary to find duplicates (but for one-pass efficiency, sorting the list is required).
Look at the first letter of each duplicate.
"congratulations on the wherewithal youngun"
cotwy
Sort the file and the dupes are already sorted.  That was intentional.
If that explanation lets you drop some lines, good.
I'm now down to 150 sloc for the C version, and 125 sloc for the M version.
My method of finding the 26 sets was to:
1) count words by letter as the file is read in
lettercnt[wordsin[i][0]-'a']++;
(I saw something similar in your scripting, but couldn't spot it in
your .c)
It's probably this line:
˙˙˙˙˙˙˙˙˙ ++nbig[(unsigned char)buffer[0]]
The cast is because 'char' is signed and could be negative.
Note that my arrays can have arbitrary lower bounds (this is a rare
feature among HLLs), and here start from 'a'.
2) sort the data just read in
qsort(wordsin, wordcnt, sizeof(char*), comparechar);
3) using the lettercnt[] array from step 1, determine the start-end
positions of each set of words beginning with a..z
Letter   Start     End
a            0   20484
b        20485   34475
c        34476   60069
d        60070   75050
e        75051   86572
f        86573   95977
g        95978  104985
h       104986  116490
i       116491  127653
j       127654  129796
k       129797  132749
l       132750  140949
m       140950  157658
n       157659  166088
o       166089  175859
p       175860  205604
q       205605  207078
r       207079  221162
s       221163  253678
t       253679  269769
u       269770  287936
v       287937  292502
w       292503  297884
x       297885  298330
y       298331  299249
z       299250  300397
4) generate 100+ random numbers between start and end of each letter
int r = (rand() % (end - start + 1)) + start;
This 'calculation of start and end' for each letter is what I thought
to be a novel approach.
I'm curious how others will approach it (if anyone else tries).
I don't understand what's going on there.
If there are N words in total that start with 'c', say, then I just
generate a random number from 0 to N-1 (C), or 1 to N (M).
Altogether my program makes:
3 passes thru the 300398 words in:
  * 1 to count total words and words by letter
  * 1 to load the words into an array
  * 1 to find duplicates
2 passes thru the 2600 words out:
  * 1 to verify the 100 words per letter
  * 1 to print all 2600 words
5 total passes?  Not sure that's Ivy League.  But everything runs in
1/10th of a second so I can't complain.
In 0.25 seconds on my machine! This is why it can be better to not use the fastest machine around: then you can spot inefficiencies more easily.
That was Windows; on WSL it was a little slower: 0.4 seconds 'real' time.
If you have a short challenge of medium difficulty, post it so we can
learn and improve skillz.
I tried the same program on an unsorted list 10 times the size. That is, just duplicating everything to get a 3,003,980-line file.
Generally programs still worked, but took longer, and the list of
duplicates was a bit bigger!
The Python version took 4.2s or 5s on PyPy. My Q version got much slower
at 14s (maybe the interpreted sort is the reason).
Your C version was 4.5s. Mine are 3.x but they cap the duplicates at
100 so they can't be compared.
On 3/25/2026 8:32 AM, Richard Harnden wrote:
In shell ...
----
#!/bin/ksh
sort words_unsorted.txt >words.srt
uniq words.srt >words.unq
cat <<-X
˙˙˙ There are $(wc -l words.srt |\
˙˙˙˙˙˙˙ cut -d\˙ -f1) words in words_unsorted.txt
˙˙˙ and $(wc -l words.unq | cut -d\˙ -f1) are unique.
˙˙˙ Duplicates are: $(diff words.srt words.unq |\
˙˙˙˙˙˙˙ grep "<" | cut -d\˙ -f2 | sed "s/^\(.\)/\u\1/g" |\
˙˙˙˙˙˙˙ sort | tr '\n' ' ')
˙˙˙ Counts:
˙˙˙ $(
˙˙˙˙˙˙˙ for X in {a..z}
˙˙˙˙˙˙˙ do
˙˙˙˙˙˙˙˙˙˙˙ echo -n "$X " ; grep -c ^$X words.unq
˙˙˙˙˙˙˙ done
˙˙˙ )
˙˙˙ Samples ...
X
for X in {a..z}
do
˙˙˙ grep ^$X words.unq | shuf -n 100 | sed "s/^\(.\)/\u\1/g"
done >words.tmp
head -1000 words.tmp >words.0
head -2000 words.tmp | tail -1000 >words.1
tail -600 words.tmp > words.2
paste words.0 words.1 words.2 | nl -w4 -s". "
rm words.srt words.unq words.0 words.1 words.2
return 0
29 lines. Speed is good. Very nice. Probably took you no more than an hour to write.
How do I run it? I tried this in Windows Subsystem for Linux:
$ sudo ksh harnden.sh
: not found2]:
uniq: words.srt: No such file or directory
: not found5]:
: not found6]:
wc: words.srt: No such file or directory
Unfortunately...
* the output doesn't meet the requirements: the words are unique and
sorted, but they're not each randomly chosen by a RNG().
it can be better not to use the fastest machine around; then you can spot inefficiencies more easily.
On 3/25/2026 7:54 AM, Bart wrote:
You should be set forever.
I haven't priced out a full system in a long time, and RAM prices have surged the last 6 months, but you can probably still get a smokin' fast tower computer with a low-end-but-plenty-fast-enough video card for
$2500 to $3000.
Research, order and build it yourself to save $500+.
No extra copy of the list is necessary to find duplicates (but for one-pass efficiency, sorting the list is required).
Look at the first letter of each duplicate.
"congratulations on the wherewithal youngun"
cotwy
Sort the file and the dupes are already sorted. That was intentional.
If that explanation lets you drop some lines, good.
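The first-letter trick above can be reproduced from the shell (a toy illustration of the idea, not anyone's submitted solution):

```shell
# Take the first letter of each word, in order; with a pre-sorted
# duplicates list, the letters come out already sorted too.
echo "congratulations on the wherewithal youngun" |
    tr ' ' '\n' | cut -c1 | tr -d '\n'
# prints: cotwy
```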
I'm now down to 150 sloc for the C version, and 125 sloc for the M
version.
I'm at 146 (but if the dupes were unsorted would need a few more)
Note that my arrays can have arbitrary lower bounds (this is a rare
feature among HLLS),
Sounds dangerous.
I don't understand what's going on there.
sorted array
--------------------------------------
             Position in Array
Letter  WordCnt    Start      End
--------------------------------------
a         20485        0    20484
b         13991    20485    34475
c         25594    34476    60069
d         14981    60070    75050
e         11522    75051    86572
f          9405    86573    95977
g          9008    95978   104985
h         11505   104986   116490
i         11163   116491   127653
j          2143   127654   129796
k          2953   129797   132749
l          8200   132750   140949
m         16709   140950   157658
n          8430   157659   166088
o          9771   166089   175859
p         29745   175860   205604
q          1474   205605   207078
r         14084   207079   221162
s         32516   221163   253678
t         16091   253679   269769
u         18167   269770   287936
v          4566   287937   292502
w          5382   292503   297884
x           446   297885   298330
y           919   298331   299249
z          1148   299250   300397
--------------------------------------
So the start-end values become the range of randoms generated for that letter.
int r = (rand() % (end - start + 1)) + start;
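The same per-letter draw can be mimicked from the shell: `shuf -i` (a GNU coreutils assumption) picks uniformly from an inclusive range, so the start/end pair from the table maps over directly. The values below are the 'c' row from the table; this is a sketch, not the thread's C program:

```shell
start=34476; end=60069            # the 'c' row from the table above
r=$(shuf -i "$start-$end" -n 1)   # uniform random index into the sorted array
echo "$r"
```

Unlike `rand() % n`, `shuf` also sidesteps modulo bias, though at these range sizes the bias is negligible anyway.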
If there are N words in total that start with 'c', say, then I just
generate a random number from 0 to N-1 (C), or 1 to N (M).
How do you address a word at position 99999 in the sorted list by using
0 or 1?
Altogether my program makes:
3 passes thru the 300398 words in:
  * 1 to count total words and words by letter
  * 1 to load the words into an array
  * 1 to find duplicates
2 passes thru the 2600 words out:
  * 1 to verify the 100 words per letter
  * 1 to print all 2600 words
5 total passes? Not sure that's Ivy League. But everything runs in
1/10th of a second so I can't complain.
In 0.25 seconds on my machine! This is why it can be better not to use
the fastest machine around; then you can spot inefficiencies more easily.
I just added internal timing code to the C program:
1) loaded 300398 words in                    0.028 seconds
2) created 26 sets of 100 unique words in    0.067 seconds
3) printed counts of words by letter in      0.000 seconds
4) identified and printed duplicate words in 0.003 seconds
5) printed 2600 words in                     0.002 seconds
6) total run time is                         0.101 seconds
Your C version was 4.5s. Mine are 3.x but they cap the duplicates at
100 so they can't be compared.
I did something wrong - the words output has 2 first letters. Can you
spot where I messed up?
for x in {a..z}
do
grep ^$x words.uniq | shuf -n 100 | sed "s/^\(.\)/\L&\1/g"
done > words.temp
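For the curious: in GNU sed, `&` re-inserts the entire match into the replacement, so `\L&\1` emits the matched first letter twice (`&` once, then the captured `\1` again). A minimal demonstration of the bug alongside the `\u\1` form used earlier in the thread (GNU sed assumed, since `\u`/`\L` are GNU extensions):

```shell
echo "apple" | sed "s/^\(.\)/\L&\1/"   # buggy: & + \1 doubles the letter -> "aapple"
echo "apple" | sed "s/^\(.\)/\u\1/"    # fixed: capitalizes -> "Apple"
```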
On Sun, 22 Mar 2026 23:21:43 +0000
Tristan Wibberley <tristan.wibberley+netnews2@alumni.manchester.ac.uk>
wrote:
On 22/03/2026 14:38, DFS wrote:
You must call a RNG 2600+ times to build the list
ie you can't use the
random ordering of the input file to your advantage).
The two are not the same, that is, the use of "ie" is wrong.
Which do you really require, or do you really require I satisfy the
conjunction of the two?
Do you try to hint that challenges with seemingly arbitrary rules and seemingly arbitrary purposes are not very worthy?
If yes, then you could as well say it directly.
Personally, I think the proposed challenge has some interesting
parts. Unfortunately, other parts are dumb or pointless or
needlessly tedious, which is disappointing.
On 22/03/2026 14:38, DFS wrote:
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large
list of unsorted words possibly containing duplicates - extracts 26
sets of 100 random and unique words that each begin with a letter of
the English alphabet.
On 25/03/2026 12:32, Richard Harnden wrote:
On 22/03/2026 14:38, DFS wrote:
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large
list of unsorted words possibly containing duplicates - extracts 26
sets of 100 random and unique words that each begin with a letter of
the English alphabet.
Here's my C attempt.
146 lines, but I like my vertical whitespace.
On 3/27/2026 1:24 PM, Richard Harnden wrote:
On 25/03/2026 12:32, Richard Harnden wrote:
On 22/03/2026 14:38, DFS wrote:
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large
list of unsorted words possibly containing duplicates - extracts 26
sets of 100 random and unique words that each begin with a letter of
the English alphabet.
Here's my C attempt.
146 lines, but I like my vertical whitespace.
Thanks for the submission.
It's 106 lines of code, so it's the shortest yet.
The only part you didn't get quite right was:
"print the 2600 words you identify in column x row order in a grid of
˙size (200rows x 13cols or 300x9 or 400x7 or 500x6 or 600x4 etc) "
On 2026-03-28 05:52, DFS wrote:
On 3/27/2026 1:24 PM, Richard Harnden wrote:
On 25/03/2026 12:32, Richard Harnden wrote:
On 22/03/2026 14:38, DFS wrote:
---------------------
Objective
---------------------
deliver a C (and optional 2nd language) program that - from a large
list of unsorted words possibly containing duplicates - extracts 26
sets of 100 random and unique words that each begin with a letter
of the English alphabet.
Here's my C attempt.
146 lines, but I like my vertical whitespace.
Thanks for the submission.
It's 106 lines of code, so it's the shortest yet.
The only part you didn't get quite right was:
"print the 2600 words you identify in column x row order in a grid of
˙˙size (200rows x 13cols or 300x9 or 400x7 or 500x6 or 600x4 etc) "
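One way to read "column x row order" is column-major: the first 200 words fill column 1, the next 200 fill column 2, and so on. A sketch of a 200-row x 13-column grid using `split`/`paste` in the same spirit as the ksh script (the `seq` line is a stand-in for the 2600 selected words; file names are illustrative):

```shell
seq 2600 > words.tmp            # stand-in for the 2600 selected words
split -l 200 words.tmp col.     # 13 chunks of 200 lines: col.aa .. col.am
paste col.a* | nl -w4 -s". " > grid.txt
head -1 grid.txt                # row 1 holds words 1, 201, 401, ..., 2401
```

Each chunk becomes one column, so reading down a column walks the original order, which is what "column x row" seems to ask for.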
Ruminations on the recent "C" challenges...
Some requirements appear to be quite arbitrary.
But okay. When
I read about the tasks to implement the first thought that came
up was to use an appropriate language or tool-set, one that fits
better for the task, tasks that at least I consider annoying to
implement them in "C" because that language doesn't support it
well, because of C's primitivity (its low-level'ness). But okay;
we're in a C-group and the residents need feeding. - Why is it
that I consider it annoying in "C"? - Because I'd have liked to
implement such tasks based on existing _building blocks_; like
associative arrays, sensible array data types, and what not.
Instead of constructing and building a car with tools like an
axe and a stone, wouldn't it be more sensible to create useful
tools in "C" to make such challenges concentrate more on the
actual problem than on how to reinvent the simplest tasks again
and again? - I'd certainly consider it worthwhile to challenge implementations of building blocks that alleviate C-programmers
from all the boring error-prone and low-level tasks that are
celebrated ad nauseam. - The question I'd ask myself if faced
with (arbitrary or useful) requirements would be what elementary
functions I'd need to construct the solution. Such identified
and isolated features, i.e. their implementation, would have a
persistent value for more than a single arbitrary "C" challenge.
Personally, I think the proposed challenge has some interesting
parts. Unfortunately, other parts are dumb or pointless or
needlessly tedious, which is disappointing.
On 3/28/2026 1:59 AM, Janis Papanagnou wrote:
[...]
I think one word would've sufficed where you used 235: python
[...]
On 2026-03-28 14:05, DFS wrote:
On 3/28/2026 1:59 AM, Janis Papanagnou wrote:
[...]
I think one word would've sufficed where you used 235: python
Sorry, I cannot associate that statement with anything I said. -
What is that "235: python" referring to? - Mind to elaborate?
On 2026-03-28 14:05, DFS wrote:...
I think one word would've sufficed where you used 235: python
Sorry, I cannot associate that statement with anything I said. -
What is that "235: python" referring to? - Mind to elaborate?
On 2026-03-30 05:26, Janis Papanagnou wrote:
On 2026-03-28 14:05, DFS wrote:...
I think one word would've sufficed where you used 235: python
Sorry, I cannot associate that statement with anything I said. -
What is that "235: python" referring to? - Mind to elaborate?
I cannot answer your question, but the way you worded it suggests to me
that you may have parsed his comment incorrectly.
It should be parsed as
"I think one word would've sufficed where you used 235. That word is
python."
On 30/03/2026 10:26, Janis Papanagnou wrote:
On 2026-03-28 14:05, DFS wrote:
On 3/28/2026 1:59 AM, Janis Papanagnou wrote:
[...]
I think one word would've sufficed where you used 235: python
Sorry, I cannot associate that statement with anything I said. -
What is that "235: python" referring to? - Mind to elaborate?
The 235 refers to the number of words in your paragraph (I haven't
checked).
On Mon, 30 Mar 2026 12:10:46 +0100
Bart <bc@freeuk.com> wrote:
On 30/03/2026 10:26, Janis Papanagnou wrote:
On 2026-03-28 14:05, DFS wrote:
On 3/28/2026 1:59 AM, Janis Papanagnou wrote:
[...]
I think one word would've sufficed where you used 235: python
Sorry, I cannot associate that statement with anything I said. -
What is that "235: python" referring to? - Mind to elaborate?
The 235 refers to the number of words in your paragraph (I haven't
checked).
I did. There are 223 words.
So now I have a more interesting question: how did DFS come to the number
235? If by eyesight, it's impressively precise. If by use of a word
count utility, it's too imprecise.
On 31/03/2026 10:11, Michael S wrote:
On Mon, 30 Mar 2026 12:10:46 +0100
Bart <bc@freeuk.com> wrote:
On 30/03/2026 10:26, Janis Papanagnou wrote:
On 2026-03-28 14:05, DFS wrote:
On 3/28/2026 1:59 AM, Janis Papanagnou wrote:
[...]
I think one word would've sufficed where you used 235: python
Sorry, I cannot associate that statement with anything I said. -
What is that "235: python" referring to? - Mind to elaborate?
The 235 refers to the number of words in your paragraph (I haven't
checked).
I did. There are 223 words.
So, now I have more interesting question - how did DFS come to
number 235? If by eye sight - it's impressively precise. If by use
of word count utility - it's too imprecise.
OK, now I have to count them! If I use 'wc' on the original paragraph
that JP wrote, which starts like this:
Some requirements appear to be quite arbitrary. But okay. ...
Then it says 230 words. But there was also another line before that paragraph which was this:
Ruminations on the recent "C" challenges...
If that is included, then 'wc' reports 236 words. (It's also possible
that DFS mistyped the value.)
Presumably your count starts from 'But okay;'; then I get 223 words
too.
On 3/31/2026 5:11 AM, Michael S wrote:
I did. There are 223 words.
So, now I have more interesting question - how did DFS come to
number 235? If by eye sight - it's impressively precise. If by use
of word count utility - it's too imprecise.
Starting with "But okay. When", I counted on my fingers while moving
my lips. Lost count several times before I dropped it into Notepad++
and did View | Summary.
On Tue, 31 Mar 2026 13:07:48 -0400
DFS <nospam@dfs.com> wrote:
On 3/31/2026 5:11 AM, Michael S wrote:
I did. There are 223 words.
So, now I have more interesting question - how did DFS come to
number 235? If by eye sight - it's impressively precise. If by use
of word count utility - it's too imprecise.
Starting with "But okay. When", I counted on my fingers while moving
my lips. Lost count several times before I dropped it into Notepad++
and did View | Summary.
Now I know that Notepad++ has View | Summary. Thank you.
On 3/31/2026 2:15 PM, Michael S wrote:
On Tue, 31 Mar 2026 13:07:48 -0400
DFS <nospam@dfs.com> wrote:
On 3/31/2026 5:11 AM, Michael S wrote:
I did. There are 223 words.
So, now I have more interesting question - how did DFS come to
number 235? If by eye sight - it's impressively precise. If by use
of word count utility - it's too imprecise.
Starting with "But okay. When", I counted on my fingers while
moving my lips. Lost count several times before I dropped it into
Notepad++ and did View | Summary.
Now I know that Notepad++ has View | Summary. Thank you.
But if I use Notepad++ and replace every space with a \n I get 223
words. Difference of 12. Strange.
Google AI Mode says:
"Notepad++'s View | Summary (or double-clicking the status bar) is
known to produce inaccurate word counts because it uses a simplified algorithm that often misinterprets punctuation, special characters,
and encodings as word boundaries. It is widely considered "totally
wrong" for precise work.
Recommended Workarounds
For an accurate word count, use these more reliable methods:
Regex Count (Most Accurate):
Press Ctrl + F and go to the Mark or Find tab.
In Find what, type: \w+ (this matches alphanumeric word characters).
Set the Search Mode to Regular expression.
Click Count (or Mark All). The accurate word count will appear in the
status bar of that window.
Counting Selected Text Only:
To count a specific section, highlight the text and follow the Regex
Count steps above, making sure to check the In selection box.
Plugins:
NppTextFX2: This updated plugin provides a dedicated "Word Count"
tool under TextFX > TextFX Tools.
PythonScript: Advanced users can use a script (like
StatusBarWordCount) to display a live, accurate count in the status
bar.
Why "Summary" is Inaccurate
Encoding Issues: It may miscount characters in specific encodings
like UCS-2.
Word Definition: Unlike a full word processor, the Summary feature's
basic definition of a "word" often fails to handle contractions (like "don't") or hyphenated words correctly.
Hidden Spaces: It sometimes overcounts by treating multiple spaces or
line returns as extra word breaks."
Overcounting by 12 from a 223-word paragraph is ridiculously wrong.
I'm surprised, since Notepad++ is otherwise a great editor.
Note: if I use the AI suggestion of "Regex Count", it also says 235
words.
223 it is.
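The 223 vs 235 gap is consistent with differing word definitions: `wc -w` splits on whitespace only, while a `\w+` regex breaks at every apostrophe and hyphen, so a contraction counts as two "words". A quick check (GNU grep assumed, since `\w` in an ERE is a GNU extension):

```shell
echo "don't stop" | wc -w                   # whitespace words: 2
echo "don't stop" | grep -oE "\w+" | wc -l  # \w+ tokens: 3 (don, t, stop)
```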
| Sysop: | Tetrazocine |
|---|---|
| Location: | Melbourne, VIC, Australia |
| Users: | 14 |
| Nodes: | 8 (0 / 8) |
| Uptime: | 93:57:36 |
| Calls: | 211 |
| Files: | 21,502 |
| Messages: | 82,381 |