Well, I finally got bitten by Unicode.
Managed a work around, but I don't have enough experience
with Unicode to know just exactly what I'm doing...
#include <stdio.h>
#include <string.h>
static int utf8_width(const char *s) {
By width do you mean code point count?
This is easily confusable for "display width" which is a concept
of how many columns a Unicode string needs on a monospaced display
or printer.
Kazinator's TXR language:
(len "??????") 6
(display-width "??????") 12
(coded-length "??????") 18
[...]
In article <10f85f9$33pck$1@dont-email.me>,
Michael Sanders <porkchop@invalid.foo> wrote:
const char *s = "élan";
printf("string: %s\n", s);
printf("strlen: %d\n", strlen(s)); // 4
printf("utf8_width: %d\n", utf8_width(s)); //5
I think you have those numbers the wrong way round.
Well, I finally got bitten by Unicode.
Managed a work around, but I don't have enough experience
with Unicode to know just exactly what I'm doing...
#include <stdio.h>
#include <string.h>
static int utf8_width(const char *s) {
int w = 0;
const unsigned char *p = (const unsigned char *)s;
while (*p) {
if (*p < 0x80) { w++; p++; } // ASCII 1-byte
else if ((*p & 0xE0) == 0xC0) { w++; p += 2; } // 2-byte UTF-8
else if ((*p & 0xF0) == 0xE0) { w++; p += 3; } // 3-byte UTF-8
else if ((*p & 0xF8) == 0xF0) { w++; p += 4; } // 4-byte UTF-8
else { w++; p++; } // fallback
}
return w;
}
int main(void) {
const char *s = "élan";
printf("string: %s\n", s);
printf("strlen: %d\n", strlen(s)); // 4
printf("utf8_width: %d\n", utf8_width(s)); //5
return 0;
}
On Fri, 14 Nov 2025 21:20:43 -0000 (UTC), Kaz Kylheku wrote:
By width do you mean code point count?
This is easily confusable for "display width" which is a concept
of how many columns a Unicode string needs on a monospaced display
or printer.
Well maybe naively I mean the string's length per char...
Well maybe naively I mean the string's length per char...
Can you rephrase that? I can't figure out what "the string's length per char" means.
I haven't really looked at the algorithm, but strlen returns a result
of type size_t, so the correct format in the second printf call is
"%zu", not "%d".
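The two measures being confused here can be put side by side. This is a minimal sketch, assuming the input is well-formed UTF-8; the names utf8_codepoints and report are made up for illustration and are not from the original post. strlen counts bytes and returns size_t (hence "%zu"), while counting the bytes that are not continuation bytes gives the number of code points.

```c
#include <stdio.h>
#include <string.h>

/* Count code points by counting every byte that is NOT a UTF-8
   continuation byte (bit pattern 10xxxxxx).
   Assumption: s is well-formed UTF-8. */
size_t utf8_codepoints(const char *s) {
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}

/* Print both measures; %zu is the conversion that matches size_t. */
void report(const char *s) {
    printf("bytes (strlen): %zu\n", strlen(s));
    printf("code points:    %zu\n", utf8_codepoints(s));
}
```

Calling report("\xC3\xA9lan"), where the bytes C3 A9 are the UTF-8 encoding of the accented e, reports 5 bytes and 4 code points. Neither number is the display width that Kaz describes.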
On Fri, 14 Nov 2025 16:12:36 -0800, Keith Thompson wrote:
Well maybe naively I mean the string's length per char...
Can you rephrase that? I can't figure out what "the string's length per
char" means.
I just want the length of the string, where each character within that
string equals 1 & I want one way to get the length of any string.
Well, I finally got bitten by Unicode.
Managed a work around, but I don't have enough experience
with Unicode to know just exactly what I'm doing...
#include <stdio.h>
#include <string.h>
static int utf8_width(const char *s) {
int w = 0;
const unsigned char *p = (const unsigned char *)s;
while (*p) {
if (*p < 0x80) { w++; p++; } // ASCII 1-byte
else if ((*p & 0xE0) == 0xC0) { w++; p += 2; } // 2-byte UTF-8
else if ((*p & 0xF0) == 0xE0) { w++; p += 3; } // 3-byte UTF-8
else if ((*p & 0xF8) == 0xF0) { w++; p += 4; } // 4-byte UTF-8
else { w++; p++; } // fallback
}
return w;
}
Try this idea written in C++ in C:
size_t utf8Width( span<char>::iterator it )
{
    size_t w = 0;
    for( ; *it; ++w )
        if( int head = countl_zero( (unsigned char)~*it ); head <= 3 ) [[likely]]
            it += head + 1;
        else
            ++it;
    return w;
}
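For readers who want the leading-bits idea in plain C, here is a hedged sketch rather than a translation of the code above. It assumes the step is the number of leading one bits in the lead byte (zero meaning a one-byte ASCII character), which is how UTF-8 lead bytes encode the sequence length. The names leading_ones and utf8_width_lead are illustrative only, and the loop that counts the bits stands in for countl_zero.

```c
#include <stddef.h>

/* Number of leading 1 bits in a byte (0..8). */
static int leading_ones(unsigned char b) {
    int n = 0;
    while (b & 0x80) {
        n++;
        b = (unsigned char)(b << 1);
    }
    return n;
}

/* Count characters by stepping from lead byte to lead byte. */
size_t utf8_width_lead(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    size_t w = 0;
    while (*p) {
        int ones = leading_ones(*p);
        /* 0 ones: ASCII, 1 byte. 2..4 ones: lead byte of that many bytes.
           Anything else is not a valid lead byte, so step a single byte. */
        size_t step = (ones >= 2 && ones <= 4) ? (size_t)ones : 1;
        /* Never step over the terminator if the sequence is cut short. */
        for (size_t i = 1; i < step; i++) {
            if (p[i] == '\0') {
                step = i;
                break;
            }
        }
        p += step;
        w++;
    }
    return w;
}
```

A truncated sequence at the end of the string is counted as one character here. That is a design choice for the sketch, not something the thread settles.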
On 2025-11-14 21:03:38 +0000, Michael Sanders said:
Well, I finally got bitten by Unicode.
Managed a work around, but I don't have enough experience
with Unicode to know just exactly what I'm doing...
#include <stdio.h>
#include <string.h>
static int utf8_width(const char *s) {
int w = 0;
const unsigned char *p = (const unsigned char *)s;
while (*p) {
if (*p < 0x80) { w++; p++; } // ASCII 1-byte
else if ((*p & 0xE0) == 0xC0) { w++; p += 2; } // 2-byte UTF-8
else if ((*p & 0xF0) == 0xE0) { w++; p += 3; } // 3-byte UTF-8
else if ((*p & 0xF8) == 0xF0) { w++; p += 4; } // 4-byte UTF-8
else { w++; p++; } // fallback
}
return w;
}
The code above may cause problems if the argument string is not
well-formed UTF-8. For example, the zero terminator could be missed. Of
course an invalid string can be expected to cause problems anyway but
some errors are harder to debug than others.
Another way is
static int utf8_width(const char *s) {
int w = 0;
const unsigned char *p = (const unsigned char *)s;
while (*p) {
if ((*p & 0xC0) != 0x80) w++; // count the first bytes of each character
p++; // advance one byte at a time
}
return w;
}
One could also add a check that each character has the right number of
bytes of the right kind and if not regard that as the end of the string.
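The check suggested in the last sentence could look roughly like this. It is a sketch under stated assumptions: the name utf8_count_checked is hypothetical, only the byte structure is checked (overlong encodings and surrogate code points are not rejected), and the first malformed sequence is treated as the end of the string.

```c
#include <stddef.h>

/* Count characters, treating the first malformed sequence as the end
   of the string. Only the lead/continuation byte structure is checked. */
size_t utf8_count_checked(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    size_t w = 0;
    while (*p) {
        int n; /* expected length of this sequence in bytes */
        if (*p < 0x80)                n = 1;
        else if ((*p & 0xE0) == 0xC0) n = 2;
        else if ((*p & 0xF0) == 0xE0) n = 3;
        else if ((*p & 0xF8) == 0xF0) n = 4;
        else break; /* stray continuation byte or invalid lead byte */
        /* Each following byte must be 10xxxxxx. The terminator fails this
           test too, so the loop never reads past the end of the string. */
        for (int i = 1; i < n; i++)
            if ((p[i] & 0xC0) != 0x80)
                return w;
        p += n;
        w++;
    }
    return w;
}
```

With this version a string that ends in the middle of a multi-byte character simply yields the count of the complete characters before it.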
I just want the length of the string, where each character within that
string equals 1 & I want one way to get the length of any string.
That sounds exactly like strlen(), unless you mean something else by "character".
Well, I finally got bitten by Unicode.
[...]
A little bugfix and a perfect style:
#include <iostream>
#include <bit>
#include <span>
#include <optional>
using namespace std;
optional<size_t> utf8Width( u8string_view str )
{
    size_t w = 0;
    for( auto it = str.begin(); it != str.end(); ++w ) [[likely]]
        if( size_t head = countl_zero( (unsigned char)~*it ); head <= 4
            && (size_t)(str.end() - it) >= head + 1 ) [[likely]]
            it += head + 1;
        else
            return nullopt;
    return w;
}
int main()
{
    cout << *utf8Width( u8"Hello, ??!" ) << endl;
}
Thanks to everyone for their help.
Fixed the problem & now I surround matched query with <angle brackets>...
On Fri, 14 Nov 2025 18:47:51 -0800, Keith Thompson wrote:
I just want the length of the string, where each character within that
string equals 1 & I want one way to get the length of any string.
That sounds exactly like strlen(), unless you mean something else by
"character".
Thank you Keith, I've gotten it fixed.
Fixed how? I still don't know what you meant by "length of the string".
On Sat, 15 Nov 2025 12:47:03 +0200, Mikko wrote:
On 2025-11-14 21:03:38 +0000, Michael Sanders said:
Well, I finally got bitten by Unicode.
Managed a work around, but I don't have enough experience
with Unicode to know just exactly what I'm doing...
#include <stdio.h>
#include <string.h>
static int utf8_width(const char *s) {
int w = 0;
const unsigned char *p = (const unsigned char *)s;
while (*p) {
if (*p < 0x80) { w++; p++; } // ASCII 1-byte
else if ((*p & 0xE0) == 0xC0) { w++; p += 2; } // 2-byte UTF-8
else if ((*p & 0xF0) == 0xE0) { w++; p += 3; } // 3-byte UTF-8
else if ((*p & 0xF8) == 0xF0) { w++; p += 4; } // 4-byte UTF-8
else { w++; p++; } // fallback
}
return w;
}
The code above may cause problems if the argument string is not
well-formed UTF-8. For example, the zero terminator could be missed. Of
course an invalid string can be expected to cause problems anyway but
some errors are harder to debug than others.
Another way is
static int utf8_width(const char *s) {
int w = 0;
const unsigned char *p = (const unsigned char *)s;
while (*p) {
if ((*p & 0xC0) != 0x80) w++; // count the first bytes of each character
p++; // advance one byte at a time
}
return w;
}
One could also add a check that each character has the right number of
bytes of the right kind and if not regard that as the end of the string.
Excellent I've added your reply to my notes, thank you Mikko.
static int sort_order = 0; // 0 = sort ascending A-Z, 1 = sort descending Z-A[...]
Well, I finally got bitten by Unicode.
[...]
On Fri, 14 Nov 2025 21:03:38 -0000 (UTC), Michael Sanders wrote:
Well, I finally got bitten by Unicode.
[...]
Smallest Unicode test I can manage. Might prove handy in some contexts:
#include <locale.h>
#include <string.h>
#include <stdio.h>
int got_unicode(void){
char *l = setlocale(LC_CTYPE,"");
return (l && strstr(l,"UTF-8"));
}
#define U(uni, asc) (got_unicode() ? (uni) : (asc))
int main(void){
printf("%s\n", U("Unicode OK: ?", "No Unicode."));
return 0;
}
Could you identify which document guarantees that every Unicode locale
contains "UTF-8"? Do you know what the domain of applicability of that
document is? It apparently does not cover my Ubuntu Linux system. The
command "locale -a" provides a list of all supported locales. Here's
what it says:
[...]
If you manage an improvement, please do post it here in the group
so I can learn more too.
On Sat, 15 Nov 2025 06:24:39 +0100, Bonita Montero wrote:
A little bugfix and a perfect style:
[...]
Very nice!
On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:
Could you identify which document guarantees that every Unicode locale
contains "UTF-8"? Do you know what the domain of applicability of that
document is? It apparently does not cover my Ubuntu Linux system. The
command "locale -a" provides a list of all supported locales. Here's
what it says:
[...]
Hi James, umm 'guarantees'? No no... It does NOT verify:
- whether the environment actually supports UTF8 fully
- whether multibyte functions are enabled
- whether the terminal supports UTF8
- whether the C library supports UTF8 normalization
(combining characters, etc. but it seems to work well here)
To be sure: It's not a UTF-8 capability test. It's only a
locale-string check. So it likely misses many valid UTF8
locale variants...
Here I'm running any mixture of: Windows/BSD/Linux Mint LMDE.
The best I can tell you at this stage is that it works on my end,
not a very satisfying reply I'm sure you'd agree. But till I learn
more about the issue that's the best I can offer.
If you manage an improvement, please do post it here in the group
so I can learn more too.
On 2025-11-18 15:17, Michael Sanders wrote:
On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:
Could you identify which document guarantees that every Unicode locale
contains "UTF-8"? Do you know what the domain of applicability of that
document is? It apparently does not cover my Ubuntu Linux system. The
command "locale -a" provides a list of all supported locales. Here's
what it says:
[...]
Hi James, umm 'guarantees'? No no... It does NOT verify:
- whether the environment actually supports UTF8 fully
- whether multibyte functions are enabled
- whether the terminal supports UTF8
- whether the C library supports UTF8 normalization
(combining characters, etc. but it seems to work well here)
To be sure: It's not a UTF-8 capability test. It's only a
locale-string check. So it likely misses many valid UTF8
locale variants...
If intended for use by anyone other than yourself, you should document
its limitations in that regard, either with in-code comments or in user
documentation.
Here I'm running any mixture of: Windows/BSD/Linux Mint LMDE.
The best I can tell you at this stage is that it works on my end,
not a very satisfying reply I'm sure you'd agree. But till I learn
more about the issue that's the best I can offer.
If you manage an improvement, please do post it here in the group
so I can learn more too.
There might be documents specifying locale naming standards, but I'm not aware of any. [...]
If your targets include Linux Mint, there's a chance the locale names
might be similar to those on my Ubuntu Linux system - but I'm no expert
on the differences between Linux distributions. If so, you should make
the "UTF" search case-insensitive, and make the '-' optional, which
would add considerable complexity to what is currently a very simple
routine.
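The relaxed match described here need not be very large. The following is a sketch only: locale_name_is_utf8 is a made-up name, and it remains a check on the locale *name*, with all the limitations already listed in this thread. It accepts "UTF-8", "utf8", "Utf-8" and so on anywhere in the string.

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if the locale name contains "UTF-8" or "UTF8" in any letter
   case (for example "en_US.utf8" or "C.UTF-8"), otherwise 0. */
int locale_name_is_utf8(const char *name) {
    if (name == NULL)
        return 0;
    for (const char *p = name; *p; p++) {
        /* && short-circuits, so p[1] and p[2] are only read when the
           preceding characters matched and are therefore not the terminator. */
        if (tolower((unsigned char)p[0]) == 'u' &&
            tolower((unsigned char)p[1]) == 't' &&
            tolower((unsigned char)p[2]) == 'f') {
            const char *q = p + 3;
            if (*q == '-')
                q++; /* the hyphen is optional */
            if (*q == '8')
                return 1;
        }
    }
    return 0;
}
```

It would be called as locale_name_is_utf8(setlocale(LC_CTYPE, "")). On POSIX systems nl_langinfo(CODESET) from <langinfo.h> reports the codeset directly, but that function is not available on every platform mentioned in this thread.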
[...]
Even cooler. Now the code accepts usual string_views as well as u8string_views.
And if you supply a boolean template parameter that is true before the () parameters, the data is verified to be a valid UTF-8 string. If you supply false or omit the parameter, the string isn't validated.
Hi Bonita! These are nice c++/c examples you've provided.
Thanks for your input, I appreciate your code & remarks.
On 15/11/2025 05:24, Bonita Montero wrote:
A little bugfix and a perfect style:
#include <iostream>
#include <bit>
#include <span>
#include <optional>
using namespace std;
optional<size_t> utf8Width( u8string_view str )
{
    size_t w = 0;
    for( auto it = str.begin(); it != str.end(); ++w ) [[likely]]
        if( size_t head = countl_zero( (unsigned char)~*it ); head <= 4
            && (size_t)(str.end() - it) >= head + 1 ) [[likely]]
            it += head + 1;
        else
            return nullopt;
    return w;
}
int main()
{
    cout << *utf8Width( u8"Hello, ??!" ) << endl;
}
The trouble with this is that I haven't a clue how it works or what
those extras do, or how they impact on performance.
A version in C is given below. This is much more straightforward. It
doesn't verify anything, but then I don't know if yours does either.
As for performance: I duplicated that test string to form one 104
times as long, then called that function one million times. Here are
the timings:
  C   gcc-O2     1.06   seconds
  C   bcc        1.17   seconds
  C   tcc        2.81   seconds
  C++ g++-O2     4.6    seconds
  C++ g++-O0    19      seconds
--------------------------
size_t utf8width(char* s) {
    size_t length;
    int c, n;
    length=0;
    while (c=*s) {
        if ((c & 0x80) == 0) n = 1;
        else if ((c & 0xE0) == 0xC0) n = 2;
        else if ((c & 0xF0) == 0xE0) n = 3;
        else n = 4;
        s += n;
        ++length;
    }
    return length;
}
On 21.11.2025 at 18:03, bart wrote:
On 15/11/2025 05:24, Bonita Montero wrote:
A little bugfix and a perfect style:
[...]
Take a string of a number of UTF-8 characters with properly mixed
chunk lengths.
This code with AVX512BW and BMI1 is 13.5 times faster than yours on my Zen4-PC.
size_t utf8Width2( const char *s )
{
    __m512i const
        ZERO = _mm512_setzero_si512(),
        ONE_MASK = _mm512_set1_epi8( (char)0x80 ),
        ONE_HEAD = ZERO,
        TWO_MASK = _mm512_set1_epi8( (char)0xE0 ),
        TWO_HEAD = _mm512_set1_epi8( (char)0xC0 ),
        THREE_MASK = _mm512_set1_epi8( (char)0xF0 ),
        THREE_HEAD = _mm512_set1_epi8( (char)0xE0 ),
        FOUR_MASK = _mm512_set1_epi8( (char)0xF8 ),
        FOUR_HEAD = _mm512_set1_epi8( (char)0xF0 );
    uintptr_t
        begin = (uintptr_t)s,
        base = begin & -64;
    s = (char *)base;
    size_t count = 0;
    __m512i chunk;
    uint64_t nzMask;
    auto doChunk = [&]() L_FORCEINLINE
    {
        uint64_t
            one = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, ONE_MASK ), ONE_HEAD ) & nzMask,
            two = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, TWO_MASK ), TWO_HEAD ) & nzMask,
            three = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, THREE_MASK ), THREE_HEAD ) & nzMask,
            four = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, FOUR_MASK ), FOUR_HEAD ) & nzMask;
        count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two ) + _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
    };
    chunk = _mm512_loadu_si512( s );
    unsigned head = (unsigned)(begin - base);
    nzMask = ~_mm512_cmpeq_epi8_mask( chunk, ZERO ) >> head;
    unsigned ones = countr_one( nzMask );
    nzMask &= ones < 64 ? (1ull << ones) - 1 : -1;
    nzMask <<= head;
    doChunk();
    if( (int64_t)nzMask >= 0 )
        return count;
    for( ; ; )
    {
        s += 64;
        chunk = _mm512_loadu_si512( s );
        nzMask = ~_mm512_cmpeq_epi8_mask( chunk, ZERO );
        ones = countr_one( nzMask );
        nzMask = ones < 64 ? (1ull << ones) - 1 : -1;
        if( !nzMask )
            break;
        doChunk();
    }
    return count;
}
Doesn't compile, even after I add suitable *intrin headers.
I took out L_FORCEINLINE (not recognised); added std:: to countr_one,
but it still gave me errors like this:
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h: In
lambda function:
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1:
error: inlining failed in call to 'always_inline' 'long long int
_mm_popcnt_u64(long long unsigned int)': target specific option mismatch
   42 | _mm_popcnt_u64 (unsigned long long __X)
      | ^~~~~~~~~~~~~~
You have to give complete compilable code or have only simple
dependencies like stdio.h.
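GCC reports "target specific option mismatch" when an intrinsic is called in a translation unit that was compiled without the matching instruction set enabled. A small sketch for checking what the compiler actually enabled, assuming GCC or Clang (which predefine these macros when the corresponding -m options, or a -march that includes them, are in effect); the function name simd_macros_enabled and the bit values are made up for illustration:

```c
/* Report which instruction-set macros the compiler predefined for this
   translation unit, as a bit mask:
   1 = __POPCNT__, 2 = __AVX2__, 4 = __AVX512BW__. */
int simd_macros_enabled(void) {
    int flags = 0;
#ifdef __POPCNT__
    flags |= 1; /* _mm_popcnt_u64 and friends can be inlined */
#endif
#ifdef __AVX2__
    flags |= 2; /* 256-bit integer intrinsics can be inlined */
#endif
#ifdef __AVX512BW__
    flags |= 4; /* byte/word AVX-512 intrinsics such as
                   _mm512_cmpeq_epi8_mask can be inlined */
#endif
    return flags;
}
```

Printing the result once with and once without options such as -mavx512bw and -mpopcnt shows whether the flags reached the compiler. It says nothing about whether the CPU the program runs on supports those instructions.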
On 22.11.2025 at 14:38, bart wrote:
Doesn't compile, even after I add suitable *intrin headers.
I took out L_FORCEINLINE (not recognised); added std:: to countr_one,
but it still gave me errors like this:
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h: In
lambda function:
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1:
error: inlining failed in call to 'always_inline' 'long long int
_mm_popcnt_u64(long long unsigned int)': target specific option mismatch
   42 | _mm_popcnt_u64 (unsigned long long __X)
      | ^~~~~~~~~~~~~~
You have to give complete compilable code or have only simple
dependencies like stdio.h.
Try __attribute__((always_inline)) instead. The code requires AVX-512
compilation to be enabled in g++ and an AVX-512-capable CPU (Intel since
the Skylake-X Xeons, AMD since Zen4).
If you want to test on an older CPU you can stick with the code below,
which is AVX2.
Still doesn't work. I'm using g++ 14.1.0. It doesn't like 'countr_one'
with or without std::
-std=c++20
Would it hurt to post a complete, compilable program? Plus the
compiler invocation if it needs anything unusual.
I'm using Visual C++ or clang-cl (MSVC-compatible clang).
It only needs a minimal main() routine which I can tweak to my test
input. Unless all it needs to use it is a call to utf8Width("abc")
which returns a simple integer.
It works the same as your code, i.e. it takes a char-pointer.
But ATM my C version is still faster!
For sure not as fast as my AVX (seven times) / AVX-512 (13.5 times)
Take this and -mavx512bw and -std=c++23.
#include <iostream>
#include <string_view>
#include <bit>
#include <algorithm>
#include <random>
#include <array>
#include <span>
#include <chrono>
#if defined(_MSC_VER)
    #include <intrin.h>
#elif defined(__GNUC__) || defined(__clang__)
    #include <x86intrin.h>
#endif
#include "inline.h"
On 22/11/2025 15:05, Bonita Montero wrote:
Take this and -mavx512bw and -std=c++23.
#include <iostream>
#include <string_view>
#include <bit>
#include <algorithm>
#include <random>
#include <array>
#include <span>
#include <chrono>
#if defined(_MSC_VER)
    #include <intrin.h>
#elif defined(__GNUC__) || defined(__clang__)
    #include <x86intrin.h>
#endif
#include "inline.h"
I don't have 'inline.h'. If I comment that out, then I get the errors
below from 'g++ -std=c++23 prog.c', also with -Wno-inline.
Your code seems incredibly fragile.
c.cpp: In function 'size_t utf8Width512(const char*)':
c.cpp:72:37: warning: AVX512F vector return without AVX512F enabled changes the ABI [-Wpsabi]
   72 |         ZERO = _mm512_setzero_si512(),
      |                                     ^
c.cpp: In function 'size_t utf8Width256(const char*)':
c.cpp:123:37: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
  123 |         ZERO = _mm256_setzero_si256(),
      |                                     ^
In file included from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/x86gprintrin.h:73,
                 from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/x86intrin.h:27,
                 from c.cpp:13:
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h: In lambda function:
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1: error: inlining failed in call to 'always_inline' 'long long int _mm_popcnt_u64(long long unsigned int)': target specific option mismatch
   42 | _mm_popcnt_u64 (unsigned long long __X)
      | ^~~~~~~~~~~~~~
c.cpp:95:106: note: called from here
   95 |         count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two ) + _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
      |                                                                                            ~~~~~~~~~~~~~~^~~~~~~~
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1: error: inlining failed in call to 'always_inline' 'long long int _mm_popcnt_u64(long long unsigned int)': target specific option mismatch
   42 | _mm_popcnt_u64 (unsigned long long __X)
      | ^~~~~~~~~~~~~~
c.cpp:95:80: note: called from here
   95 |         count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two ) + _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
      |                                                                  ~~~~~~~~~~~~~~^~~~~~~~~
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1: error: inlining failed in call to 'always_inline' 'long long int _mm_popcnt_u64(long long unsigned int)': target specific option mismatch
   42 | _mm_popcnt_u64 (unsigned long long __X)
      | ^~~~~~~~~~~~~~
c.cpp:95:56: note: called from here
   95 |         count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two ) + _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
      |                                          ~~~~~~~~~~~~~~^~~~~~~
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1: error: inlining failed in call to 'always_inline' 'long long int _mm_popcnt_u64(long long unsigned int)': target specific option mismatch
   42 | _mm_popcnt_u64 (unsigned long long __X)
      | ^~~~~~~~~~~~~~
c.cpp:95:32: note: called from here
   95 |         count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two ) + _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
      |                  ~~~~~~~~~~~~~~^~~~~~~
In file included from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/immintrin.h:65,
                 from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/x86intrin.h:32:
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512bwintrin.h:1716:1: error: inlining failed in call to 'always_inline' '__mmask64 _mm512_cmpeq_epi8_mask(__m512i, __m512i)': target specific option mismatch
 1716 | _mm512_cmpeq_epi8_mask (__m512i __A, __m512i __B)
      | ^~~~~~~~~~~~~~~~~~~~~~
c.cpp:94:42: note: called from here
   94 |             four = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, FOUR_MASK ), FOUR_HEAD ) & nzMask;
      |                    ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/immintrin.h:55:
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512fintrin.h:10651:1: error: inlining failed in call to 'always_inline' '__m512i _mm512_and_si512(__m512i, __m512i)': target specific option mismatch
10651 | _mm512_and_si512 (__m512i __A, __m512i __B)
      | ^~~~~~~~~~~~~~~~
c.cpp:94:42: note: called from here
   94 |             four = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, FOUR_MASK ), FOUR_HEAD ) & nzMask;
      |                    ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512bwintrin.h:1716:1: error: inlining failed in call to 'always_inline' '__mmask64 _mm512_cmpeq_epi8_mask(__m512i, __m512i)': target specific option mismatch
 1716 | _mm512_cmpeq_epi8_mask (__m512i __A, __m512i __B)
      | ^~~~~~~~~~~~~~~~~~~~~~
c.cpp:93:43: note: called from here
   93 |             three = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, THREE_MASK ), THREE_HEAD ) & nzMask,
      |                     ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512fintrin.h:10651:1: error: inlining failed in call to 'always_inline' '__m512i _mm512_and_si512(__m512i, __m512i)': target specific option mismatch
10651 | _mm512_and_si512 (__m512i __A, __m512i __B)
      | ^~~~~~~~~~~~~~~~
c.cpp:93:43: note: called from here
   93 |             three = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, THREE_MASK ), THREE_HEAD ) & nzMask,
      |                     ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512bwintrin.h:1716:1: error: inlining failed in call to 'always_inline' '__mmask64 _mm512_cmpeq_epi8_mask(__m512i, __m512i)': target specific option mismatch
 1716 | _mm512_cmpeq_epi8_mask (__m512i __A, __m512i __B)
      | ^~~~~~~~~~~~~~~~~~~~~~
c.cpp:92:41: note: called from here
   92 |             two = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, TWO_MASK ), TWO_HEAD ) & nzMask,
      |                   ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512fintrin.h:10651:1: error: inlining failed in call to 'always_inline' '__m512i _mm512_and_si512(__m512i, __m512i)': target specific option mismatch
10651 | _mm512_and_si512 (__m512i __A, __m512i __B)
      | ^~~~~~~~~~~~~~~~
c.cpp:92:41: note: called from here
   92 |             two = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, TWO_MASK ), TWO_HEAD ) & nzMask,
      |                   ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512bwintrin.h:1716:1: error: inlining failed in call to 'always_inline' '__mmask64 _mm512_cmpeq_epi8_mask(__m512i, __m512i)': target specific option mismatch
 1716 | _mm512_cmpeq_epi8_mask (__m512i __A, __m512i __B)
      | ^~~~~~~~~~~~~~~~~~~~~~
c.cpp:91:41: note: called from here
   91 |             one = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, ONE_MASK ), ONE_HEAD ) & nzMask,
      |                   ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512fintrin.h:10651:1: error: inlining failed in call to 'always_inline' '__m512i _mm512_and_si512(__m512i, __m512i)': target specific option mismatch
10651 | _mm512_and_si512 (__m512i __A, __m512i __B)
      | ^~~~~~~~~~~~~~~~
c.cpp:91:41: note: called from here
   91 |             one = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, ONE_MASK ), ONE_HEAD ) & nzMask,
      |                   ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can compile the code with -mavx512bw.
This is "inline.h":
On 22/11/2025 17:13, Bonita Montero wrote:
You can compile the code with -mavx512bw.
This is "inline.h":
But I now get, from:
  g++ =std=c++23 -mavx512bw -O2 c.cpp
the errors shown below. I tried -fconcepts too.
So, what also do I need? (So far you're not selling C++ very well!)
---------------------------------
c.cpp:33:54: warning: use of C++23 'make_signed_t<size_t>' integer constant
   33 |             if( (*it & 0xC0) == 0x80 || width > min( 4Z, rem ) ) [[unlikely]]
      |                                                      ^~
c.cpp:24:5: error: 'requires' does not name a type
   24 |     requires std::same_as<View, string_view> || std::same_as<View, u8string_view>
      |     ^~~~~~~~
c.cpp:24:5: note: 'requires' only available with '-std=c++20' or '-fconcepts'
c.cpp: In function 'size_t utf8widthC(const char*)':
c.cpp:52:10: error: 'char8_t' was not declared in this scope; did you mean 'wchar_t'?
   52 |     for( char8_t c; (c = *str); ++length )
      |          ^~~~~~~
      |          wchar_t
c.cpp:52:22: error: 'c' was not declared in this scope
   52 |     for( char8_t c; (c = *str); ++length )
      |                      ^
c.cpp: In function 'size_t utf8Width512(const char*)':
c.cpp:99:21: error: 'countr_one' was not declared in this scope
   99 |     unsigned ones = countr_one( nzMask );
      |                     ^~~~~~~~~~
c.cpp: In function 'size_t utf8Width256(const char*)':
c.cpp:150:21: error: 'countr_one' was not declared in this scope
  150 |     unsigned ones = countr_one( nzMask );
      |                     ^~~~~~~~~~
c.cpp: In function 'int main()':
c.cpp:192:5: error: 'span' was not declared in this scope
  192 |     span ranges( rawRanges );
      |     ^~~~
c.cpp:192:5: note: 'std::span' is only available from C++20 onwards
c.cpp:193:5: error: 'char8_t' was not declared in this scope; did you mean 'wchar_t'?
  193 |     char8_t rawTypeHeads[4] { 0, 0xC0, 0xE0, 0xF0 };
      |     ^~~~~~~
      |     wchar_t
c.cpp:194:9: error: expected ';' before 'typeHeads'
  194 |     span typeHeads( rawTypeHeads );
      |         ^~~~~~~~~~
      |         ;
c.cpp:196:5: error: 'u8string' was not declared in this scope
  196 |     u8string u8Str( BUF_MIN + 3, (char8_t)0 );
      |     ^~~~~~~~
c.cpp:196:5: note: 'std::u8string' is only available from C++20 onwards
c.cpp:197:20: error: 'u8string' does not name a type
  197 |     using u8s_it = u8string::iterator;
      |                    ^~~~~~~~
c.cpp:198:5: error: 'u8s_it' was not declared in this scope
  198 |     u8s_it
      |     ^~~~~~
c.cpp:201:30: error: 'itChar' was not declared in this scope
  201 |     for( size_t width, type; itChar < itCharEnd; itChar += width )
      |                              ^~~~~~
c.cpp:201:39: error: 'itCharEnd' was not declared in this scope
  201 |     for( size_t width, type; itChar < itCharEnd; itChar += width )
      |                                       ^~~~~~~~~
c.cpp:205:23: error: 'ranges' was not declared in this scope; did you mean 'rawRanges'?
  205 |         char32_t c = (ranges[type])( mt );
      |                       ^~~~~~
      |                       rawRanges
c.cpp:206:20: error: expected ';' before 'itTail'
  206 |         for( u8s_it itTail = itChar + width; --itTail > itChar; c >>= 6 )
      |                    ^~~~~~~
      |                    ;
c.cpp:206:48: error: 'itTail' was not declared in this scope
  206 |         for( u8s_it itTail = itChar + width; --itTail > itChar; c >>= 6 )
      |                                                ^~~~~~
c.cpp:208:19: error: 'typeHeads' was not declared in this scope
  208 |         *itChar = typeHeads[type] | (char8_t)c;
      |                   ^~~~~~~~~
c.cpp:210:5: error: 'u8Str' was not declared in this scope
  210 |     u8Str.resize( itChar - u8Str.begin() );
      |     ^~~~~
c.cpp:210:19: error: 'itChar' was not declared in this scope
  210 |     u8Str.resize( itChar - u8Str.begin() );
      |                   ^~~~~~
c.cpp:228:25: error: 'u8string' is not a type
  228 |     bench( "my: ", [&]( u8string const &str ) { total += utf8Width256( (char *)str.c_str() ); } );
      |                         ^~~~~~~~
c.cpp: In lambda function:
c.cpp:228:84: error: request for member 'c_str' in 'str', which is of non-class type 'const int'
  228 |     bench( "my: ", [&]( u8string const &str ) { total += utf8Width256( (char *)str.c_str() ); } );
      |                                                                                    ^~~~~
c.cpp: In function 'int main()':
c.cpp:229:27: error: 'u8string' is not a type
  229 |     bench( "nerd: ", [&]( u8string const &str ) { total += utf8widthC( (char *)str.c_str() ); } );
      |                           ^~~~~~~~
c.cpp: In lambda function:
c.cpp:229:84: error: request for member 'c_str' in 'str', which is of non-class type 'const int'
  229 |     bench( "nerd: ", [&]( u8string const &str ) { total += utf8widthC( (char *)str.c_str() ); } );
      |                                                                                    ^~~~~
A lot of the errors look as if you haven't enabled C++23 properly.
Can you install a current g++? Maybe the newest one from the repository
is sufficient.
On 22/11/2025 17:44, Bonita Montero wrote:
A lot of the errors look as if you haven't enabled C++23 properly.
Can you install a current g++? Maybe the newest one from the repository
is sufficient.
I said in a followup that I'd typed =std instead of -std, which didn't generate any error from the compiler.
But I managed to compile it. However the long program with a
complicated main() just crashed trying to run it, sometime before it
got to the actual UTF8 bit.
So I applied those headers and options to the first mm512
single-function version you posted. There I only had to add std:: to
those countr_one calls.
I used this test driver:
  int main() {
      size_t n = 0;
      n = utf8Width("Hello, ??!" );
      printf("%zu\n", n);
  }
And it crashes inside that function.
It's all just too damn complicated, sorry. It might well be fast, but
that's no good if it is troublesome to build and run for anyone else.
Another factor is this: each build, even at -O0, takes 3 whole seconds
on my machine. That must be a huge pile of junk it is including.
Building my C version takes some 1/20th of a second (even gcc takes
only 0.3 seconds).
On 22/11/2025 17:35, bart wrote: [...]
On 22/11/2025 17:13, Bonita Montero wrote:
You can compile the code with -mavx512bw.
This is "inline.h":
But I now get, from:
  g++ =std=c++23 -mavx512bw -O2 c.cpp
the errors shown below. I tried -fconcepts too.
So, what also do I need? (So far you're not selling C++ very well!)
Wait, there's a "=std" in that command line instead of
"-std". Apparently it is not an error (?).
bart <bc@freeuk.com> writes:
On 22/11/2025 17:35, bart wrote: [...]
On 22/11/2025 17:13, Bonita Montero wrote:
You can compile the code with -mavx512bw.
This is "inline.h":
But I now get, from:
  g++ =std=c++23 -mavx512bw -O2 c.cpp
the errors shown below. I tried -fconcepts too.
So, what also do I need? (So far you're not selling C++ very well!)
Wait, there's a "=std" in that command line instead of
"-std". Apparently it is not an error (?).
It seems that gcc and g++ interpret any unrecognized command line
argument as the name of a "linker input file".
BTW, comp.lang.c++ is down the hall, just past the water cooler.
static int utf8_width(const char *s) {
    int w = 0;
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        if (*p < 0x80) { w++; p++; }                    // ASCII 1-byte
        else if ((*p & 0xE0) == 0xC0) { w++; p += 2; }  // 2-byte UTF-8
        else if ((*p & 0xF0) == 0xE0) { w++; p += 3; }  // 3-byte UTF-8
        else if ((*p & 0xF8) == 0xF0) { w++; p += 4; }  // 4-byte UTF-8
        else { w++; p++; }                              // fallback
    }
    return w;
}
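For comparison, here is a minimal sketch of a common alternative to the function quoted above: count every byte that is not a UTF-8 continuation byte (10xxxxxx). The name utf8_codepoints is hypothetical and does not come from the thread. Because it advances one byte at a time, it cannot step past the terminating NUL when the string ends in a truncated sequence, which the p += 2/3/4 steps above can do. It counts code points only; it neither validates the input nor computes display width.

```c
#include <stddef.h>

/* Hypothetical helper, not from the thread: counts UTF-8 code points
   by counting every byte that is NOT a continuation byte (10xxxxxx).
   Assumes s is NUL-terminated. No validation is performed. */
static size_t utf8_codepoints(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    for (; *p; p++) {
        if ((*p & 0xC0) != 0x80)   /* lead byte or ASCII byte */
            n++;
    }
    return n;
}
```

With the UTF-8 encoding of "élan" (bytes C3 A9 6C 61 6E), strlen reports 5 and this function reports 4.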
On 22/11/2025 23:24, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 22/11/2025 17:35, bart wrote: [...]
On 22/11/2025 17:13, Bonita Montero wrote:
You can compile the code with -mavx512bw.
This is "inline.h":
But I now get, from:
  g++ =std=c++23 -mavx512bw -O2 c.cpp
the errors shown below. I tried -fconcepts too.
So, what also do I need? (So far you're not selling C++ very well!)
Wait, there's a "=std" in that command line instead of
"-std". Apparently it is not an error (?).
It seems that gcc and g++ interpret any unrecognized command line
argument as the name of a "linker input file".
It looks like it compiles any source code first, so won't get around to reporting an error if that compilation fails.
BTW, comp.lang.c++ is down the hall, just past the water cooler.
This was supposed to be about comparing a C approach to C++. Except there
were problems in getting the 'fast' C++ code to compile and then to run.
I think I'll stick with the simple C version which can also be trivially ported to any language as there are no heavy dependencies.
Do you need this to work under non-UTF-8 locales? If you only need that length when the locale is UTF-8, why not just use mblen from stdlib.h?