NHacker Next
  • new
  • past
  • show
  • ask
  • show
  • jobs
  • submit
My first patch to the Linux kernel (pooladkhay.com)
monus 2 hours ago [-]
> You may wonder whether I tried asking an LLM for help or not. Well, I did. In fact it was very helpful in some tasks like summarizing kernel logs [^13] and extracting the gist of them. But when it came to debugging based on all the clues that were available, it concluded that my code didn't have any bugs, and that the CPU hardware was faulty.

This matches my experience whenever I do an unconventional or deep work like the article mentions. The engineers comfortable with this type of work will multiply their worth.

fonheponho 2 hours ago [-]
Everybody seems to be missing the forest for the trees on this.

There is absolutely no "sign extension" in the C standard (go ahead, search it). "Sign extension" is a feature of some assembly instructions on some architectures, but C has nothing to do with it.

Citing integer promotion from the standard is justified, but it's just one part (perhaps even the smaller part) of the picture. The crucial bit is not quoted in the article: the specification of "Bitwise shift operators". Namely

> The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. [...]

> The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1×2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1×2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

What happens here is that "base2" (of type uint8_t, which is "unsigned char" in this environment) gets promoted to "int", and then left-shifted by 24 bits. You get undefined behavior because, while "base2" (after promotion) has a signed type ("int") and nonnegative value, E1×2^E2 (i.e., base2 × 2^24) is NOT representable in the result type ("int").

What happens during the conversion to "uint64_t" afterwards is irrelevant; even the particulars of the sign bit of "int", and how you end up with a negative "int" from the shift, are irrelevant; you got your UB right inside the invalid left-shift. How said UB happens to materialize on this particular C implementation may perhaps be explained in terms of sign extension of the underlying ISA -- but do that separately; be absolutely clear about what is what.

The article fails to mention the root cause (violating the rules for the bitwise left-shift operator) and fails to name the key consequence (undefined behavior); instead, it leads with not-a-thing ("sign-extension bug in C"). I'm displeased.

BTW this bug (invalid left shift of a signed integer) is common, sadly.

Arch-TK 4 minutes ago [-]
[delayed]
adrian_b 54 minutes ago [-]
The root problem is actually that the C language allows implicit conversions from an unsigned type to a signed type and from a signed type to an unsigned type, and in certain contexts such implicit conversions are actually mandated by the standard, like in the buggy expression from the parent article.

It does not matter which is the relationship between the sizes of such types, there will always be values of the operand that cannot be represented in the result.

Saying that the behavior is sometimes undefined is not acceptable. Any implicit conversion of this kind must be an error. Whenever a conversion between signed and unsigned or unsigned and signed is desired, it must be explicit.

This may be the worst mistake that has ever been made in the design of the C language and it has not been corrected even after 50 years.

Making this an error would indeed produce a deluge of error messages in many carelessly written legacy programs, but the program conversion is trivial and it is extremely likely that many of these cases where the compilers do not signal errors can cause bugs in certain corner cases, like in the parent article.

uecker 45 minutes ago [-]
You could just use -Wsign-conversion.
adrian_b 38 minutes ago [-]
Obviously, that should always be used, like also the compiler options for checking integer overflow and accesses out-of-bounds.

However, this kind of implicit conversions must really be forbidden in the standard, because the correct program source is different from the one permitted by the standard.

When you activate most compiler options that detect undefined behaviors, the correct program source remains the same, even if the compiler now implements a better behavior for the translated program than the minimal behavior specified by the standard.

That happens because most undefined behaviors are detected at run time. On the other hand, incorrect implicit conversions are a property of the source code, which is always detected during compilation, so such programs must be rejected.

uecker 32 minutes ago [-]
The standard will not forbid anything that breaks billions of lines of code still be used and maintained.

But it is easy enough to use modern tooling and coding styles to deal with signed overflow. Nowadays, silent unsigned wrap around causing logic errors is the more vexing issue, which indicates the undefined behavior actually helps rather than hurts when used with good tooling.

adrian_b 19 minutes ago [-]
Silent unsigned wrap around is caused by another mistake of the C language (and of all later languages inspired by C), there is only a single unsigned type.

The hardware of modern CPUs actually implements 5 distinct data types that must be declared as "unsigned" in C: non-negative integers, integer residues a.k.a. modular integers, bit strings, binary polynomials and binary polynomial residues.

A modern programming language should better have these 5 distinct types, but it must have at least distinct types for non-negative integers and for integer residues. There are several programming languages that provide at least this distinction. The other data types would be more difficult to support in a high-level language, as they use certain machine instructions that compilers typically do not know how to use.

The change in the C standard that was made so that now "unsigned" means integer residue, has left the language without any means to specify a data type for non-negative integers, which is extremely wrong, because there are more programs that use "unsigned" for non-negative integers than programs that use "unsigned" for integer residues.

The hardware of most CPUs implements very well non-negative integers so non-negative integer overflow is easily detected, but the current standard makes impossible to use the hardware.

vbezhenar 5 seconds ago [-]
I don't understand you. There are unsigned char, unsigned short, unsigned int, unsigned long, unsigned long long. These are separate types. Five of them.
manwe150 2 hours ago [-]
It was implementation defined for shifting negative numbers, but now the standard specifies twos-complement for this and all related IB
uecker 28 minutes ago [-]
While standard requires twos-complement we did not make all shift cases defined so far.
ashwinnair99 6 hours ago [-]
The first one always takes way longer than the code itself deserves. Most of the work is figuring out the unwritten rules, not writing the patch.
fooker 5 hours ago [-]
This is a big problem in open source that seems taboo to discuss.

In my opinion, unwritten rules are for gatekeeping. And if a new person follows all the unwritten rules, magically there's no one willing to review.

I think this is how large BFDL-style open source projects slowly become less and less relevant over the next few decades.

cromka 5 hours ago [-]
Agreed. The level of aggressive gatekeepers is just crazy, take Linux ARM mailing list for example. I found the Central and Eastern Europeans particularly aggressive there and I'm saying this as on myself. They sure do like to feel special there, with very little soft skills.
tossandthrow 3 hours ago [-]
This will likely be alleviated when Ai first projects take over as important OSS projects.

Fir these projects everything "tribal" has to be explicitly codified.

On a more general note: this is likely going to have a rather big impact on software in general - the "engineer to company can not afford to loose" is likely loosing their moat entirely.

4 hours ago [-]
yu3zhou4 6 hours ago [-]
Can confirm that it also happens in other complex systems! Still a lot of good time and the novelty factor helps with pushing through
seb1204 5 hours ago [-]
Sand that after so many years these rules are still not written down.
ngburke 4 hours ago [-]
Sign extension bugs are the worst. Silent for ages then suddenly everything is on fire. Spent a lot of time in C doing low-level firmware work and ran into the same class of issue more than once. Nice writeup, congrats on the patch.
dingensundso 3 hours ago [-]
Nice blogpost. Was an really interesting read. Would be interesting to read about the experience of getting the patch accepted and merged.

One thing I noticed: The last footnote is missing.

foltik 5 hours ago [-]
Well done and great writeup! Any idea why the bug hadn’t shown up sooner, like when running self tests?
ozgrakkurt 2 hours ago [-]
Great blog post. Using _BitInt typedefs for integers is a good option for anyone starting a fresh c project. It has worked well for me so far. _BitInt integers don’t promote to signed automatically like regular integers in c
uecker 43 minutes ago [-]
That makes me worry that you code actually has more issues because with small _BitInt you would run into signed overflow more often.
NotCamelCase 3 hours ago [-]
Lovely article with a happy ending!

One thing that I am glad to have been taught early on in my career when it comes to debugging, especially anything involving HW, is to `make no assumptions'. Bugs can be anywhere and everywhere.

siddyboi 2 hours ago [-]
Huge congrats on tracking that down and getting your first Linux kernel patch merged!
knorker 3 hours ago [-]
Integer promotion rules in C are so deceptive.

I don't believe there's anybody who can reason about them at code skimming speeds. It's probably the best place to hide underhanded code.

yu3zhou4 6 hours ago [-]
Congrats and happy for you, you had a lot of fun and did something genuinely interesting
mbana 6 hours ago [-]
I love these kind of posts.
leontloveless 2 hours ago [-]
[dead]
algolint 4 hours ago [-]
[dead]
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact
Rendered at 14:06:25 GMT+0000 (Coordinated Universal Time) with Vercel.