Next.js App Router + React Server Components Demo

new
past
show
ask
show
jobs
submit

▲Glassworm Is Back: A New Wave of Invisible Unicode Attacks Hits Repositories (aikido.dev)

86 points by robinhouston 5 hours ago | 34 comments

btown 1 hours ago [-]

IMO while the bar is high to say "it's the responsibility of the repository operator itself to guard against a certain class of attack" - I think this qualifies. The same way GitHub provides Secret Scanning [0], it should alert upon spans of zero-width characters that are not used in a linguistically standard way (don't need an LLM for this, just n-tuples).

Sure, third-party services like the OP can provide bots that can scan. But if you create an ecosystem in which PRs can be submitted by threat actors, part of your commitment to the community should be to provide visibility into attacks that cannot be seen by the naked eye, and make that protection the norm rather than the exception.

[0] https://docs.github.com/en/get-started/learning-about-github...

andrewflnr 1 hours ago [-]

Regardless of the thorny question of whether it's Github's responsibility, it sure would be a good thing for them to do ASAP.

jacquesm 22 minutes ago [-]

It absolutely is. They are simply spreading malware. You can't claim to be a 'dumb pipe' when your whole reason for existence is to make something people deemed 'too complex' simple enough for others to use, then you have an immediate responsibility to not only reduce complexity but to also ensure safety. Dumbing stuff down comes with a duty of care.

WalterBright 34 minutes ago [-]

Unicode should be for visible characters. Invisible characters are an abomination. So are ways to hide text by using Unicode so-called "characters" to cause the cursor to go backwards.

Things that vanish on a printout should not be in Unicode.

Remove them from Unicode.

pvillano 11 minutes ago [-]

Unicode is "designed to support the use of text in all of the world's writing systems that can be digitized"

Unicode needs tab, space, form feed, and carriage return.

Unicode needs U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK to switch between left-to-right and right-to-left languages.

Unicode needs U+115F HANGUL CHOSEONG FILLER and U+1160 HANGUL JUNGSEONG FILLER to typeset Korean.

Unicode needs U+200C ZERO WIDTH NON-JOINER to encode that two characters should not be connected by a ligature.

Unicode needs U+200B ZERO WIDTH SPACE to indicate a word break opportunity without actually inserting a visible space.

Unicode needs MONGOLIAN FREE VARIATION SELECTORs to encode the traditional Mongolian alphabet.

uhoh-itsmaciek 3 minutes ago [-]

>Remove them from Unicode.

Do you honestly think this is a workable solution?

WalterBright 24 minutes ago [-]

Another dum dum Unicode idea is having multiple code points with identical glyphs.

Rule of thumb: two Unicode sequences that look identical when printed should consist of the same code points.

nswango 11 minutes ago [-]

So you think that the letters in the Greek and Cyrillic alphabets which are printed identically to the Latin A should not exist?

And, for example, Greek words containing this letter should be encoded with a mix of Latin and Greek characters?

jeltz 13 minutes ago [-]

I don't think that would help much. There are also characters which are similar but not the same and I don't think humans can spot the differences unless they are actively looking for them which most of the time people are not. If only one of two glyphs which are similar appear in the text nobody would likely notice, expectation bias will fuck you over.

moritzruth 32 minutes ago [-]

greatidea,whoneedsspacesanyway

WalterBright 30 minutes ago [-]

Spaces appear on a printout.

abujazar 32 minutes ago [-]

Invisible characters are there for visible characters to be printed correctly...

minus7 2 hours ago [-]

The `eval` alone should be enough of a red flag

jeltz 19 minutes ago [-]

Yeah, I would have loved to see an example where it was not obvious that there is an exploit. Where it would be possible for a reviewer to actually miss it.

kordlessagain 1 hours ago [-]

No it’s not.

jacquesm 21 minutes ago [-]

While there are valid use cases for eval they are so rare that it should be disabled and strongly discouraged as a pattern. Only in very rare cases is eval the right choice and even then it will be fraught with risk.

simonreiff 53 minutes ago [-]

OWASP disagrees: See https://cheatsheetseries.owasp.org/cheatsheets/Nodejs_Securi..., listing `eval()` first in its small list of examples of "JavaScript functions that are dangerous and should only be used where necessary or unavoidable". I'm unaware of any such uses, myself. I can't think of any scenario where I couldn't get what I wanted by using some combination of `vm`, the `Function` constructor, and a safe wrapper around `JSON.parse()` to do anything I might have considered doing unsafely with `eval()`. Yes, `eval()` is a blatant red flag and definitely should be avoided.

pavel_lishin 1 hours ago [-]

When is an eval not at least a security "code smell"?

SahAssar 1 hours ago [-]

It really is. There are very few proper use-cases for eval.

nswango 9 minutes ago [-]

For a long time the standard way of loading JSON was using eval.

gnabgib 2 hours ago [-]

Small discussion yesterday (9+9 points, 9+4 comments) https://news.ycombinator.com/item?id=47374479 https://news.ycombinator.com/item?id=47385244

tolciho 29 minutes ago [-]

Attacks employing invisible characters are not a new thing. Prior efforts here include terminal escape sequences, possibly hidden with CSS that if blindly copied and pasted would execute who knows what if the particular terminal allowed escape sequences to do too much (a common feature of featuritis) or the terminal had errors in its invisible character parsing code.

For data or code hiding the Acme::Bleach Perl module is an old example though by no means the oldest example of such. This is largely irrelevant given how relevant not learning from history is for most.

Invisible characters may also cause hard to debug issues, such as lpr(1) not working for a user, who turned out to have a control character hiding in their .cshrc. Such things as hex viewers and OCD levels of attention to detail are suggested.

vitus 45 minutes ago [-]

Looks like the repo owner force-pushed a bad commit to replace an existing one. But then, why not forge it to maintain the existing timestamp + author, e.g. via `git commit --amend -C df8c18`?

Innocuous PR (but do note the line about "pedronauck pushed a commit that referenced this pull request last week"): https://github.com/pedronauck/reworm/pull/28

Original commit: https://github.com/pedronauck/reworm/commit/df8c18

Amended commit: https://github.com/pedronauck/reworm/commit/d50cd8

Either way, pretty clear sign that the owner's creds (and possibly an entire machine) are compromised.

chrismorgan 16 minutes ago [-]

The value of the technique, I suppose, is that it hides a large payload a bit better. The part you can see stinks (a bunch of magic numbers and eval), but I suppose it’s still easier to overlook than a 9000-character line of hexadecimal (if still encoded or even decoded but still encrypted) or stuff mentioning Solana and Russian timezones (I just decoded and decrypted the payload out of curiosity).

But really, it still has to be injected after the fact. Even the most superficial code review should catch it.

ocornut 51 minutes ago [-]

It baffles me that any maintainer would merge code like the one highlighted in the issue, without knowing what it does. That’s regardless of being or not being able to see the “invisible” characters. There’s a transforming function here and an eval() call.

The mere fact that a software maintainer would merge code without knowing what it does says more about the terrible state of software.

DropDead 3 hours ago [-]

Why didn't some make av rule to find stuff like this, they are just plain text files

nine_k 2 hours ago [-]

The rule must be very simple: any occurrence of `eval()` should be a BIG RED FLAG. It should be handled like a live bomb, which it is.

Then, any appearance of unprintable characters should also be flagged. There are rather few legitimate uses of some zero-width characters, like ZWJ in emoji composition. Ideally all such characters should be inserted as \xNNNN escape sequences, and not literal characters.

Simple lint rules would suffice for that, with zero AI involvement.

hamburglar 15 minutes ago [-]

I think there’s debate (which I don’t want to participate in) over whether or not invisible characters have their uses in Unicode. But I hope we can all agree that invisible characters have no business in code, and banishing them is reasonable.

WalterBright 31 minutes ago [-]

> There are rather few legitimate uses of some zero-width characters, like ZWJ in emoji composition.

Emojis are another abomination that should be removed from Unicode. If you want pictures, use a gif.

trollbridge 1 hours ago [-]

In our repos, we have some basic stuff like ruff that runs, and that includes a hard error on any Unicode characters. We mostly did this after some un-fun times when byte order marks somehow ended up in a file and it made something fail.

I have considered allowing a short list that does not include emojis, joining characters, and so on - basically just currency symbols, accent marks, and everything else you'd find in CP-1521 but never got around to it.

abound 2 hours ago [-]

Yeah it would have been nice to end with "and here's a five-line shell script to check if your project is likely affected". But to their credit, they do have an open-source tool [1], I'm just not willing to install a big blob of JavaScript to look for vulns in my other big blobs of JavaScript

[1] https://github.com/AikidoSec/safe-chain

nine_k 1 hours ago [-]

Something like this should work, assuming your encoding is Unicode (normally UTF-8), which grep would interpret:

  grep -P '[\x{200B}\x{200C}\x{200D}\x{FEFF}]' code.ts

See https://stackoverflow.com/q/78129129/223424

faangguyindia 1 hours ago [-]

Back in time I was on hacking forums where lot of script kiddies used to make malicious code.

I am wondering how that they've LLM, are people using them for making new kind of malicious codes more sophisticated than before?

Yokohiii 49 minutes ago [-]

In this case LLMs were obviously used to dress the code up as more legitimate, adding more human or project relevant noise. It's social engineering, but you leave the tedious bits to an LLM. The sophisticated part is the obscurity in the whole process, not the code.

aplomb1026 14 minutes ago [-]

[dead]

aneyadeng 39 minutes ago [-]

[dead]

robutsume 2 hours ago [-]

[dead]

Rendered at 17:45:48 GMT+0000 (Coordinated Universal Time) with Vercel.