Four Column ASCII (2017) (garbagecollected.org)
HocusLocus 4 hours ago [-]
I have lived my whole professional life with this being 'beyond obvious'... It's hard to imagine a generation where it's not. But then again, I did work with EBCDIC for a while, and we were reading and translating ASCII log tapes (ITT/Alcatel 1210 switch, phone calls, memory dumps).

I once got drunk with my elderly Unix supernerd friend and he was talking about TTYs, and how his passwords contained embedded ^S and ^Q characters; he traced the login process to learn they were just stalling the tty, not actually used to construct the hash. No one else at the bar got the drift. He patched his system to use 'raw' instead of 'cooked' mode for login passwords. He also used backspaces ^? ^H as part of his passwords. He was a real security tiger. I miss him.

dcminter 5 hours ago [-]
It doesn't seem to have been mentioned in the comments so far, but as a floppy-disk era developer I remember my mind was blown by the discovery that DEL was all-bits-set because this allowed a character on paper tape and punched card to be deleted by punching any un-punched holes!
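A minimal sketch of that mechanism: punching a hole sets a bit, and punching all the remaining holes is a bitwise OR with all-ones, so any 7-bit character collapses to DEL.

    # Rubbing out a character = punching the remaining holes,
    # i.e. OR-ing with 0b1111111 (DEL, 0x7F):
    DEL = 0x7F
    assert all(c | DEL == DEL for c in range(128))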
axblount 2 hours ago [-]
Bit-level skeuomorphism! And since NUL is zero, does that mean the program ends wherever you stop punching? I've never used punch cards so I don't know how things were organized.
fix4fun 12 hours ago [-]
For me it was interesting that all digits in ASCII start with 0x3, e.g. 0x30 - '0', 0x31 - '1', ..., 0x39 - '9'. I thought it was accidental, but it was actually intentional. This made it possible to build simple counting/accounting machines with minimal circuit logic using BCD (Binary Coded Decimal). That was a wow moment for me ;)
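A minimal Python sketch of that property: the low nibble of an ASCII digit is its BCD value, so conversion either way is a single mask or OR.

    # '0'..'9' are 0x30..0x39, so the low nibble is the digit itself.
    for ch in "0123456789":
        assert ord(ch) & 0x0F == int(ch)       # ASCII -> BCD: mask off 0x30
        assert int(ch) | 0x30 == ord(ch)       # BCD -> ASCII: OR with 0x30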
satiated_grue 3 hours ago [-]
ASCII was started in 1960. A terminal then would have been a mostly-mechanical teletype (keyboard and printer, possibly with paper tape reader/punch), without much by way of "circuit logic". Think of it more as: a bit caused a physical shift of a linkage, so that the same remaining bits would strike the upper or lower part of a hammer, or a separate set of hammers.

Look at the Teletype ASR-33, introduced in 1963.

kibwen 4 hours ago [-]
I still wonder if it wouldn't have been better to let each digit be represented by its exact value, and then use the high end of the scale rather than the low end for the control characters. I suppose by 1970 they were already dealing with the legacy of backwards-compatibility, and people were already accustomed to 0x0 meaning something akin to null?
mmilunic 4 hours ago [-]
Either way you would still need some check to ensure your digits are digits and not some other type of character. Having zeroed-out memory read as a bunch of NUL characters rather than as “00000000” would probably be useful, since “000000” is sometimes legitimate user input.
gpvos 3 hours ago [-]
NUL was often sent as padding to slow (printing) terminals. Although that was just before my time.
zahlman 11 hours ago [-]
And this is exactly why I find the usual 16x8 at least as insightful as this proposed 32x4 (well, 4x32, but that's just a rotation).
kazinator 12 hours ago [-]
This is by design, so that case conversion and folding is just a bit operation.

The idea that SOH/1 is "Ctrl-A" or ESC/27 is "Ctrl-[" is not part of ASCII; that idea comes from the way terminals provided access to the control characters: a Ctrl key that just masked out a few bits.
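A small sketch of that masking, assuming the classic keep-the-low-five-bits behavior:

    # Ctrl on a bit-paired terminal masked the character down to
    # its low five bits, landing in the control-code columns:
    ctrl = lambda c: chr(ord(c) & 0x1F)
    assert ctrl('A') == '\x01'   # SOH, i.e. Ctrl-A
    assert ctrl('[') == '\x1b'   # ESC, i.e. Ctrl-[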

muyuu 9 hours ago [-]
I guess it's an age thing, but I thought this was really basic CS knowledge. But I can see why this may be much less relevant nowadays.
Cthulhu_ 6 hours ago [-]
I've been in IT for decades but never knew that ctrl was (as easy as) masking some bits.
muyuu 6 hours ago [-]
You can go back maybe 2 decades without this being very relevant, but not 3, given the low-level scope that was expected in CS and EE back then.
aa-jv 6 hours ago [-]
Been an ASCII-naut since the '80s, so... it's always amusing to see people type 'man ascii' for the first time, gaze upon its beauty, and wonder at its relevance, even still today...
nine_k 11 hours ago [-]
Yes, the diagram just shows the ASCII table for the old teletype 6-bit code (and the 5-bit code before it), with the two most significant bits spread over 4 columns to show the extension that happened while going 5→6→7 bits. It makes obvious the very simple bit operations used on very limited hardware 70–100 years ago.

(I assume everybody knows that on mechanical typewriters and teletypes the "shift" key physically shifted the carriage position upwards, so that a different glyph would be printed when hit by a typebar.)

california-og 2 hours ago [-]
I made an interactive viewer some time ago (scroll down a bit):

https://blog.glyphdrawing.club/the-origins-of-del-0x7f-and-i...

It really helps understand the logic of ASCII.

jez 2 hours ago [-]
I have a command called `ascii-4col.txt` in my personal `bin/` folder that prints this out:

https://github.com/jez/bin/blob/master/ascii-4col.txt

It's neat because it's the only command I have that uses `tail` for the shebang line.

taejavu 12 hours ago [-]
For whatever reason, there are extraordinarily few references that I come back to over and over, across the years and decades. This is one of them.
taejavu 12 hours ago [-]
Tangentially related: there is much insight about Unix idioms to be gained from understanding the key layout of the terminal Bill Joy used to create vi:

https://news.ycombinator.com/item?id=21586980

aa-jv 6 hours ago [-]
Not 'man ascii'?
mbreese 5 hours ago [-]
I came across this a week ago when I was looking at some LLM-generated code for a ToUpper() function. At some point I “knew” this relationship, but I didn’t really “grok” it until I read a function that converted lowercase ASCII to uppercase by using a bitwise XOR with 0x20.

It makes sense, but it didn’t really hit me until recently. Now, I’m wondering what other hidden cleverness is there that used to be common knowledge, but is now lost in the abstractions.
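A sketch of that kind of function, assuming the range check discussed in the replies below:

    # 'a'..'z' differ from 'A'..'Z' only in bit 5 (0x20); once the
    # range check guarantees a lowercase letter, XOR clears it.
    def to_upper(b: int) -> int:
        if ord('a') <= b <= ord('z'):
            return b ^ 0x20
        return b

    assert chr(to_upper(ord('q'))) == 'Q'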

Findecanor 4 hours ago [-]
A similar bit-flipping trick was used to swap between the digits on the numeric row and the shifted symbols on the same keys. These bit-flips made it easier to construct the circuits for keyboards that output ASCII.

I believe the layout of the shifted symbols on the numeric row was based on an early IBM Selectric typewriter for the US market. Then IBM went and changed it, and the latter is the origin of the ANSI keyboard layout we have now.
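A sketch of that flip under the bit-paired pairing: the shifted symbol is the digit with bit 4 (0x10) toggled.

    # On bit-paired keyboards, Shift on the digit row toggled 0x10:
    assert ord('1') ^ 0x10 == ord('!')
    assert ord('8') ^ 0x10 == ord('(')   # hence parens on 8 and 9
    assert ord('9') ^ 0x10 == ord(')')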

auselen 4 hours ago [-]
xor should toggle?
munk-a 4 hours ago [-]
That's correct, XOR toggles; a toUpper would clear the bit with AND (it's a toLower that would set it with OR).
mbreese 2 hours ago [-]
I left out that on the line before there was a check to make sure the input byte was between ‘a’ and ‘z’. This ensures that if the char is already upper case, you don’t flip it back. And at that point, AND, XOR, or even a subtract 0x20 would work. For some reason the LLM thought the XOR was faster.

I honestly wouldn’t have thought anything of it if I hadn’t seen it written as `b ^ 0x20`.

pixelbeat__ 11 hours ago [-]
Some of this elegance is discussed from a programming point of view here:

https://www.pixelbeat.org/docs/utf8_programming.html

gpvos 3 hours ago [-]
Back in early times, I used to type ctrl-M in some situations because it could be easier to reach than the return key, depending on what I was typing.
seyz 6 hours ago [-]
This is why Ctrl+C is 0x03 and Ctrl+G is the bell. The columns aren't arbitrary. They're the control codes with bit 6 flipped. Once you see it, you can't unsee it. Best ASCII explainer I've read.
dveeden2 12 hours ago [-]
Also easy to see why Ctrl-D works for exiting sessions.
rbanffy 2 days ago [-]
This is also why the Teletype layout has parentheses on 8 and 9, unlike modern keyboards that have them on 9 and 0 (a layout popularised by the IBM Selectric). The original Apple IIs had this same layout, with a “bell” on top of the G.
spragl 10 hours ago [-]
Modern keyboards = some keyboards. In the Nordic countries, modern keyboards have parentheses on 8 and 9.
debugnik 7 hours ago [-]
According to the layouts on this site, there are more European layouts with parentheses on 8 and 9 than on 9 and 0. (I had to zoom out to see the right side of the comparisons.)

https://www.farah.cl/Keyboardery/A-Visual-Comparison-of-Diff...

Terretta 2 days ago [-]
What happened to this block and the keyboard key arrangement?

  ESC  [  {  11011
  FS   \  |  11100
  GS   ]  }  11101
Also curious why the brace keys open and close, but the single and double curly quotes don't open and close; they're stacked. Seems nuts every time I type Option-{ and Option-Shift-{ …
kazinator 12 hours ago [-]
You're no longer talking about ASCII. ASCII has only a double quote, apostrophe (which doubles as a single quote) and backtick/backquote.

Note on your Mac that the Option-{ and Option-}, with and without Shift, produce quotes which are all distinct from the characters produced by your '/" key! They are Unicode characters not in ASCII.

In the ASCII standard (1977 version here: https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub1-2-197...) the example table shows a glyph for the double quote which is vertical: it is neither an opening nor closing quote.

The apostrophe is shown as a closing quote, slanting to the right; approximately a mirror image of the backtick. So it looks as though those two are intended to form an opening and closing pair. Except, in many terminal fonts, the apostrophe is just a vertical tick, like half of a double quote.

The ' being vertical helps programming language '...' literals not look weird.

jolmg 9 hours ago [-]
> What happened to this block and the keyboard key arrangement?

There's also these:

  | ASCII      | US keyboard |
  |------------+-------------|
  | 041/0x21 ! | 1 !         |
  | 042/0x22 " | 2 @         |
  | 043/0x23 # | 3 #         |
  | 044/0x24 $ | 4 $         |
  | 045/0x25 % | 5 %         |
  |            | 6 ^         |
  | 046/0x26 & | 7 &         |
dang 12 hours ago [-]
Related. Others?

Four Column ASCII (2017) - https://news.ycombinator.com/item?id=21073463 - Sept 2019 (40 comments)

Four Column ASCII - https://news.ycombinator.com/item?id=13539552 - Feb 2017 (68 comments)

joshcorbin 41 minutes ago [-]
Just wait until someone finally gets why CSI (aka the “other escape” from the 8-bit ANSI realm, now eternalized in the Unicode C1 block) is written ESC [ in 7-bit systems, such as the equally-now-eternal UTF-8 encoding.
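The arithmetic behind that: ECMA-48 aliases each C1 control 0x80–0x9F as ESC followed by that code minus 0x40.

    # CSI is C1 control 0x9B; its 7-bit alias is ESC plus (0x9B - 0x40):
    CSI, ESC = 0x9B, 0x1B
    assert CSI - 0x40 == ord('[')
    seven_bit_csi = bytes([ESC, CSI - 0x40])   # b'\x1b[', i.e. "ESC ["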
unnah 11 hours ago [-]
If Ctrl sets bit 6 to 0, and Shift sets bit 5 to 1, the logical extension is to use Ctrl and Shift together to set the top bits to 01. Surely there must be a system somewhere that maps Ctrl-Shift-A to !, Ctrl-Shift-B to " etc.
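A sketch of that hypothetical mapping (not, as far as I know, how any real terminal behaves):

    # Hypothetical: Ctrl clears the top two bits, Shift then sets 0x20,
    # mapping 'A' (0x41) to '!' (0x21), 'B' (0x42) to '"' (0x22), etc.
    ctrl_shift = lambda c: chr((ord(c) & 0x1F) | 0x20)
    assert ctrl_shift('A') == '!'
    assert ctrl_shift('B') == '"'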
maybewhenthesun 10 hours ago [-]
It's more that shift flips that bit. Also I'd call them bit 0 and 1 and not 5 and 6 as 'normally' you count bits from the right (least significant to most significant). But there are lots of differences for 'normal' of course ('middle endian' :-P )
Leszek 11 hours ago [-]
I guess in this system, you'd also type lowercase letters by holding shift?
ezekiel68 6 hours ago [-]
I love this stuff. It's the kind of lore that keeps getting forgotten and re-discovered by swathes of curious computer scientists over the years. It's so easy to assume many of the old artifacts (such as the ASCII table) had no rhyme or reason to them.
mac3n 4 hours ago [-]
Credit to William Crosby, "Note on an ASCII-Octal Code Table", CACM 8.10, Oct 1965:

https://dl.acm.org/doi/epdf/10.1145/365628.365652

It also defined a 6-bit ASCII subset.

mac3n 3 hours ago [-]
Anyone remember 005 ENQ (also called WRU, “who are you”) and its effect on a teletype?
renox 11 hours ago [-]
I still find it weird that they didn't put A, B, ... just after the digits; that would make binary-to-hexadecimal conversion more efficient.
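The inefficiency in question: because 'A' (0x41) doesn't directly follow '9' (0x39), turning a 4-bit value into a hex digit needs a branch or a lookup table.

    # The gap between '9' and 'A' forces a branch (or a table):
    def hex_digit(n: int) -> str:
        return chr(n + 0x30) if n < 10 else chr(n - 10 + 0x41)

    assert "".join(hex_digit(n) for n in range(16)) == "0123456789ABCDEF"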
iguessthislldo 10 hours ago [-]
Going off the timelines on Wikipedia, the first version of ASCII was published (1963) before the 0-9,A-F hex notation became widely used (>=1966):

- https://en.wikipedia.org/wiki/ASCII#History

- https://en.wikipedia.org/wiki/Hexadecimal#Cultural_history

jolmg 10 hours ago [-]
The alphanumeric code points are well placed, hexadecimally speaking, though. I don't imagine that was just an accident. For example, they could've put '0' at 050/0x28, but they put it at 060/0x30. That suggests to me that they did take hexadecimal into consideration.
kubanczyk 9 hours ago [-]
It's a binary consideration rather than a hexadecimal one, if you think about it.

If you have to prominently represent 10 things in binary, it's neat to allocate a slot of size 16 and leave the remaining 6 entries unused. Which is to say, it's neat to proceed from all zeroes:

    x x x x 0 0 0 0
    x x x x 0 0 0 1
    x x x x 0 0 1 0
    ....
    x x x x 1 1 1 1
It's more of a cause for hexadecimal notation than an effect of it.
jolmg 10 hours ago [-]
Currently 'A' is 0x41 and 0101, 'a' is 0x61 and 0141, and '0' is 0x30 and 060. These are fairly simple to remember for converting between alphanumerics and their code points. That seems more advantageous, especially if you might reasonably be looking at punch cards.
tgv 9 hours ago [-]
[0-9A-Z] doesn't fit in 5 bits, which impedes shift/ctrl bits.
vanderZwan 10 hours ago [-]
I'm not sure if our convention for hexadecimal notation is old enough to have been a consideration.

EDIT: it would need to predate the 6-bit teletype codes that preceded ASCII.

kps 4 hours ago [-]
They put : ; immediately after the digits because they were considered the least used of the major punctuation, so that they could be replaced by ‘digits’ 10 and 11 where desired.

(I'm almost reluctant to spoil the fun for the kids these days, but https://en.wikipedia.org/wiki/%C2%A3sd )
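A quick check of that placement: with the 0x3_ "digit prefix", values 10 and 11 land exactly on ':' and ';'.

    # 'Digits' 10 and 11 under the 0x30 prefix:
    assert chr(0x30 + 10) == ':'
    assert chr(0x30 + 11) == ';'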

meken 5 hours ago [-]
Very cool.

Though the 01 column is a bit unsatisfying because it doesn’t seem to have any connection to its siblings.

y42 5 hours ago [-]
First I was like, "What, but why? You don't save any space, so what's this exercise about?" Then I read it again and it blew my mind. I thought I knew everything about ASCII. What a fool I am; Socrates was right. Always.
msarnoff 10 hours ago [-]
On early bit-paired keyboards with parallel 7-bit outputs, possibly going back to mechanical teletypes, I think holding Control literally tied the upper two bits to zero. (citation needed)

Also explains why there is no difference between Ctrl-x and Ctrl-Shift-x.
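A quick check of that last point: masking to five bits erases the case bit, so Shift can't matter.

    # Ctrl-x and Ctrl-Shift-x produce the same control code:
    assert ord('x') & 0x1F == ord('X') & 0x1F == 0x18   # CAN, i.e. Ctrl-X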

SUDEEPSD25 2 hours ago [-]
Love this!
timonoko 12 hours ago [-]
Where does this character set come from? It looks different in xterm.

    for x in range(0x0, 0x20): print(chr(x), end=" ")

                    

voxelghost 11 hours ago [-]
What are you trying to achieve? None of those characters are printable, and they're definitely not going to show up on the web.

    for x in range(0x0, 0x20): print(f'({x}|{chr(x)})', end=' ')
    (0|) (1|) (2|) (3|) (4|) (5|) (6|) (7|) (8) (9| ) (10|
    ) (11|
          ) (12|
    ) (14|) (15|) (16|) (17|) (18|) (19|) (20|) (21|) (22|) (23|) (24|) (25|)    (26|␦) (27|8|) (29|) (30|) (31|)
timonoko 11 hours ago [-]
Just asking why they have different icons in different environments. Maybe it's UTF-8 vs. ISO-8859?
rbanffy 10 hours ago [-]
They shouldn't show as visual representations, but some "ASCII" charts show the IBM PC character set instead of the ASCII set. IIRC, up to 0xFF UTF-8 and 8859 are very close with the exceptions being the UTF-8 escapes for the longer characters.
timonoko 11 hours ago [-]
Opera AI solved the problem:

If you want to use the symbols for Mars and Venus, for example, they are not in range(0, 0x20). They are in the Miscellaneous Symbols block.

timonoko 9 hours ago [-]
OK, this set does not even show on Android, just some boxes. Very strange.
Aardwolf 6 hours ago [-]
IMHO, ASCII wasted over 20 of its precious 128 values on control characters nobody ever needs (except perhaps in the first few years of its lifetime), and could easily have had the degree symbol, pilcrow sign, paragraph symbol, forward tick, and other useful symbols instead :)
ogurechny 4 hours ago [-]
Smaller, 6-bit code pages existed before and after that. They did not even have space for upper and lower case letters, but they had control characters. Those codes directly moved the paper, switched to the next punch card, or cut the punched tape on the receiving end, so you would want them if you ever had to send more than a single line of text (or a block of data), which most users did.

The even smaller 5-bit Baudot code already had special characters to shift between two sets and discard the previous character. The Murray code, used for typewriter-based devices, introduced CR and LF, so they were quite frequently needed for way more than a few years.

gpvos 3 hours ago [-]
Maybe 32 was a bit much, but even fitting a useful set of control characters into, say, 16, would be tricky for me. For example, ^S and ^Q are still useful when text is scrolling by too fast.
bee_rider 5 hours ago [-]
On top of the control symbols being useful, providing those symbols would have reduced the motivation for Unicode, right?

ASCII did us all the favor of hitting a good stopping point and leaving the “infinity” solution to the future.

zygentoma 6 hours ago [-]
I started using the separator symbols (file, group, record, unit separator, ascii 60-63 ... though mostly the last two) for CSV like data to store in a database. Not looking back!
gschizas 5 hours ago [-]
ASCII 60-63 is just <=>?

You probably mean 28-31 (∟↔▲▼, or ␜␝␞␟)

Unless this is octal notation? But 0o60-0o63 in octal is 0123
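A minimal sketch of the scheme zygentoma describes, with the corrected code points: US (0x1F) between fields, RS (0x1E) between records.

    # ASCII's own delimiters never collide with commas or newlines in data:
    US, RS = "\x1f", "\x1e"
    rows = [["alice", "42"], ["bob", "7"]]
    blob = RS.join(US.join(fields) for fields in rows)
    assert [rec.split(US) for rec in blob.split(RS)] == rows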

mmooss 2 hours ago [-]
I've wanted to do that, but don't you have compatibility problems? What can read/import files with those delimiters? Don't the people you're working with have problems?
mmooss 2 hours ago [-]
It is interesting that, as a guess, we waste an average of ~5% of storage capacity for text (12.5% of the values of Unicode's first byte, though many languages regularly use higher bytes, of course).

I don't fault the creators of ASCII - those control characters were probably needed at the time. The fault is ours for not moving on from the legacy technology. I think some non-ASCII/Unicode encodings did reuse the control character bytes. Why didn't Unicode do that? I assume they were trying to be compatible with some existing encodings, but couldn't they have chosen the encodings that made use of the control character code points?

If Unicode were to change it now (probably not happening, but imagine...), what would they do with those 32 code points? We couldn't move other common characters over to them - those already have well-known, heavily used code points in Unicode, and also, IIRC, Unicode promises backward compatibility with prior versions.

There still are scripts and glyphs not in Unicode, but those are mostly quite rare and would effectively continue to waste the space. Is there some set of characters that would be used and be a good fit? Duplicate the most commonly used code points above 8 bits, as a form of compression? Duplicate combining characters? Have a contest? Make it a private area? I imagine we could do that anyway, because I doubt most systems interpret those bytes now.

Also, how much old data, which legitimately uses the ASCII control characters, would become unreadable?

y42 5 hours ago [-]
Only that would have broken the whole thing back in the day ;)