nanochat can now train GPT-2 grade LLM for –$73 (3 hours on single 8XH100 node) (twitter.com)
aitchnyu 2 days ago [-]
Is there a theoretical minimum for the computing power required to, say, match GPT-2? Is there something fundamental preventing a gaming laptop from exceeding Claude Opus?
willx86 2 days ago [-]
(All of this math is approximate.) https://stackoverflow.com/questions/62491720/in-latency-valu...

Bear in mind this is:
- 5 years old
- CPU only

If you did this on a gaming laptop, the data would all sit on an SSD, which is orders of magnitude slower to access than GPU memory.

Also, AI compute is measured in FLOPS: floating-point operations per second.

My laptop CPU (a Ryzen 7 7840U) does about 4.1 TFLOPS; an H200 GPU does 3,958 TFLOPS.

OpenAI's GPT-5 was reportedly trained on ~100-200k NVIDIA GPUs.

So:
- accessing data is ~1000x slower
- the math is ~1000x slower
- they have up to 200,000x more GPUs than a laptop

Now remember that each piece of data is used multiple times, so the gaps compound: the cluster ends up something like 1000 x 1000 x 200,000 x (repeated data access) faster.

So I don't think there's anything fundamentally impossible about training Claude Opus on your laptop; it's more that the time required would be so absurdly long that it's practically out of reach. (A rough sketch of that arithmetic is below.)
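As an illustration of the gap, here's a back-of-envelope sketch in Python using the common ~6*N*D estimate for total training FLOPs. All the parameter, token, and utilization numbers are assumptions for illustration, not figures from the nanochat run:

    # Back-of-envelope training time via the FLOPs ~ 6 * N * D rule of thumb.
    # Every number here is an assumed, illustrative value.
    N = 1.5e9                 # assumed GPT-2-XL-scale parameter count
    D = 10e9                  # assumed number of training tokens
    train_flops = 6 * N * D   # ~9e19 FLOPs total

    # Assumed sustained throughput = peak FLOPS * utilization.
    laptop = 4.1e12 * 0.10        # ~4.1 TFLOPS CPU at an assumed 10% utilization
    node = 8 * 989e12 * 0.40      # 8x H100, BF16 peak ~989 TFLOPS, assumed 40% MFU

    for name, rate in [("gaming laptop", laptop), ("8xH100 node", node)]:
        hours = train_flops / rate / 3600
        print(f"{name}: {hours:,.1f} hours (~{hours / 8760:.2f} years)")

Under these assumptions the node finishes in roughly 8 hours while the laptop needs roughly 7 years, which is the point above: not impossible, just impractically slow.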

TylerLives 2 days ago [-]
You could do it by hand, by calculating the gradients and doing backprop with pen and paper.
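To make that concrete, here is a minimal sketch (in Python, with made-up numbers) of the arithmetic you'd be repeating by hand: one forward and backward pass for a single linear neuron with a squared-error loss:

    # One forward/backward pass for y_hat = w*x + b with squared-error loss,
    # applying the chain rule exactly as you would on paper.
    # All values are made up for illustration.
    x, y = 2.0, 1.0      # one training example
    w, b = 0.5, 0.1      # current parameters
    lr = 0.01            # learning rate

    y_hat = w * x + b            # forward pass: 1.1
    loss = (y_hat - y) ** 2      # loss: 0.01

    d_y_hat = 2 * (y_hat - y)    # dloss/dy_hat = 0.2
    dw = d_y_hat * x             # dloss/dw = 0.4
    db = d_y_hat * 1.0           # dloss/db = 0.2

    w -= lr * dw                 # gradient-descent update
    b -= lr * db
    print(w, b)                  # 0.496 0.098

A modern LLM is this same pattern repeated across billions of parameters and a vast number of tokens, which is why pen and paper is possible in principle and hopeless in practice.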
JPLeRouzic 2 days ago [-]
I have an unrelated question: why is the submission URL "twitter.com" when the link leads to "x.com"? Is it still possible to use twitter.com?
GuestFAUniverse 2 days ago [-]
And why did the "~" (circa) become a minus sign?

Luckily that error is blatantly obvious.
