Was my $48K GPU server worth it? (rosmine.ai)

63 points by apwheele 3 days ago | 36 comments

freediddy 34 minutes ago [-]

In the last year, I have bought an M3 Ultra Mac Studio with 512 GB, a MacBook Pro M5 Max with 128 GB and an RTX 6000 Pro. I have spent around $25k so far, not including electricity. I figured that in the worst case I can sell them within the next year and only take a haircut rather than lose my entire investment.

Just paying for tokens would have been much cheaper and much, much faster. I've been running Gemma4:31b and Qwen3.5 and 3.6, getting local LLMs to solve AMC 8/10 math questions, and it's about 10-100x slower than just doing it online. When I tried it with ChatGPT late last year, it took about one night and $25 to solve about 1000 questions. Using my RTX 6000 and M3 Ultra with Gemma4:31b on both, drawing about 800 watts (600 for the RTX and 200 for the M3 Ultra), it answered about 40 questions in 7 hours, and I haven't checked how good the answers are yet.
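A rough back-of-the-envelope check of those numbers, as a sketch only. The power draw, run time, question counts and the $25 bill come from the comment; the electricity rate `PRICE_PER_KWH` is a made-up placeholder, and hardware cost is left out entirely.

```python
# Energy and cost per question for the local run versus the API run.
# From the comment: ~800 W total draw, 7 hours, ~40 questions locally;
# ~$25 for ~1000 questions online.
# ASSUMPTION: PRICE_PER_KWH is a hypothetical electricity rate, not from the thread.

POWER_WATTS = 800
HOURS = 7
LOCAL_QUESTIONS = 40
API_COST_USD = 25.0
API_QUESTIONS = 1000
PRICE_PER_KWH = 0.15  # hypothetical, USD per kWh


def local_energy_kwh(power_watts: float, hours: float) -> float:
    """Energy used by the run, in kilowatt-hours."""
    return power_watts * hours / 1000


energy_kwh = local_energy_kwh(POWER_WATTS, HOURS)
wh_per_question = energy_kwh * 1000 / LOCAL_QUESTIONS
local_usd_per_question = energy_kwh * PRICE_PER_KWH / LOCAL_QUESTIONS
api_usd_per_question = API_COST_USD / API_QUESTIONS
local_questions_per_hour = LOCAL_QUESTIONS / HOURS

print(f"energy used:            {energy_kwh:.1f} kWh")
print(f"energy per question:    {wh_per_question:.0f} Wh")
print(f"local electricity cost: ${local_usd_per_question:.3f} per question")
print(f"API cost:               ${api_usd_per_question:.3f} per question")
print(f"local throughput:       {local_questions_per_hour:.1f} questions/hour")
```

Under that assumed rate, electricity per question lands in the same ballpark as the API price per question; the gap described in the comment is in throughput and in the up-front hardware spend, which this sketch does not model.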

At the very least I'm going to try to sell my M3 Ultra if I can find a reliable place to sell it without getting ripped off by scammers.

jon-wood 31 minutes ago [-]

I’m not usually one to ask this because learning to do a thing can be fun, but why exactly have you spent 25 thousand dollars on getting an LLM someone else made to answer maths exam questions?

hnuser123456 16 minutes ago [-]

Privacy and offline operation are valuable or non-negotiable in some cases, but the difference is pretty categorical between what can run on a single card and what can run on a DGX GB200 NVL72 cabinet. Doesn't mean it's not worth seeing how far local models can be pushed. Not every problem needs a senior engineer.

freediddy 27 minutes ago [-]

It's just a project I'm working on. I'm working on projects where AIs are processing and classifying large amounts of data that would be a lot of work for humans to do.

bethekind 12 minutes ago [-]

Which of these has been the most productive for you? Sounds like you've enjoyed the RTX6000 the most?

arjie 16 minutes ago [-]

All of these have appreciated in value. How much are you looking to get for the Ultra?

CamperBob2 30 minutes ago [-]

How do you use the RTX 6000 with the Macs? Exo? I would think that would be pretty snappy if configured properly.

freediddy 26 minutes ago [-]

This is on a separate Windows PC, I don't have it integrated with the Macs.

datadrivenangel 14 minutes ago [-]

I did the math, at least for a MacBook Pro, and for inference it's definitely not worth it.

- https://www.williamangel.net/blog/2026/05/17/offline-llm-ene...
- Discussion: https://news.ycombinator.com/item?id=48168198

hasteg 36 minutes ago [-]

Just curious, OP (if you're the one posting): what do you mean by independent researcher? What are you researching, and are you making $$ from it or living off previously built-up savings? Seems like an interesting path. What research have you looked into so far?

daemonologist 22 minutes ago [-]

They have a subsequent post (from Monday) about what they've been working on: https://rosmine.ai/2026/05/18/fixing-llm-writing-with-distri...

(I would assume they haven't made a lot of $ off of this, if nothing else because they've only just put out that post and demo. They do seem to have produced a model that doesn't sound very LLM-y to my ear, though it also seems rather weak for its size.)

exceptione 26 minutes ago [-]

I am not the author, but he has been training (or tuning?) a model that produces text mimicking the source material in a more natural way. In other words, getting LLMs to produce fewer bland and boring LLMisms, according to the follow-up blog post.

hsuduebc2 23 minutes ago [-]

Quoting from the article:

"I spent a long time trying high risk/high reward experiments and failing. But now I have something good. I’ve solved a major problem with LLMs. And I’m launching next Monday so we will soon see if it’s actually a breakthrough or just LLM psychosis"

Maybe AI companies today have some kind of bounty program?

jameson 22 minutes ago [-]

The idea is similar to maintaining on-prem vs cloud

Cloud is optimized for development velocity, but because it is a high-margin business, on-prem eventually becomes the more promising option.

It could be too late, but it might be worth looking into tax savings if you have a business. Depreciation of an asset counts as a loss and may be deductible from your income. (I'm NOT a tax expert)

0xbadcafebee 23 minutes ago [-]

So the answer is: "TBD if I can actually make money to pay this back"

Aurornis 32 minutes ago [-]

This is a difficult calculation to make because you wouldn't rent time on the exact same system in the cloud. Depending on what you're running, a bigger server with better inter-GPU interconnects in the cloud might complete the task so much faster that the additional per-hour expense is more than covered.

tombert 38 minutes ago [-]

I have four old 24GB Nvidia cards. They're not great but they're not useless either. The problem is that I haven't really figured out a good way to actually use them.

Genuine question: would anyone here recommend a specific motherboard to best utilize these cards?

mciancia 31 minutes ago [-]

Depends on what you want to do and which cards you have, but usually any older (3rd gen+) Threadripper Pro setup will give you a lot of PCIe lanes.

I myself run a Gigabyte TRX40 Aorus Xtreme, but since it's a regular Threadripper (not Pro), with 4 GPUs two of them will run at x16 and two at x8 speeds.
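To put the x16 versus x8 split in numbers, here is a small sketch of per-slot bandwidth. It assumes the slots negotiate PCIe 4.0 (16 GT/s per lane with 128b/130b encoding); the comment does not say which generation the cards actually link at, so treat the figures as illustrative.

```python
# Approximate one-direction PCIe bandwidth for each slot width.
# ASSUMPTION: PCIe 4.0 links (16 GT/s per lane, 128b/130b line encoding).
# Protocol overhead above the line encoding is ignored.

GIGATRANSFERS_PER_LANE = 16.0
ENCODING_EFFICIENCY = 128 / 130


def link_gb_per_s(lanes: int) -> float:
    """Raw one-direction bandwidth in GB/s for a link with `lanes` lanes."""
    bits_per_second_per_lane = GIGATRANSFERS_PER_LANE * ENCODING_EFFICIENCY
    return bits_per_second_per_lane / 8 * lanes


# Slot layout described in the comment: two GPUs at x16, two at x8.
slot_widths = [16, 16, 8, 8]
total_lanes = sum(slot_widths)

for width in slot_widths:
    print(f"x{width}: about {link_gb_per_s(width):.1f} GB/s per direction")
print(f"lanes used by GPUs: {total_lanes}")
```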

jmyeet 26 minutes ago [-]

So some things have changed since this rig was first built (2024). The most relevant is that $6800 RTX 6000 Ada 48GB has arguably been supplanted by the $9500 RTX 6000 Pro 96GB.

The Ada has a memory bandwidth of 960GB/s. The Pro has 1.8TB/s and about 40-50% better performance, so it is at least equivalent in processing power, much better in memory bandwidth (important for inference), and can hold larger models on a single card.
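Why memory bandwidth matters for inference can be sketched with a common rule of thumb: when single-stream decoding is bandwidth bound, every generated token has to read all the weights once, so tokens per second is capped at roughly bandwidth divided by model size. The bandwidth figures below are the ones quoted in the comment; the 31B-parameter, 8-bit model is an illustrative assumption.

```python
# Rough ceiling on single-stream decode speed for a bandwidth-bound model.
# ASSUMPTIONS: a dense model whose weights are all read once per token,
# 31B parameters at 1 byte each (8-bit); KV-cache traffic, compute limits
# and software overhead are ignored, so real numbers will be lower.

def max_tokens_per_second(bandwidth_gb_s: float,
                          params_billions: float,
                          bytes_per_param: float) -> float:
    weights_gb = params_billions * bytes_per_param
    return bandwidth_gb_s / weights_gb


ada_ceiling = max_tokens_per_second(960, 31, 1.0)    # RTX 6000 Ada, 960 GB/s
pro_ceiling = max_tokens_per_second(1800, 31, 1.0)   # RTX 6000 Pro, 1.8 TB/s

print(f"Ada ceiling: about {ada_ceiling:.0f} tokens/s")
print(f"Pro ceiling: about {pro_ceiling:.0f} tokens/s")
print(f"ratio: {pro_ceiling / ada_ceiling:.3f}x")
```

The absolute numbers depend entirely on the assumed model size, but the ratio between the two cards is just the ratio of their bandwidths.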

I've considered buying a rig with 1-2 6000 Pros for similar reasons, but I want to see what happens with this year's Mac Studios with a likely M5 Ultra. Macs have a shared memory architecture, whereas Nvidia segments the market by max memory: the biggest consumer card (RTX 5090) has 32GB of VRAM but still excellent memory bandwidth (1.8TB/s). The conventional wisdom seems to be that an RTX 5090 rig will still trounce a Mac Studio. Although Mac Studios can hold larger models and can be chained over TB5, their lower memory bandwidth (~900GB/s) and lower overall GFLOPS mean they still come out behind.

That being said, the current Mac Studios are relatively long in the tooth, being released in 2024.

I'm still not sure any of this is really worth it because things are still changing so fast. I think there's a decent chance of a number of large AI companies going bust in the next 2-3 years such that you'll be able to buy enterprise AI hardware at cents on the dollar, a bit like how Google bought data centers in the post-dot-com crash.

But anyway, nowadays I'd be looking at the RTX 6000 Pro as the sweet spot, having anywhere from 1-4 in a single server.

The electrical issues the author mentions are interesting. I hadn't really thought about the max amperage on a residential circuit. In a DC, these would typically operate on three-phase power and much higher overall amperage. I wonder if there's a device you can buy that can combine multiple residential circuits into a single power source for a server this power hungry?
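The amperage limit is easy to sketch. A US 120 V circuit delivers volts times amps, and a commonly cited guideline is to keep continuous loads to about 80% of the breaker rating. The rig's power draw below (`RIG_WATTS`) is a hypothetical placeholder, since the thread does not state it, and local electrical codes may differ.

```python
import math

# How much continuous load a residential circuit can carry, and how many
# such circuits a given load would need.
# ASSUMPTIONS: US 120 V circuits, the common 80% guideline for continuous
# loads, and a hypothetical rig draw (RIG_WATTS is not from the thread).

CONTINUOUS_LOAD_FACTOR = 0.8
RIG_WATTS = 2400  # hypothetical


def continuous_watts(volts: float, breaker_amps: float) -> float:
    """Sustained power a circuit can supply under the 80% guideline."""
    return volts * breaker_amps * CONTINUOUS_LOAD_FACTOR


def circuits_needed(load_watts: float, volts: float, breaker_amps: float) -> int:
    """Number of separate circuits needed to carry `load_watts` continuously."""
    return math.ceil(load_watts / continuous_watts(volts, breaker_amps))


for amps in (15, 20):
    budget = continuous_watts(120, amps)
    count = circuits_needed(RIG_WATTS, 120, amps)
    print(f"120 V / {amps} A: about {budget:.0f} W continuous, "
          f"{count} circuit(s) for a {RIG_WATTS} W load")
```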

freediddy 21 minutes ago [-]

I have the MacBook M5 Max with 128 GB of RAM. I put its performance at roughly equivalent to the RTX 5070 Ti. The M3 Ultra 512 GB for me is about half the performance of the RTX 5070 Ti, but obviously it can do more because of the larger memory.

I don't think anything compares to the Nvidia chips at all.

nextos 20 minutes ago [-]

I am also considering buying 3-4x RTX 6000 Pro 96GB plus a Ryzen workstation with a grant.

Is this the best general-purpose choice as of 2026 with $50k for training, fine-tuning and running large open models?

trevithick 19 minutes ago [-]

You would install a 240v circuit (in the US) like for an electric clothes dryer.

Edit: I now see the author was in an apartment and couldn't do this, so I concede this is not responsive here.

doctorpangloss 51 minutes ago [-]

> Because of this I got a motherboard with slow GPU interconnect. It’s good for running many small experiments in parallel (which is my main use case) but horrible for any models split across gpus.

:( you paid a professional pc builder and you weren't told this?

mciancia 35 minutes ago [-]

I wonder why using 2 PSUs resulted in having slower interconnect.

There are no specs in this blog post regarding the CPU/motherboard choice, but Threadripper Pro has had 128 PCIe lanes for some time now, so running all GPUs at full speed shouldn't be a problem.
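The lane budget behind that claim is simple arithmetic. The 128-lane figure is from the comment and the six GPUs are mentioned elsewhere in the thread; the sketch ignores lanes consumed by NVMe drives, network cards and the chipset, which is an optimistic assumption.

```python
# Lane budget check: can every GPU get a full x16 link?
# ASSUMPTION: no lanes are reserved for storage, networking or the chipset.

def lanes_required(num_gpus: int, lanes_per_gpu: int = 16) -> int:
    return num_gpus * lanes_per_gpu


def all_gpus_at_full_width(num_gpus: int, platform_lanes: int) -> bool:
    return lanes_required(num_gpus) <= platform_lanes


PLATFORM_LANES = 128  # Threadripper Pro figure quoted in the comment
NUM_GPUS = 6          # the rig discussed in the thread

needed = lanes_required(NUM_GPUS)
print(f"{NUM_GPUS} GPUs at x16 need {needed} of {PLATFORM_LANES} lanes")
print("fits at full width:", all_gpus_at_full_width(NUM_GPUS, PLATFORM_LANES))
```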

zozbot234 25 minutes ago [-]

If you split models using pipeline/layer parallelism, you don't have to care about a slow interconnect; you're just slowed down a lot when running a single inference at a time as opposed to a fully pipelined minibatch. But tensor parallelism requires much faster interconnects than you could get in your average server, so I'm not sure that a different motherboard would help all that much.
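The single-request slowdown can be sketched with the usual idealized pipeline model: with P equal stages and M microbatches in flight, a pass takes M + P - 1 time slots and each stage is busy for M of them. This assumes equal-cost stages and free transfers, which real setups will not match exactly; the six stages correspond to the six GPUs discussed in the thread.

```python
# Idealized per-stage utilization for layer/pipeline parallelism.
# ASSUMPTIONS: equal-cost stages, zero transfer time between stages,
# forward pass only, simple fill-and-drain schedule.

def stage_utilization(stages: int, microbatches: int) -> float:
    """Fraction of time slots in which each stage is doing useful work."""
    total_slots = microbatches + stages - 1
    return microbatches / total_slots


STAGES = 6  # one stage per GPU in the rig discussed in the thread

single_request = stage_utilization(STAGES, 1)
pipelined_batch = stage_utilization(STAGES, 32)

print(f"1 request at a time: {single_request:.0%} of each GPU used")
print(f"32 microbatches:     {pipelined_batch:.0%} of each GPU used")
```

With one request in flight, only one stage works at a time, so each GPU sits idle most of the time; keeping many microbatches in flight fills the pipeline.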

m-hodges 27 minutes ago [-]

what is a "professional pc builder" in 2026

ok_dad 25 minutes ago [-]

A guy on Facebook with more confidence and better insurance

ginko 47 minutes ago [-]

Don't those Ada 6000 GPUs support NVLink? I think I can even see the cover for the connectors in OP's pic.

edit: Hm, finding mixed information online on whether that's still supported or not. Apparently it was removed in workstation GPUs.

mciancia 41 minutes ago [-]

Nope, they don't support it. And afair even if they did, you would be limited to connecting only in pairs, not all 6 together

CamperBob2 41 minutes ago [-]

Consumer motherboards can still make sense even if you leave some performance on the table. Running an actual 8x GPU server is not something you'd want to do in an apartment. Imagine the old Lucasfilm "THX" trailer where an unearthly-sounding foghorn whine rises to a sweeping crescendo at reference level, only without the decay at the end.

At the time he put this rig together, there weren't a lot of open-weight LLMs that could run well on 6x48=288 GB, so it probably wasn't a huge loss. There still aren't, really.
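As a rough way to see what fits in 6x48=288 GB, weight memory is approximately parameter count times bytes per parameter. The model sizes, precisions and the 10% reserve for KV cache and activations below are illustrative assumptions, not figures from the thread.

```python
# Does a model's weight footprint fit in the rig's combined VRAM?
# ASSUMPTIONS: weights dominate memory use, a hypothetical 10% of VRAM is
# held back for KV cache and activations, and splitting across cards costs
# nothing. The example model sizes are illustrative only.

TOTAL_VRAM_GB = 6 * 48  # 288 GB, from the comment
USABLE_FRACTION = 0.9   # hypothetical reserve


def weights_gb(params_billions: float, bits_per_param: int) -> float:
    return params_billions * bits_per_param / 8


def fits(params_billions: float, bits_per_param: int) -> bool:
    return weights_gb(params_billions, bits_per_param) <= TOTAL_VRAM_GB * USABLE_FRACTION


for params, bits in [(70, 16), (405, 16), (405, 4)]:
    size = weights_gb(params, bits)
    verdict = "fits" if fits(params, bits) else "does not fit"
    print(f"{params}B at {bits}-bit: {size:.1f} GB of weights, {verdict}")
```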

Right now I'm in the process of cramming Blackwell cards into an old DDR4-based Milan server, where the important thing is to be able to run large models at all. The GPU fans alone burn over 400 watts at full throttle.

storus 20 minutes ago [-]

Did you think about Max-Q cards? They draw 300W and aren't that noisy either, with 14% lower perf than the non-Max-Q card.
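The trade-off works out as follows, taking the 300W and 14% figures from the comment and assuming the non-Max-Q card runs at the 600W mentioned elsewhere in the thread. Performance is normalized to 1.0 for the full-power card, so only the ratio is meaningful.

```python
# Performance per watt: Max-Q card versus the full-power card.
# ASSUMPTIONS: full-power card normalized to performance 1.0 at 600 W
# (the 600 W figure appears elsewhere in the thread); Max-Q figures
# (300 W, 14% lower performance) are the ones quoted in the comment.

FULL_POWER_WATTS = 600
MAX_Q_WATTS = 300
MAX_Q_RELATIVE_PERF = 1.0 - 0.14


def perf_per_watt(relative_perf: float, watts: float) -> float:
    return relative_perf / watts


full = perf_per_watt(1.0, FULL_POWER_WATTS)
max_q = perf_per_watt(MAX_Q_RELATIVE_PERF, MAX_Q_WATTS)
efficiency_gain = max_q / full

print(f"Max-Q delivers about {efficiency_gain:.2f}x the performance per watt")
```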

CamperBob2 15 minutes ago [-]

That was an option, but having decided on a true server chassis for other reasons, it made sense to use server-edition cards to take advantage of all those fans. I downclock them to 300W anyway for longevity, but it's nice to have the option to go to 600W if needed.

The server is going to live in the garage anyway, so I'm not that concerned with noise. But I had no idea what to expect when I flipped the switch for the first time. It sounds like something out of the Book of Revelation. No way, no how could something like this be used in an inhabited area.

pelasaco 14 minutes ago [-]

Out of curiosity, did you check how much it would cost to rent a cage in a colocation space? Having to power your computer from two different outlets sounds wild..

gosub100 39 minutes ago [-]

It doesn't cover risk. If one or more GPUs die, who pays for it? If you rent, you are insulated from this risk. But when you own, you might not have the best return policy from the vendor. And if you are actually at fault for breaking it, they have every right to deny a return. Or if your apartment is burglarized or catches fire (possibly from overloading the circuit), you are out the entire investment.

0xbadcafebee 21 minutes ago [-]

Also a lightning strike or surge from the electric utility could fry the whole rig. Proper protection costs thousands, and even then it's not guaranteed to protect everything
