Mistral AI Releases Forge (mistral.ai)
ogou 2 hours ago [-]
Don't sleep on Mistral. Highly underrated as a general service LLM. Cheaper, too. Their emphasis on bespoke modelling over generalized megaliths will pay off. There are all kinds of specialized datasets and restricted access stores that can benefit from their approach. Especially in highly regulated EU.

Not everyone is obsessed with code generation. There is a whole world out there.

srivmo 28 minutes ago [-]
> Their emphasis on bespoke modelling over generalized megaliths will pay off.

Isn't the entire deal with LLMs that they are trained as megaliths? How can bespoke modelling overcome the treasure trove of knowledge that megaliths can generically bring in, even in bespoke scenarios?

isodev 35 minutes ago [-]
Indeed, but even for coding use cases, Vibe is more of a focused “refactor/ write this function” aid than “write me an app” and it can work locally. For me that’s a lot more valuable as an accelerator to my workflow where the developer stays in control and fully involved in the process.
haraldooo 53 minutes ago [-]
I agree. Just started using it. Can you give some examples of fields where you maybe even prefer Mistral?
mark_l_watson 7 hours ago [-]
I am rooting for Mistral with their different approach: not really competing on the largest and advanced models, instead doing custom engineering for customers and generally serving the needs of EU customers.
ChrisGreenHeur 1 hour ago [-]
I found it to be the best model if you want to talk about philosophical topics. It has no problem going deep and technical, while other models tend to be afraid of overshooting the reader's comprehension.
nicman23 49 minutes ago [-]
also offering support for local deployments
jerrygoyal 4 hours ago [-]
their OCR model is goated
stavros 3 hours ago [-]
Better than Qwen? I guess the best overall is Gemini, right?
thefounder 2 hours ago [-]
Gemini is the worst
ph4rsikal 2 hours ago [-]
Gemini? Not anywhere near.
arushs 3 hours ago [-]
[dead]
w4yai 6 hours ago [-]
Go Mistral !
doctorpangloss 2 hours ago [-]
first, there was .ai

next, it sounds like it's going to be .eu

but what about ai.eu

zby 19 minutes ago [-]
I am pretty sure that the solution to continual learning is external storage. There is a lot of talk about context engineering, but I have not seen anyone treat context as the main bottleneck and build a system around that. It would show that even "context engineering" is kind of the wrong term, because context does not enter the LLM in some mysterious way: it goes through the prompt, and the whole model of passing chat history back and forth is not the most efficient use of a limited prompt.
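A toy sketch of what I mean by external storage (keyword overlap stands in for a real embedding-based retriever): keep the chat turns outside the model and build each prompt from only the relevant ones, instead of replaying the whole history.

```python
# Toy external-memory store: keep every chat turn outside the model,
# retrieve only the turns relevant to the current question, and build
# a prompt from those instead of replaying the full history.

def score(turn: str, query: str) -> int:
    """Relevance = number of shared lowercase words (stand-in for embeddings)."""
    return len(set(turn.lower().split()) & set(query.lower().split()))

class ExternalMemory:
    def __init__(self):
        self.turns: list[str] = []

    def add(self, turn: str) -> None:
        self.turns.append(turn)

    def build_prompt(self, query: str, k: int = 2) -> str:
        # Pull the k most relevant stored turns into the prompt.
        relevant = sorted(self.turns, key=lambda t: score(t, query), reverse=True)[:k]
        context = "\n".join(relevant)
        return f"Context:\n{context}\n\nQuestion: {query}"

mem = ExternalMemory()
mem.add("user: my deployment target is eu-west-1")
mem.add("user: I prefer answers in French")
mem.add("user: the budget meeting moved to Friday")

prompt = mem.build_prompt("which region is my deployment target?")
print(prompt)
```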
Centigonal 17 minutes ago [-]
What do you mean when you say "external storage?"
upghost 4 hours ago [-]
> Pre-training allows organizations to build domain-aware models by learning from large internal datasets.

> Post-training methods allow teams to refine model behavior for specific tasks and environments.

How do you suppose this works? They say "pretraining", but I'm certain the amount of clean data available in proper dataset format is nowhere near enough to make a "foundation model". Do you suppose what they are calling "pretraining" is actually SFT, and then "post-training" is ... more SFT?

There's no way they mean "start from scratch". Maybe they do something like generate a heckin bunch of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low-resolution distillation, I would imagine. Hmm.

mirekrusin 2 hours ago [-]
Probably marketing speak for full fine-tuning vs PEFT/LoRA.
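For anyone unfamiliar with the distinction, the LoRA idea in a few lines of numpy (a toy sketch, not anyone's actual implementation): full fine-tuning updates every entry of each weight matrix W, while LoRA freezes W and learns only a low-rank correction B @ A.

```python
import numpy as np

# Full fine-tuning updates every entry of a weight matrix W (d_out x d_in).
# LoRA freezes W and learns a low-rank correction B @ A instead, cutting
# trainable parameters from d_out*d_in down to r*(d_out + d_in).

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small init
B = np.zeros((d_out, r))                    # trainable, zero init

alpha = 8.0
x = rng.standard_normal(d_in)

def lora_forward(x):
    return W @ x + (alpha / r) * (B @ (A @ x))

# With B initialized to zero, the adapted model starts out identical
# to the frozen pretrained model.
assert np.allclose(lora_forward(x), W @ x)

full_params = W.size
lora_params = A.size + B.size
print(f"trainable params: full={full_params}, lora={lora_params}")
```

Here LoRA trains 512 parameters instead of 4096; on a real transformer the ratio is far more dramatic.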
stingraycharles 4 hours ago [-]
I can imagine that, as usual, you start with a few examples and then instruct an LLM to synthesize more examples out of that, and train using that. Sounds horrible, but actually works fairly well in practice.
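A toy sketch of that bootstrapping step. Only the prompt construction is shown; the `llm_client` call at the end is a hypothetical placeholder, since any chat completion API would do.

```python
import json

# Few-shot synthetic data expansion: take a handful of labeled seed
# examples and build a prompt asking a strong model to generate more
# in the same format.

seed_examples = [
    {"text": "invoice overdue by 30 days", "label": "billing"},
    {"text": "cannot reset my password", "label": "account"},
]

def build_augmentation_prompt(seeds, n_new=20):
    shots = "\n".join(json.dumps(ex) for ex in seeds)
    return (
        "Here are labeled support tickets, one JSON object per line:\n"
        f"{shots}\n"
        f"Generate {n_new} new, diverse examples in exactly the same "
        "JSON format, covering the same labels."
    )

prompt = build_augmentation_prompt(seed_examples)
print(prompt)
# response = llm_client.complete(prompt)  # hypothetical call; parse each line as JSON
```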
anon373839 3 hours ago [-]
I think they are referring to “continued pretraining”.
dash2 29 minutes ago [-]
I think it’s interesting what this approach suggests about who will profit from AI. I’m sceptical that having huge numbers of GPUs is a moat. After all, real humans – even geniuses – are trained on much much less data than the whole Internet. But proprietary and specialised data could very well be a moat. It’s hard to train a scientist/lawyer/analyst without reading a lot of science/law/finance. Companies’ proprietary data might encode a great deal of irreplaceable knowledge. Seems as if Mistral is taking this bet.
roxolotl 7 hours ago [-]
Mistral has been releasing some cool stuff. Definitively behind on frontier models but they are working a different angle. Was just talking at work about how hard model training is for a small company so we’d probably never do it. But with tools like this, and the new unsloth release, training feels more in reach.
whatever1 17 minutes ago [-]
I thought that for pretraining to work and reasoning to emerge, you need internet-scale data. How can Forge achieve that with just internal company data (unless said company is AT&T or something)?
speedgoose 22 minutes ago [-]
I was enthusiastic, but it's "contact us" priced for now. I was expecting a classic cloud LLM forge with public pricing.
dmix 6 hours ago [-]
This is definitely the smart path for making $$ in AI. I noticed MongoDB is also going into this market with https://www.voyageai.com/ targeting business RAG applications and offering consulting for company-specific models.
ryeguy_24 5 hours ago [-]
How many proprietary use cases truly need pre-training or even fine-tuning as opposed to RAG approach? And at what point does it make sense to pre-train/fine tune? Curious.
mirekrusin 2 hours ago [-]
You can fine tune small, very fast and cheap to run specialized models ie. to react to logs, tool use and domain knowledge, possibly removing network llm comms altogether etc.
Shitty-kitty 3 hours ago [-]
RAG basically gives the LLM a bunch of documents to search through for the answer. What it doesn't do is make the model itself any better. Pre-training and fine-tuning improve the LLM's ability to reason about your task.
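A toy sketch of the retrieval half, with bag-of-words cosine similarity standing in for a real embedding model: the weights never change, only the prompt does.

```python
import math
from collections import Counter

# Toy RAG: retrieval hands the LLM relevant documents at query time.
# The model's weights never change -- only the prompt does.

docs = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping to the EU takes five to seven business days.",
]

def vectorize(text):
    # Lowercase, strip punctuation, count words (stand-in for embeddings).
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return Counter(cleaned.split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, k=1):
    q = vectorize(query)
    return sorted(docs, key=lambda d: cosine(vectorize(d), q), reverse=True)[:k]

query = "refund policy for returns"
context = retrieve(query)[0]
prompt = f"Context: {context}\nQuestion: {query}"
print(prompt)
```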
baby 5 hours ago [-]
RAG is dead
charcircuit 5 hours ago [-]
Using tools and skills to retrieve data or files is anything but dead.
CharlesW 5 hours ago [-]
And yet your blog says you think NFTs are alive. Curious.

But seriously, RAG/retrieval is thriving. It'll be part of the mix alongside long context, reranking, and tool-based context assembly for the foreseeable future.

nl 3 hours ago [-]
I don't think RAG is dead, and I don't think NFTs have any use and think that they are completely dead.

But the OP's blog is more about ZK than about NFTs, and crypto is the only place funding work on ZK. It's kind of a devil's bargain, but I've taken crypto money to work on privacy preserving tech before and would again.

elicash 4 hours ago [-]
I have no interest in anything crypto, but they are making a proposal about NFTs tied to AI (LLMs and verifiable machine learning) so they can make ownership decisions.

So it'd be alive in the making decisions sense, not in a "the technology is thriving" sense.

strongly-typed 5 hours ago [-]
Wait, what does NFTs have to do with RAG?
panarky 5 hours ago [-]
I, for one, find NFT-shilling to be a strong signal that I should downgrade my trust in everything else a person says.
LoganDark 5 hours ago [-]
Nothing, I think they're just pointing out a seeming lack of awareness of what really is or isn't dead.
loeg 5 hours ago [-]
Is it??
bigyabai 5 hours ago [-]
In what, X's hype circles? Embeddings are used in production constantly.
wei03288 57 minutes ago [-]
The interesting positioning here is the pretraining partnership angle, not just the fine-tuning endpoint. Most model providers compete on "best foundation model" — Mistral is betting on "best model for your data", which is a fundamentally different value proposition and sidesteps the frontier race entirely.

The RL component is the part worth watching. Custom reward models trained on domain-specific preferences can get significantly better results than generic RLHF on narrow tasks, but they require the customer to have enough labeled preference data to bootstrap the reward model. That's a higher bar than fine-tuning, but also a higher moat for Mistral once it's working.
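For reference, the standard preference-modeling objective such reward models are trained with (the Bradley-Terry pairwise loss, shown here on toy scores, not Mistral's specific setup):

```python
import math

# Bradley-Terry preference loss used to train RLHF reward models:
# given reward scores for a chosen and a rejected response, the loss
# is -log(sigmoid(r_chosen - r_rejected)). Minimizing it pushes the
# reward model to score preferred responses higher.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(r_chosen, r_rejected):
    return -math.log(sigmoid(r_chosen - r_rejected))

# Correct, confident ranking -> small loss; inverted ranking -> large loss.
good = preference_loss(r_chosen=2.0, r_rejected=-1.0)
bad = preference_loss(r_chosen=-1.0, r_rejected=2.0)
print(f"correct: {good:.3f}, inverted: {bad:.3f}")
assert good < bad
```

This is also why the labeled-preference-data bar matters: every term of that loss needs a human-ranked (chosen, rejected) pair.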

The business model makes sense too: pretraining partnerships lock in much longer relationships than inference API contracts.

csunoser 7 hours ago [-]
Huh. I initially thought this was just another fine-tuning endpoint. But apparently they are partnering with customers on the pretraining side as well. And RL too? Jeez, RL environments are really hard to get right. Best wishes, I guess.
andai 6 hours ago [-]
They mention pretraining too, which surprises me. I thought that was prohibitively expensive?

It's feasible for small models but, I thought small models were not reliable for factual information?

simsla 4 hours ago [-]
Typical stages of training for these models are:

Foundational:

- Pretraining
- Mid/post-training (SFT)
- RLHF or alignment post-training (RL)

And sometimes...

- Some more customer-specific fine-tuning.

Note that any supervised fine-tuning following the Pretraining stage is just swapping the dataset and maybe tweaking some of the optimiser settings. Presumably they're talking about this kind of pre-RL fine-tuning instead of post-RL fine-tuning, and not about swapping out the Pretraining stage entirely.
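A toy illustration of that point: "pretraining" and "fine-tuning" below are the very same SGD loop, with only the dataset (and learning rate) swapped, and fine-tuning continuing from the pretrained weights.

```python
import random

# One-parameter model y = w * x, squared loss, plain SGD.
# The same train() function serves as both the "pretraining" and the
# "fine-tuning" stage; only the data and starting weights differ.

def train(w, data, lr, steps):
    for _ in range(steps):
        x, y = random.choice(data)
        grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
        w -= lr * grad
    return w

random.seed(0)
pretrain_data = [(x, 3.0 * x) for x in range(1, 10)]   # "general" task: y = 3x
finetune_data = [(x, 3.5 * x) for x in range(1, 10)]   # "domain" task: y = 3.5x

w_pre = train(0.0, pretrain_data, lr=0.01, steps=500)   # pretrain from scratch
w_ft = train(w_pre, finetune_data, lr=0.005, steps=500) # same loop, new data
print(f"after pretraining: w = {w_pre:.2f}, after fine-tuning: w = {w_ft:.2f}")
```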

hermit_dev 4 hours ago [-]
The future of AI is specialization, not just achieving benevolent knowledge as fast as we can at the expense of everything and everyone along the way. I appreciate and applaud this approach. I am looking into a similar product myself. Good stuff.
reverius42 1 hours ago [-]
Ironically that was also the past of AI. In 2016 it was all about specialized models (not just training data, everything including architecture and model class/type) for specific tasks and that's the way things had been for a long time.

Are you suggesting that it's an aberration that from ~2019 to ~2026 the AI field has been working on general intelligence (I assume this is what you mean by "achieving benevolent knowledge")?

Personally I think it's remarkable how much a simple transformer model can do when scaled up in size. LLMs are an incredible feat of generalization. I don't see why the trajectory should change back towards specialization now.

holoduke 1 hours ago [-]
I don't think that's true. Nothing points to specialized LLMs being better. General purpose LLMs are just much more useful in daily work.
rorylawless 6 hours ago [-]
The fine tuning endpoint is deprecated according to the API docs. Is this the replacement?

https://docs.mistral.ai/api/endpoint/deprecated/fine-tuning

aavci 5 hours ago [-]
Interesting to see. I thought they were promoting fine tuning
aavci 5 hours ago [-]
How does this compare to fine tuning?
supernes 1 hours ago [-]
> Code agents are becoming the primary users of developer tools, so we built Forge for them first, not

... for humans.

bsjshshsb 6 hours ago [-]
Is training or FT better than context? Anyone have experience?

Is it possible to retrain daily or hourly as info changes?

codance 5 hours ago [-]
[dead]
shablulman 10 hours ago [-]
[dead]
gpubridge 4 hours ago [-]
[flagged]