I really hope this doesn't hinder development too much. As Simon says, Qwen3.5 is very impressive.
I've been testing Qwen3.5-35B-A3B over the past couple of days and it's a very impressive model. It's the most capable agentic coding model I've tested at that size by far. I've had it writing Rust and Elixir via the Pi harness and found that it's very capable of handling well defined tasks with minimal steering from me. I tell it to write tests and it writes sane ones ensuring they pass without cheating. It handles the loop of responding to test and compiler errors while pushing towards its goal very well.
misnome 3 hours ago [-]
I've been playing with 3.5:122b on a GH200 the past few days for rust/react/ts, and while it's clearly sub-Sonnet, with tight descriptions it can get small-medium tasks done OK - as well as Sonnet if the scope is small.
The main quirk I've found is that it has a tendency to decide halfway through following my detailed instructions that it would be "simpler" to just... not do what I asked, and I find it has stripped all the preliminary support infrastructure for the new feature out of the code.
sheepscreek 1 hours ago [-]
That sounds awfully similar to what Opus 4.6 does on my tasks sometimes.
> Blah blah blah (second guesses its own reasoning half a dozen times then goes). Actually, it would be a simpler to just ...
Specifically on Antigravity, I've noticed it doing that trying to "save time" to stay within some artificial deadline.
It might have something to do with the system messages and the reinforcement/realignment messages that are interwoven into the context (but never displayed to end-users) to keep the agents on task.
wood_spirit 52 minutes ago [-]
Yeah that happened to me with Claude code opus 4.6 1M for the first time today. I had to check the model hadn’t changed. It was weird. I was imagining that maybe anthropic have a way of deciding how much resource a user actually gets and they had downgraded me suddenly or something.
e1g 11 minutes ago [-]
Claude Code recently downgraded the default thinking level to “medium”, so it’s worth checking your settings.
shaan7 1 hours ago [-]
> that it would be "simpler" to just... not do what I asked
That sounds too close to what I feel on some days xD
reactordev 2 hours ago [-]
Turn down the temperature and you’ll see less “simpler” short cuts.
smokel 48 minutes ago [-]
For the uninitiated: Interestingly, it is not advisable to take this to the extreme and set temperature to 0.
That would seem logical, as the results are then completely deterministic, but it turns out that a suboptimal token may result in a better answer in the long run. Also, allowing for a little bit of noise gives the model room to talk itself out of a suboptimal path.
LoganDark 21 minutes ago [-]
I like to think of this like tempering the output space. With a temperature of zero, there is only one possible output and it may be completely wrong. With even a low temperature, you drastically increase the chances that the output space contains a correct answer, through containing multiple responses rather than only one.
I wonder if determinism will be less harmful to diffusion models because they perform multiple iterations over the response rather than having only a single shot at each position that lacks lookahead. I'm looking forward to finding out and have been playing with a diffusion model locally for a few days.
Twirrim 4 hours ago [-]
I've been testing the same with some rust, and it's has spent a fair bit of time going through an infinite seeming loop before finally unjamming itself. It seems a little more likely to jam up than some other models I've experimented with.
It's also driving itself crazy with deadpool & deadpool-r2d2 that it chose during planning phase.
That said, it does seem to be doing a very good job in general, the code it has created is mostly sane other than this fuss over the database layer, which I suspect I'll have to intervene on. It's certainly doing a better job than other models I'm able to self-host so far.
Aurornis 3 hours ago [-]
> it's has spent a fair bit of time going through an infinite seeming loop before finally unjamming itself.
I think this is part of the model’s success. It’s cheap enough that we’re all willing to let it run for extremely long times. It takes advantage of that by being tenacious. In my experience it will just keep trying things relentlessly until eventually something works.
The downside is that it’s more likely to arrive at a solution that solves the problem I asked but does it in a terribly hacky way. It reminds me of some of the junior devs I’ve worked with who trial and error their way into tests passing.
I frequently have to reset it and start it over with extra guidance. It’s not going to be touching any of my serious projects for these reasons but it’s fun to play with on the side.
sosodev 4 hours ago [-]
Some of the early quants had issues with tool calling and looping. So you might want to check that you're running the latest version / recommended settings.
cbm-vic-20 1 hours ago [-]
I don't know much about how these models are trained, but is this behavior intentional (ie, the people pulling the levers knew that this is how it would end up), or is it emergent (ie, pulling the levers to see what happens)?
misnome 2 hours ago [-]
> and it's has spent a fair bit of time going through an infinite seeming loop before finally unjamming itself
I can live with this on my own hardware. Where Opus4.6 has developed this tendency to where it will happily chew through the entire 5-hour allowance on the first instruction going in endless circles. I’ve stopped using it for anything except the extreme planning now.
anana_ 52 minutes ago [-]
I've had even better results using the dense 27B model -- less looping and churning on problems
abhikul0 3 hours ago [-]
Are you running it locally with llama.cpp? If so, is it working without any tweaking of the chat template? The tool calls fail for me when using the default chat template, however it seems to work a whole lot better with this: https://huggingface.co/Qwen/Qwen3.5-35B-A3B/discussions/9#69...
arcanemachiner 3 hours ago [-]
Have you tried the '--jinja' flag in llama-server?
abhikul0 2 hours ago [-]
Yes, it fails too. I’m using the unsloth q4_km quant. Similarly fails with devstral2 small too, fixed that by using a similar template i found for it. Maybe it’s the quants that are broken, need to redownload I guess.
nu11ptr 3 hours ago [-]
What hardware do you have it running on? Do you feel you could replace the frontier models with it for everyday coding? Would/will you?
politelemon 2 hours ago [-]
60 to 70 on a 5080, but only tinkering for now. The smaller models seem exceptionally good for what they are, and some can even do OCR reliably.
bigyabai 3 hours ago [-]
I'm getting ~30 tok/s on the A3B model with my 3070 Ti and 32k context.
> Do you feel you could replace the frontier models with it for everyday coding? Would/will you?
Probably not yet, but it's really good at composing shell commands. For scripting or one-liner generation, the A3B is really good. The web development skills are markedly better than Qwen's prior models in this parameter range, too.
58 minutes ago [-]
paoliniluis 4 hours ago [-]
what's your take between Qwen3.5-35B-A3B and Qwen3-Coder-Next?
sosodev 4 hours ago [-]
In my experience Qwen3.5 is better even at smaller distillations. From what I understand the Qwen3-next series of models was just a test/preview of the architectural changes underpinning Qwen3.5. So Qwen3.5 is a more complete and well trained version of those models.
kamranjon 4 hours ago [-]
In my experience qwen 3 coder next is better. I ran quite a few tests yesterday and it was much better at utilizing tool calls properly and understanding complex code. For its size though 3.5 35B was very impressive. coder next is an 80b model so i think its just a size thing - also for whatever reason coder next is faster on my machine. Only model that is competitive in speed is GLM 4.7 flash
xrd 4 hours ago [-]
What do you use as the orchestrator? By this I mean opencode, or the like. Is that the right term?
It's really easy to setup with any OpenAI compatible API and I self host Qwen Coder 3 Next on my personal MBP using LM Studio and just dial in from my work laptop with Zed and tailscale so i can connect from wherever i might be. It's able to do all sorts of things like run linting checks and tests and look for issues and refactor code and create files and things like this. I'm definitely still learning, but it's a pretty exciting jump from just talking to a chat bot and copying and pasting things manually.
simonw 4 hours ago [-]
I use the term "harness" for those - or just "coding agent". I think orchestrator is more appropriate for systems that try to coordinate multiple agents running at the same time.
This terminology is still very much undefined though, so my version may not be the winning definition.
karmakaze 3 hours ago [-]
We don't have a Qwen3.5-Coder to compare with, but there is a chart comparing Qwen3.5 to Qwen3 including Qwen3-Next[0].
It's the number of active parameters for a Mixture of Experts (misleading name IMO) model.
Qwen3.5-35B-A3B means that the model itself consists of 35 billion floating point numbers - very roughly 35GB of data - which are all loaded into memory at once.
But... on any given pass through the model weights only 3 billion of those parameters are "active" aka have matrix arithmetic applied against them.
This speeds up inference considerably because the computer has to do less operations for each token that is processed. It still needs the full amount of memory though as the 3B active it uses are likely different on every iteration.
zozbot234 1 hours ago [-]
It will benefit from a full amount of memory for sure, but AIUI if you use system memory and mmap for your experts you can execute the model with only enough memory for the active parameters, it's just unbearably slow since it has to swap in new experts for every token. So the more memory you have in excess to that, the more inactive but often-used experts can be kept in RAM for better performance.
hintymad 3 hours ago [-]
There has been tension between Qwen's research team and Alibaba's product team, say the Qwen App. And recently, Alibaba tried to impose DAU as a KPI. It's understandable that a company like Alibaba would force a change of product strategy for any number of reasons. What puzzled me is why they would push out the key members of their research team. Didn't the industry have a shortage of model researchers and builders?
cmrdporcupine 2 hours ago [-]
Perhaps they wanted future Qwen models to be closed and proprietary, and the authors couldn't abide by that.
lzaborowski 49 minutes ago [-]
One thing I’ve noticed with local models is that people tolerate a lot more trial and error behavior. When a hosted model wastes tokens it feels expensive, but when a local model loops a bit it just feels like it’s “thinking.”
If models like Qwen can get good enough for coding tasks locally, the real shift might be economic rather than purely capability.
softwaredoug 4 hours ago [-]
I wonder how a US lab hasn't dumped truckloads of cash into various laps to ensure these researchers have a place at their lab
gaoshan 3 hours ago [-]
ICE has been detaining Chinese people in my area (and going door to door in at least one neighborhood where a lot of Chinese and Indians live). I was hearing about this just last week as word spread amongst the Chinese community here (Ohio) to make sure you have some legal documentation beyond just your driver's license on you at all times for protection. People will hear about this through the grapevine and it has a massive (and rightly so) chilling effect. US labs can try but with US government behaving like it is I don't think they will have much luck.
*edit: not that it matters, but since MAGA can't help but assume, these are all US citizens and green card holders that I am referring to.
bobthepanda 3 hours ago [-]
Yeah, the Hyundai factory fiasco kind of dashed the idea that the enforcement would spare people working in favored industries setting up in the US.
genxy 1 hours ago [-]
The Hyundai factory "enforcement" wasn't even legal. Those workers were here to train US workers and the Hyundai employees had proper visas for this.
The regime is powered by racism and doesn't think through things.
ljsprague 3 hours ago [-]
Are the people being detained in the country illegally?
Jcampuzano2 2 hours ago [-]
The reality is - it doesn't matter. The fact that they have had as many false positives as they have and the way they treat people in general causes it to have rippling effects even for people who are legally here, or are considering legally immigrating.
The risk and level of publicity is just too high for many people to even consider, especially people already intelligent/capable enough to immigrate anywhere else that doesn't have these issues or stay in their own country.
0x3f 3 minutes ago [-]
Have they had a lot of false positives? Almost every story I see seems to fall apart on further investigation. To be clear, I'm sure they have some false positives, but do they have a lot of them relative to any other immigration system?
4RealFreedom 6 minutes ago [-]
"especially people already intelligent/capable enough to immigrate anywhere else that doesn't have these issues or stay in their own country" Isn't that the point? Come here legally or don't come at all.
gaoshan 2 hours ago [-]
No, all of the specific cases I heard about were Chinese people that were naturalized citizens (some for decades) who were cuffed and detained for a few hours before being released. As others have said it doesn't really matter, though. It's the sentiment that counts.
Conscat 1 hours ago [-]
Even if you're not likely to be deported from a foreign country, you wouldn't want to face frequent gang intimidation tactics, would you? Simply feeling threatened isn't fun, even if nothing truly terrible will happen to you (not to speak of the real risk in being detained regardless).
mattnewton 3 hours ago [-]
Sometimes, often times no. They have detained multiple US citizens.
misnome 2 hours ago [-]
Who cares when you get a bonus per person either way?
sourcegrift 3 hours ago [-]
Yes. Yes, so true. And the phd types building these models are probably even scared in China that ICE will fly there to deport them.
jwolfe 3 hours ago [-]
This thread is about bringing these people to the US.
velcrovan 4 hours ago [-]
What the US has done is dumped truckloads of cash to make it likely that as a legal immigrant you will be abducted and sent to a camp.
4 hours ago [-]
riddlemethat 3 hours ago [-]
This is FUD. The US has dumped truckloads of cash to make it likely that masked men with no cameras and little training will parade around abducting anyone they even suspect of being an illegal immigrant, after even Yale admitted it's likely that more than 22M+ people came here illegally. https://insights.som.yale.edu/insights/yale-study-finds-twic...
It'd be good if Congress could do something to remove the masks, put cameras on these agents, and for the local governments to stop fighting removal of all people who are here illegally so we can pretend we have borders again.
mattnewton 3 hours ago [-]
I feel like we would disagree on the role of immigration in the US but I really appreciate you calling out how the current administration’s approach is only effective at making viral clips online. Meta comment, but it’s refreshing to talk with people who have different goals while still referencing a shared reality. Removing the masks and adding cameras shouldn’t be controversial unless your goal really is to make a paramilitary force for the president.
WarmWash 1 hours ago [-]
Making viral clips is exactly what they want.
Their goal is for every one person violently detained, 10 decide to leave on their own, and 100 decide to not come in the first place.
KerrAvon 32 minutes ago [-]
"Goal" implies there's a plan instead of just wanton cruelty for the sake of cruelty.
cmrdporcupine 3 hours ago [-]
The unstated but obvious (to me?) goal of what ICE is doing is not to get large numbers of people out of the country, but to drive costs down for migrant labor by further disenfranchising them, making them scared, marginal, etc.
If they actually thoroughly evicted non-status migrant workers they'd have a outright revolt on their hands from farmers and other businesses that depend on them.
Instead those businesses can now take further advantage of the fear of harassment and/or deportation to drive down compensation and rights.
Contrast with countries like Canada that have a legal temporary foreign agriculture worker program that provides a regulated source of seasonal migrant farm worker labour under a non-citizen temporary status, but with some rights (still often abused). It's notable to me as a Canadian that I don't see this being advocated on any large scale by either party in the US.
Anyways, all this just to say that the jackboot clown theater is the point, not a side effect.
plorkyeran 2 hours ago [-]
Limiting the supply of migrant labor drives costs up, not down, and the ICE raids have had a significant negative effect on businesses reliant on illegal immigrants.
cmrdporcupine 2 hours ago [-]
Do you have numbers on how many migrant farm workers have actually been deported or detained?
Because going around and harassing and deporting other or non-essential non-status immigrants would drive labor costs down because of the chill it would put through those who are grudgingly tolerated.
And besides, given the quality of personality ICE seems to be employing even (especially) at its highest levels, I simply assume there's corruption such that if I'm a large orchard or whatever I simply pay ICE to stay away.
"There was a significant drop-off in entries to the United States in 2025 relative to 2024 and an increase in enforcement activity leading to removals and voluntary departures. We estimate that net migration was between –10,000 and –295,000 in 2025, the first time in at least half a century it has been negative."
reactordev 2 hours ago [-]
They do have a revolt on their hands from farmers… go watch some of their pleas for help.
cmrdporcupine 2 hours ago [-]
It's nothing like it would be if ICE was actually doing substantially more than fascist theater.
There'd be no food on the tables, frankly. And people in Silicon Valley would have messy houses and algae in their pools.
reactordev 2 hours ago [-]
Honestly think it’s just a matter of resources and they would rather play theater for their leader than actually do the job. However, the effect has been felt.
Soybean farmers are screwed.
gordonhart 3 hours ago [-]
Surely you know that this is an extreme misrepresentation? There are >35 million legal immigrants in the US. It's far from "likely" that as one of them you're abducted and sent to a camp.
Conscat 1 hours ago [-]
It's not merely a matter of detainment or deportation. Racial minorities, not just immigrants, face intimidation tactics. These guys are walking into schools, they're walking into social security offices, and courthouses. They stand around menacingly just to scare people. They harass random passerby's on the street, or in the grocery store. You would feel unsafe and stressed if this happened to you, no matter your circumstances.
marci 3 hours ago [-]
Unfortunately, the most extreme is that it's the new normal that now, there's >0 chance that someone, whether they are a US citizen or not apparently, child or adult, can end up in a camp, with no due process.
autoexec 3 hours ago [-]
Give them time, they've only just started. They do waste a lot time abducting random US citizens though.
halJordan 3 hours ago [-]
I think it would be a useful exercise to look at all the revocations of legal access in the us, and then do the division to see how we've increased the likelihood of becoming an illegal, and therefore targeted.
I dont think youre as right as you want to believe. Certainly not as right as I want you to believe
mft_ 4 hours ago [-]
Indeed; or, Europe badly needs a competitive model to hedge against US political nonsense.
mijoharas 16 minutes ago [-]
It'd be great if they went to Mistral!
ivan_gammel 3 hours ago [-]
Offering „You are welcome“ relocation package to Anthropic might be a good idea.
Imustaskforhelp 3 hours ago [-]
Given how American govt. has treated Anthropic, I think you might be right. EU truly has a remarkable opportunity to make Anthropic/Claude European.
petcat 2 hours ago [-]
This US administration (or any admin) would almost certainly impose export controls on US AI technology before it would allow one of the frontier model providers to be acquired/relocate outside the US. It did the same thing when ASML wanted to acquire Cymer (California company that provides the EUV light source technology). The acquisition was only allowed under strict technology sharing/export agreements with the Dutch government.
Europe really just needs to rally behind Mistral. That's where they should dump their cash.
fc417fc802 13 minutes ago [-]
Can they actually prevent it though? In typical cases there would be IP licenses involved. But in this case it's a valuation based (AFAICT) on a team of people plus their infra. What happens if they all just happened to get hired by "AnthropicEU GmbH" a new entity which has been gifted hundreds of millions in computing resources?
ivan_gammel 1 hours ago [-]
Having one „champion“ is flawed European approach. We need local competition and headhunting to make it fly.
azinman2 41 minutes ago [-]
Hard to compete in an environment that’s anti-996 and the pay is so much less.
cmrdporcupine 2 hours ago [-]
Anthropic has gone out of their way to make a point about how much they love and admire the US state and its defense sector. Only drawing the line at a very far point and even when they drew the line it was with a big thing about how they believe in the American defense sector blah blah blah.
In any case, there's no way Anthropic's investors in Silicon Valley would countenance such a move.
Also, I'm biased the logical place is Canada, not Europe. Much of the fundamental/foundational research on LLMs, and a large part of the talent, came from universities in Canada anyways.
tiahura 3 hours ago [-]
Competitive models are illegal in the EU.
ecshafer 4 hours ago [-]
China is also giving them dump trucks full of cash though. Plus you have to content with the nationalism reason (unfortunately this has died off in America for too many). The idea of building your country is valued for most Chinese I have met. Plus China is incredibly nice to live in, especially if you have lots of money and/or connections. So you can work in China, get paid lots of money, feel like you are doing good. Or In America you can get paid lots of money, and get yelled at by people online because the Government wants to use your model.
danny_codes 3 hours ago [-]
China city life is amazingly convenient. Trains and subways are just such an enormous quality of life boost. Add to that the relative cleanliness of having nearly zero homelessness and you’ve got something very compelling.
I will say we are winning in accessibility. China doesn’t have much of a ramp game
softwaredoug 3 hours ago [-]
All very true.
I wonder if you max out your options in China. It seems the Party is suspicious of ambition and high profile winners. I'm sure you can live comfortably, but there's a ceiling.
bdangubic 1 hours ago [-]
what is the issue with having a ceiling?
WarmWash 1 hours ago [-]
Star athletes really hate being told they can't score more than 10 goals in a season because it's unfair to the other weaker players. The players will either leave to go play somewhere else, or they become weaker players themselves.
1024core 3 hours ago [-]
I got an offer out of the blue for a consulting gig in ML, offering USD 400/hr in China. Assuming this was legit (the offeror seemed legit), it looks like China is also throwing a lot of Benjamins around...
maxglute 34 minutes ago [-]
> get yelled at by people online because the Government wants to use your model
Well duh, as recently demonstrated, an US model used by the US gov will 100% end up murdering actual children sooner than later, in this case less than a calendar year in some far flung war that many Americans do not support. Alternatively PRC model used by CCP might kill in some hypothetical future but for national reunification/rejuvenation that many Chinese support. At the end of the day, researchers and population on one side sleeps more soundly.
petcat 3 hours ago [-]
> Or In America you can get paid lots of money, and get yelled at by people online because the Government wants to use your model.
Isn't it just straight-up illegal in China to refuse the government from using your model? USA isn't perfect, but at least it has active discourse.
neves 2 hours ago [-]
At least it has been decades since China Gov bombed innocent people in other countries. A peaceful and responsible government.
WarmWash 51 minutes ago [-]
What's ironic is that China is desperately trying to be that country, but the US has then in a geographic/geopolitical choke hold.
petcat 2 hours ago [-]
> A peaceful and responsible government.
People in Hong Kong died. Over 10,000 were arrested and many are still in prison. The rest are permanently disgraced in their social-credit society.
Again, USA is not perfect, but let's not dream up some fantasy about the CCP.
cyberax 1 hours ago [-]
This "social credit" thing is dead in China.
petcat 52 minutes ago [-]
As an American, I have no fear of calling the US President a pedo or saying Fuck the Police on my Twitter. Not the case in China. It's horrifying.
Oh, China absolutely does not tolerate _public_ dissent very much including highly visible social media posts. Everybody there knows that.
But this:
> According to the social credit system, Chinese citizens are punishable if they indulge in buying too many video games, buying too much junk food, having a friend online who has a low credit score, visiting unauthorized websites, posting “fake news” online, and more.
...is just pure bullshit. There were _ideas_ about including these kinds of stuff into the score, but they have never been implemented. At this point, the social credit score is only used to find people who dodge court decisions.
ecshafer 2 hours ago [-]
I would imagine if it isn't illegal its a very bad idea not to. But regardless, I would bet large amounts of money that you would never get any flack for doing anything for the government. If I went on X, Threads, Bluesky, TikTok and said "Hey I am a software engineer selling awesome new technology to the government and military!" I am going to get Americans attacking me for supporting Trump / ICE / FBI whatever the current issue of the day is. If I did the same on Douyin or Weibo the response would be able making China strong, and there would be no criticism of that choice.
cmrdporcupine 2 hours ago [-]
Sure, but the difference is that while the Chinese state is measurably awful on all sorts of human rights things within their own borders... they're not currently dropping bombs on foreign cities, starving a neighbour of critical petroleum shipments, or heavily funding an ally to slowly exterminate a population.
leptons 1 hours ago [-]
Chinese people are very racist towards non-Chinese. It might seem like a happy utopia, but if you aren't Chinese, then you may not really enjoy your time there. It may not be quite as bad as being black in rural US south, but being black (or anything non-Chinese) in China is still not going to be a good time.
WarmWash 54 minutes ago [-]
Racism in even the worse parts of America doesn't even begin to touch the racism present in monocultural/monoracial countries.
Larrikin 11 minutes ago [-]
Have you experienced racism? In Japan atleast, it was evenly applied. That company won't rent to foreigners but this one will. That company won't hire foreigners but this one will. Police will bother you if you ride a bike, but they will be polite while they waste 10 minutes of your time asking for your gaijin card for biking while foreign.
In the US people try to hide it and are far more sinister about it, since there are a lot of laws against obvious racism. The cops are also happy in the US to just kill you.
The racism in the US comes out of hate where as what I experienced abroad was more, we don't think you'll fit in and follow the rules and you have to constantly prove that you can.
I didn't spend too much time in China so maybe it is a racist hell hole.
But my experience in Japan was that white immigrants were way more inclined to make a huge deal about the lighter racism they experienced because they had never been somewhere where their skin color was a disadvantage.
px43 59 minutes ago [-]
Wild to call 1.42 billion people racist despite having met very few of them.
VWWHFSfQ 2 hours ago [-]
> China is incredibly nice to live in
I'm sure it's a very nice place to live if you're content to just stay quiet in society and never put a political sign in your yard or even just talk about the wrong thing with your friend in a WeChat.
cyberax 53 minutes ago [-]
This is an exaggeration. Nobody in China cares about what you speak with each other privately, and people talk about stupid policies all the time. The government cares about _public_ actions.
In practical terms, if you're not kind of person who would want to run for an office in the US, China is incredibly comfortable. Cities are safe, with barely any violent crime. Public drug use is nonexistent. And with the US-level AI researcher income, you'd be in the top 0.1% earners.
petcat 40 minutes ago [-]
> nobody in China cares about what you speak with each other privately, and people talk about stupid policies all the time. The government cares about _public_ actions.
My comment and the linked video says otherwise. The guy was in a private group chat and said some nasty things about the police for confiscating his motorcycle. Now he's arrested and in the Tiger Chair.
How are we explaining this?
maxglute 4 minutes ago [-]
Group with 75 people. That's a crowd, doesn't matter if gated behind QR code invites. Shit talk cops and gov with the bois is fine. Shit talk / soapbox in a crowd (virtual or real) and get caught or reported = drink tea on the menu.
bdangubic 1 hours ago [-]
try to protest in america and see how that works out for you long-term. or say protest against genocide in gaza at an uni or generally in public…
cyberax 52 minutes ago [-]
Sigh. Let's not invent things? You can protest anything in the US just fine, with generally no consequences. Heck, our local _high_ _school_ students go out and protest everything to weasel out of classes.
cheema33 29 minutes ago [-]
Trump admin did put people in prison and then deported them, for doing nothing more than protesting.
Not as bad as China sure, but not as good as other civilized nations.
jamespo 3 hours ago [-]
Damn that social conscience, huh?
mmaunder 2 hours ago [-]
Yeah that was my first thought is it’s a tit for tat poach. They got the Gemini researcher so google responded in kind.
lynndotpy 2 hours ago [-]
Well, the problem aren't just the NSF funding cuts. Everyone else is already dumping truckloads of cash. There's also the public health situation (who wants measles or polio?), the risk of retaliatory attacks from the countries we're at war with, etc. You could write paragraphs about why the US is less attractive to researchers.
When I was a deep learning PhD in the first Trump administration, US universities were already very deeply affected by the Muslim ban, and so a lot of talent ended up in other countries.
Sibling commentators are rightfully pointing out that foreigners, especially those who would not be recognized as white, face an onerous and risky customs process with long-term and increasing risks of deportation. When you see a headline like the NIST labs abruptly restricting foreign scientists, _everything_ else feels uncertain. Even if someone doesn't believe they're personally at risk for deportation, they're still seeing everything else.
And then it all boils down to a reputational thing. The era where we were the top choice for research is in the past. If you start a PhD in the US on your resume during this era, you might be anticipating how you'll answe the question of why you weren't good enough to get accepted somewhere better.
bilbo0s 4 hours ago [-]
They probably have tried, but you have to have more cash than those researchers feel they can get starting their own lab. When you consider the fact that their new startup lab would have the entire nation of China as, in effect, a captive market; you start to see how almost any amount of money would be too little to convince them not to make a run at that new startup. If money is their aim.
I think Alibaba needs to just give these guys a blank check. Let them fill it in themselves. Absent that, I'm pretty sure they'll make their own startup.
I do think it'd be a big loss for the rest of the world though if they close whatever model their startup comes up with.
simgt 3 hours ago [-]
> I do think it'd be a big loss for the rest of the world though if they close whatever model their startup comes up with.
That's very likely to happen once the gap with OpenAI/Anthropic has been closed and they managed to pop the bubble.
bobthepanda 3 hours ago [-]
I don’t know, the EV bubble deflated and Chinese firms are still pumping them out with subsidies like their life depends on it.
skeeter2020 4 hours ago [-]
Getting a bit of whiplash goin from AI is replacing people, to AI is dead without (these specific) people. Surely we're far enough ahead that AI can take it from here?
If AI could effectively replace people, you wouldn’t need CEOs to keep trying to convince people.
OsrsNeedsf2P 1 hours ago [-]
That's 99% is two nines?
kylemaxwell 42 minutes ago [-]
Everything on that page has two nines, so not sure what you're trying to say here.
relaxing 15 minutes ago [-]
Right now everything on that page is 98 point something, so it must be fluctuating.
mungoman2 2 hours ago [-]
Not sure what the uptime is meant to signal. People have quite low uptime as well…
jug 1 hours ago [-]
Huh? Servers aren't people and thus have completely different expectations, or what am I missing here
px43 56 minutes ago [-]
9% uptime?
vidarh 4 hours ago [-]
Who is suggesting "AI is dead without (these specific) people"? People are wondering what it means specifically for the Qwen model family.
mhitza 4 hours ago [-]
We've gone from AGI goals to short-term thinking via Ads. That puts things better in perspective, I think.
dude250711 3 hours ago [-]
Claude is incapable of producing a native application for itself, and is bad enough with web ones to justify Anthropic acquiring Bun.
airstrike 4 hours ago [-]
I'm hopeful they will pick up their work elsewhere and continue on this great fight for competitive open weight models.
To be honest, it's sort of what I expected governments to be funding right now, but I suppose Chinese companies are a close second.
3 hours ago [-]
w10-1 34 minutes ago [-]
It sounds like the lead was demoted to attract new talent, quit as a result, and the rest of the team also resigned to force management to change their minds.
If so, I'm happy that the team held together, and I hope that endogenous tech leads get to control their own career and tech destiny after hard work leads to great products. (It's almost as inspiring as tank man, and the tank commanders who tried to avoid harming him...)
(ducking the downvote for challenging the primacy of equity...)
zoba 4 hours ago [-]
I tried the new qwen model in Codex CLI and in Roo Code and I found it to be pretty bad. For instance I told it I wanted a new vite app and it just started writing all the files from scratch (which didn’t work) rather than using the vite CLI tool.
Is there a better agentic coding harness people are using for these models? Based on my experience I can definitely believe the claims that these models are overfit to Evals and not broadly capable.
Tepix 28 minutes ago [-]
What is "the new qwen model"? There are a dozen and you can get them in a dozen different quantizations (or more) which are of different quality each.
sosodev 4 hours ago [-]
I've noticed that open weight models tend to hesitate to use tools or commands unless they appeared often in the training or you tell them very explicitly to do so in your AGENTS.md or prompt.
They also struggle at translating very broad requirements to a set of steps that I find acceptable. Planning helps a lot.
Regarding the harness, I have no idea how much they differ but I seem to have more luck with https://pi.dev than OpenCode. I think the minimalism of Pi meshes better with the limited capabilities of open models.
malwrar 2 hours ago [-]
+1 to this, anecdotally I’ve found in my own evaluations that if your system prompt doesn’t explicitly declare how to invoke a tool and e.g. describe what each tool does, most models I’ve tried fail to call tools or will try to call them but not necessarily use the right format. With the right prompt meanwhile, even weak models shoot up in eval accuracy.
vardalab 1 hours ago [-]
Have frontier lab do the plan which is the most time consuming part anyways and then local llm do the implementation.
Frontier model can orchestrate your tickets, write a plan for them and dispatch local llm agents to implement at about 180 tokens/s, vllm can probably ,manage something like 25 concurrent sessions on RTX 6000
Do it all in a worktrees and then have frontier model do the review and merge.
I am just a retired hobbyist but that's my approach, I run everything through gitea issues, each issue gets launched by orchestrator in a new tmux window and two main agents (implementer and reviewer get their own panes so I can see what's going on). I think claude code now has this aspect also somewhat streamlined but I have seen no need to change up my approach yet since I am just a retired hobbyist tinkering on my personal projects. Also right now I just use claude code subagents but have been thinking of trying to replace them with some of these Qwen 3.5 models because they do seem cpable and I have the hardware to run them.
lreeves 45 minutes ago [-]
In my experience Qwen3.5/Qwen3-Coder-Next perform best in their own harness, Qwen-Code. You can also crib the system prompt and tool definitions from there though. Though caveat, despite the Qwen models being the state of the art for local models they are like a year behind anything you can pay for commercially so asking for it to build a new app from scratch might be a bit much.
ihsw 2 hours ago [-]
[dead]
quantum_state 3 hours ago [-]
I would second that Qwen3.5 is exceptionally good. In a calibration, it (35b variant) was running locally with Ada NextGen 24GB to do the same things with easy-llm-cli in comparison with gemini-cli + Gemini 3 Pro, they were at par … really impressive it ran pretty fast …
vardalab 2 hours ago [-]
q4 quant gives you 175 tg and 7K pp, beats most cloud providers
lacoolj 1 hours ago [-]
I wonder if an american company poached one/all of them. They've been pretty much bleeding edge of open models and would not surprise me if Amazon or Google snatched them up
ferfumarma 1 hours ago [-]
It would surprise me if they're willing to come to the US in the setting of the current DHS and ICE situation.
ilaksh 4 hours ago [-]
Does anyone know when the small Qwen 3.5 models are going to be on OpenRouter?
Like 4B, 2B, 9B. Supposedly they are surprisingly smart.
Sakthimm 2 hours ago [-]
Yep. The 9B has excellent image recognition. I showed it a PCB photo and it correctly identified all components and the board type from part numbers and shape. OCR quality was solid.
Tool calling with opencode worked without issues, but general coding ability is still far from sonnet-tier. Asked it to add a feature to an existing react app, it couldn't produce an error-free build and fell into a delete-redo loop. Even when I fixed the errors, the UI looked really bad. A more explicit prompt probably would have helped. Opus one-shotted it, same prompt, the component looked exactly as expected.
But I'll be running this locally for note summarization, code review, and OCR. Very coherent for its size.
I am singularly impressed by 35B/A3, hope that is not the reason he had to leave.
hwers 4 hours ago [-]
My conspiracy theory hat is that somehow investors with a stake in openai as well is sabotaging, like they did when kicking emad out of stabilityai
storus 3 hours ago [-]
More likely some high ranking party member's nepobaby from Gemini sniffed success with Qwen and the original folks just walked away as their reward disappeared.
ahmadyan 59 minutes ago [-]
source?
WarmWash 45 minutes ago [-]
There is no source. But the party in China does have ultimate control.
There would never be an Anthropic/Pentagon situation in China, because in China there isn't actually separation between the military and any given AI company. The party is fully in control.
liuliu 3 hours ago [-]
apples v.s. oranges. The later is true, Emad did get sabotaged (for not being able to raise money in time, about 8-month before he's leaving). Junyang didn't have that long arc of incidents.
raffael_de 5 hours ago [-]
> me stepping down. bye my beloved qwen.
the qwen is dead, long live the qwen.
vonneumannstan 4 hours ago [-]
Were they kneecapped by Anthropic blocking their distillation attempts?
zozbot234 2 hours ago [-]
What Anthropic was complaining about is training on mass-elicited chat logs. It is very much a ToS violation (you aren't allowed to exploit the service for the purpose of building a competitor) so the complaint is well-founded but (1) it's not "distillation" properly understood; it can only feasibly extract the same kind of narrow knowledge you'd read out from chat logs, perhaps including primitive "let's think step by step" output (which are not true fine-tuned reasoning tokens); because you have no access to the actual weights; and (2) it's something Western AI firms are very much believed to do to one another and to Chinese models all the time anyway. Hence the brouhaha about Western models claiming to be DeepSeek when they answer in Chinese.
red2awn 1 hours ago [-]
The "distillation attacks" are mostly using Claude as LLM-as-a-judge. They are not training on the reasoning chains in a SFT fashion.
zozbot234 54 minutes ago [-]
So they're paying expensive input tokens to extract at best a tiny amount of information ("judgment") per request? That's even less like "distillation" than the other claim of them trying to figure out reasoning by asking the model to think step by step.
kartika848484 2 hours ago [-]
what the hell, their models were promising tho
aplomb1026 2 hours ago [-]
[dead]
butILoveLife 4 hours ago [-]
[flagged]
kamranjon 4 hours ago [-]
I use Qwen 3 Coder Next daily on my mac as my main coding agent. It is incredibly capable and its strange how you are painting this picture as if its a fringe use case, there are whole communities that have popped up around running local models.
butILoveLife 4 hours ago [-]
Can I doubt your claim? I have had such terrible luck with AI coding on <400B models. Not to mention, I imagine your codebase is tiny. Or you are working for some company that isnt keeping track of your productivity.
I am trying super hard to use cheap models, and outside SOTA models, they have been more trouble than they are worth.
kamranjon 2 hours ago [-]
Absolutely. So my codebase is huge, it's a monolith. But my work is in very specific parts of the codebase, I don't pull the entire code base into context (and I don't think that is common practice even with claude) - I start at a specific point with a specific task and work with the agent to achieve something clearly defined, for example writing tests, extracting things into separate files, refactoring or even scaffolding a new feature. You have to periodically start new threads, because you'll start hitting the limits of the context, but I max it out at over 200k because I have the memory overhead on my 128gb mbp to do that, so I can get quite a lot done.
I really recommend trying the Qwen models - 3 coder next is really incredible. GLM 4.7 flash is also incredibly performant on modest hardware. Important things to consider is setting the temperature and top_p and top_k values etc based on what is recommended by the provider of the model - a thing as simple as that could result in a huge difference in performance.
The other big leap for me was switching to Zed editor and getting its agent stuff just seamlessly integrated. If you run LM Studio on your local machine it's super easy and even setting it up on a remote machine and calling out to LM Studio is dead simple.
arcanemachiner 3 hours ago [-]
Yesterday, I got Qwen-Coder-Next to build a python script that reads a Postman collection, pulls the data from it to build a request to one of the endpoints, download a specific group of files whose URLs were buried in the JSON payload in that endpoint, then transform then all to a specific size of PNG, all without breaking a sweat. I didn't even have to tell it to use Pillow, but it did everything to a T.
Use case means everything. I doubt this model would fare well on a large codebase, but this thing is incredible.
simonw 4 hours ago [-]
The thing I'm most excited about is the moment that I run a model on my 64GB M2 that can usefully drive a coding agent harness.
Yesterday I test ran Qwen3.5-35B-A3B on my MBP M3 Pro with 36GB via LM Studio and OpenCode. I didn’t have it write code but instead use Rodney (thanks for making it btw!) to take screenshots and write documentation using them. Overall I was pretty impressed at how well it handled the harness and completed the task locally. In the past I would’ve had Haiku do this, but I might switch to doing it locally from now on.
xrd 4 hours ago [-]
I suppose this shows my laziness because I'm sure you have written extensively about it, but what orchestrator (like opencode) do you use with local models?
simonw 4 hours ago [-]
I've not really settled on one yet. I've tried OpenCode and Codex CLI, but I know I should give Pi a proper go.
So far none of them have be useful enough at first glance with a local model for me to stick with them and dig in further.
xrd 4 hours ago [-]
I've used opencode and the remote free models they default to aren't awful but definitely not on par with Gemini CLI nor Claude. I'm really interested in trying to find a way to chain multiple local high end consumer Nvidia cards into an alternative to the big labs offering.
arcanemachiner 3 hours ago [-]
Kimi K2.5 is pretty good, you can use it on OpenRouter. Fireworks is a good provider, they were giving free access to the model on OpenCode when it first released.
PhilipRoman 3 hours ago [-]
When you say you use local model in OpenCode, do you mean through the ollama backend? Last time I tried it with various models, I got issues where the model was calling tools in the wrong format.
NortySpock 3 hours ago [-]
I managed to get qwen2.5-coder:14B working under ollama on an Nvidia 2080 Ti with 11GB of VRAM, using ollama cli, outputting what looks like 200 words-per-minute to my eye
It has been useful for education ("What does this Elixir code do? <Paste file> ..... <general explanation> "then What this line mean?")
as well as getting a few basic tests written when I'm unfamiliar with the syntax. ("In Elixir Phoenix, given <subject under test, paste entire module file> and <test helper module, paste entire file> and <existing tests, pasted in, used both for context and as examples> , what is one additional test you would write?")
This is useful in that I get a single test I can review, run, paste in, and I'm not using any quota. Generally I have to fix it, but that's just a matter of reading the actual test and throwing the test failure output to the LLM to propose a fix. Some human judgement is required but once I got going adding a test took 10 minutes despite being relatively unfamiliar with Elixir Phoenix .
It's a nice loop, I'm in the loop, and I'm learning Elixir and contributing a useful feature that has tests.
benatkin 4 hours ago [-]
I think this is directing coders towards self-sufficiency and that's a good thing. If they don't end up using it for agentic coding, they can use it for running tests, builds, non-agentic voice controlled coding, video creation, running kubernetes, or agent orchestration. So no, it's not evil, even if it doesn't go quite as expected.
multisport 4 hours ago [-]
inb4 qwen is less of a supply chain risk than anthropic
Rendered at 20:39:22 GMT+0000 (Coordinated Universal Time) with Vercel.
I've been testing Qwen3.5-35B-A3B over the past couple of days and it's a very impressive model. It's the most capable agentic coding model I've tested at that size by far. I've had it writing Rust and Elixir via the Pi harness and found that it's very capable of handling well defined tasks with minimal steering from me. I tell it to write tests and it writes sane ones ensuring they pass without cheating. It handles the loop of responding to test and compiler errors while pushing towards its goal very well.
The main quirk I've found is that it has a tendency to decide halfway through following my detailed instructions that it would be "simpler" to just... not do what I asked, and I find it has stripped all the preliminary support infrastructure for the new feature out of the code.
> Blah blah blah (second guesses its own reasoning half a dozen times then goes). Actually, it would be a simpler to just ...
Specifically on Antigravity, I've noticed it doing that trying to "save time" to stay within some artificial deadline.
It might have something to do with the system messages and the reinforcement/realignment messages that are interwoven into the context (but never displayed to end-users) to keep the agents on task.
That sounds too close to what I feel on some days xD
That would seem logical, as the results are then completely deterministic, but it turns out that a suboptimal token may result in a better answer in the long run. Also, allowing for a little bit of noise gives the model room to talk itself out of a suboptimal path.
I wonder if determinism will be less harmful to diffusion models because they perform multiple iterations over the response rather than having only a single shot at each position that lacks lookahead. I'm looking forward to finding out and have been playing with a diffusion model locally for a few days.
It's also driving itself crazy with deadpool & deadpool-r2d2 that it chose during planning phase.
That said, it does seem to be doing a very good job in general, the code it has created is mostly sane other than this fuss over the database layer, which I suspect I'll have to intervene on. It's certainly doing a better job than other models I'm able to self-host so far.
I think this is part of the model’s success. It’s cheap enough that we’re all willing to let it run for extremely long times. It takes advantage of that by being tenacious. In my experience it will just keep trying things relentlessly until eventually something works.
The downside is that it’s more likely to arrive at a solution that solves the problem I asked but does it in a terribly hacky way. It reminds me of some of the junior devs I’ve worked with who trial and error their way into tests passing.
I frequently have to reset it and start it over with extra guidance. It’s not going to be touching any of my serious projects for these reasons but it’s fun to play with on the side.
I can live with this on my own hardware. Where Opus4.6 has developed this tendency to where it will happily chew through the entire 5-hour allowance on the first instruction going in endless circles. I’ve stopped using it for anything except the extreme planning now.
> Do you feel you could replace the frontier models with it for everyday coding? Would/will you?
Probably not yet, but it's really good at composing shell commands. For scripting or one-liner generation, the A3B is really good. The web development skills are markedly better than Qwen's prior models in this parameter range, too.
It's really easy to setup with any OpenAI compatible API and I self host Qwen Coder 3 Next on my personal MBP using LM Studio and just dial in from my work laptop with Zed and tailscale so i can connect from wherever i might be. It's able to do all sorts of things like run linting checks and tests and look for issues and refactor code and create files and things like this. I'm definitely still learning, but it's a pretty exciting jump from just talking to a chat bot and copying and pasting things manually.
This terminology is still very much undefined though, so my version may not be the winning definition.
[0] https://www.reddit.com/r/LocalLLaMA/comments/1rivckt/visuali...
Qwen3.5-35B-A3B means that the model itself consists of 35 billion floating point numbers - very roughly 35GB of data - which are all loaded into memory at once.
But... on any given pass through the model weights only 3 billion of those parameters are "active" aka have matrix arithmetic applied against them.
This speeds up inference considerably because the computer has to do less operations for each token that is processed. It still needs the full amount of memory though as the 3B active it uses are likely different on every iteration.
If models like Qwen can get good enough for coding tasks locally, the real shift might be economic rather than purely capability.
*edit: not that it matters, but since MAGA can't help but assume, these are all US citizens and green card holders that I am referring to.
https://apnews.com/article/immigration-raid-hyundai-korea-ic...
https://www.koreatimes.co.kr/foreignaffairs/20251112/hundred...
https://www.pbs.org/newshour/nation/attorney-says-detained-k...
The regime is powered by racism and doesn't think through things.
The risk and level of publicity is just too high for many people to even consider, especially people already intelligent/capable enough to immigrate anywhere else that doesn't have these issues or stay in their own country.
It'd be good if Congress could do something to remove the masks, put cameras on these agents, and for the local governments to stop fighting removal of all people who are here illegally so we can pretend we have borders again.
Their goal is for every one person violently detained, 10 decide to leave on their own, and 100 decide to not come in the first place.
If they actually thoroughly evicted non-status migrant workers they'd have a outright revolt on their hands from farmers and other businesses that depend on them.
Instead those businesses can now take further advantage of the fear of harassment and/or deportation to drive down compensation and rights.
Contrast with countries like Canada that have a legal temporary foreign agriculture worker program that provides a regulated source of seasonal migrant farm worker labour under a non-citizen temporary status, but with some rights (still often abused). It's notable to me as a Canadian that I don't see this being advocated on any large scale by either party in the US.
Anyways, all this just to say that the jackboot clown theater is the point, not a side effect.
Because going around and harassing and deporting other or non-essential non-status immigrants would drive labor costs down because of the chill it would put through those who are grudgingly tolerated.
And besides, given the quality of personality ICE seems to be employing even (especially) at its highest levels, I simply assume there's corruption such that if I'm a large orchard or whatever I simply pay ICE to stay away.
"There was a significant drop-off in entries to the United States in 2025 relative to 2024 and an increase in enforcement activity leading to removals and voluntary departures. We estimate that net migration was between –10,000 and –295,000 in 2025, the first time in at least half a century it has been negative."
There'd be no food on the tables, frankly. And people in Silicon Valley would have messy houses and algae in their pools.
Soybean farmers are screwed.
I dont think youre as right as you want to believe. Certainly not as right as I want you to believe
Europe really just needs to rally behind Mistral. That's where they should dump their cash.
In any case, there's no way Anthropic's investors in Silicon Valley would countenance such a move.
Also, I'm biased the logical place is Canada, not Europe. Much of the fundamental/foundational research on LLMs, and a large part of the talent, came from universities in Canada anyways.
I will say we are winning in accessibility. China doesn’t have much of a ramp game
I wonder if you max out your options in China. It seems the Party is suspicious of ambition and high profile winners. I'm sure you can live comfortably, but there's a ceiling.
Well duh, as recently demonstrated, an US model used by the US gov will 100% end up murdering actual children sooner than later, in this case less than a calendar year in some far flung war that many Americans do not support. Alternatively PRC model used by CCP might kill in some hypothetical future but for national reunification/rejuvenation that many Chinese support. At the end of the day, researchers and population on one side sleeps more soundly.
Isn't it just straight-up illegal in China to refuse the government from using your model? USA isn't perfect, but at least it has active discourse.
People in Hong Kong died. Over 10,000 were arrested and many are still in prison. The rest are permanently disgraced in their social-credit society.
Again, USA is not perfect, but let's not dream up some fantasy about the CCP.
https://reclaimthenet.org/china-man-chair-interrogation-soci...
But this:
> According to the social credit system, Chinese citizens are punishable if they indulge in buying too many video games, buying too much junk food, having a friend online who has a low credit score, visiting unauthorized websites, posting “fake news” online, and more.
...is just pure bullshit. There were _ideas_ about including these kinds of stuff into the score, but they have never been implemented. At this point, the social credit score is only used to find people who dodge court decisions.
In the US people try to hide it and are far more sinister about it, since there are a lot of laws against obvious racism. The cops are also happy in the US to just kill you.
The racism in the US comes out of hate where as what I experienced abroad was more, we don't think you'll fit in and follow the rules and you have to constantly prove that you can.
I didn't spend too much time in China so maybe it is a racist hell hole.
But my experience in Japan was that white immigrants were way more inclined to make a huge deal about the lighter racism they experienced because they had never been somewhere where their skin color was a disadvantage.
I'm sure it's a very nice place to live if you're content to just stay quiet in society and never put a political sign in your yard or even just talk about the wrong thing with your friend in a WeChat.
In practical terms, if you're not kind of person who would want to run for an office in the US, China is incredibly comfortable. Cities are safe, with barely any violent crime. Public drug use is nonexistent. And with the US-level AI researcher income, you'd be in the top 0.1% earners.
https://news.ycombinator.com/item?id=47252833
My comment and the linked video says otherwise. The guy was in a private group chat and said some nasty things about the police for confiscating his motorcycle. Now he's arrested and in the Tiger Chair.
How are we explaining this?
Not as bad as China sure, but not as good as other civilized nations.
When I was a deep learning PhD in the first Trump administration, US universities were already very deeply affected by the Muslim ban, and so a lot of talent ended up in other countries.
Sibling commentators are rightfully pointing out that foreigners, especially those who would not be recognized as white, face an onerous and risky customs process with long-term and increasing risks of deportation. When you see a headline like the NIST labs abruptly restricting foreign scientists, _everything_ else feels uncertain. Even if someone doesn't believe they're personally at risk for deportation, they're still seeing everything else.
And then it all boils down to a reputational thing. The era where we were the top choice for research is in the past. If you start a PhD in the US on your resume during this era, you might be anticipating how you'll answe the question of why you weren't good enough to get accepted somewhere better.
I think Alibaba needs to just give these guys a blank check. Let them fill it in themselves. Absent that, I'm pretty sure they'll make their own startup.
I do think it'd be a big loss for the rest of the world though if they close whatever model their startup comes up with.
That's very likely to happen once the gap with OpenAI/Anthropic has been closed and they managed to pop the bubble.
Wild times!
https://status.claude.com/
If AI could effectively replace people, you wouldn’t need CEOs to keep trying to convince people.
To be honest, it's sort of what I expected governments to be funding right now, but I suppose Chinese companies are a close second.
If so, I'm happy that the team held together, and I hope that endogenous tech leads get to control their own career and tech destiny after hard work leads to great products. (It's almost as inspiring as tank man, and the tank commanders who tried to avoid harming him...)
(ducking the downvote for challenging the primacy of equity...)
Is there a better agentic coding harness people are using for these models? Based on my experience I can definitely believe the claims that these models are overfit to Evals and not broadly capable.
They also struggle at translating very broad requirements to a set of steps that I find acceptable. Planning helps a lot.
Regarding the harness, I have no idea how much they differ but I seem to have more luck with https://pi.dev than OpenCode. I think the minimalism of Pi meshes better with the limited capabilities of open models.
But I'll be running this locally for note summarization, code review, and OCR. Very coherent for its size.
https://news.ycombinator.com/item?id=47246746
There would never be an Anthropic/Pentagon situation in China, because in China there isn't actually separation between the military and any given AI company. The party is fully in control.
the qwen is dead, long live the qwen.
I am trying super hard to use cheap models, and outside SOTA models, they have been more trouble than they are worth.
I really recommend trying the Qwen models - 3 coder next is really incredible. GLM 4.7 flash is also incredibly performant on modest hardware. Important things to consider is setting the temperature and top_p and top_k values etc based on what is recommended by the provider of the model - a thing as simple as that could result in a huge difference in performance.
The other big leap for me was switching to Zed editor and getting its agent stuff just seamlessly integrated. If you run LM Studio on your local machine it's super easy and even setting it up on a remote machine and calling out to LM Studio is dead simple.
Use case means everything. I doubt this model would fare well on a large codebase, but this thing is incredible.
Maybe Qwen3.5-35B-A3B is that model? This comment reports good results: https://news.ycombinator.com/item?id=47249343#47249782
I need to put that through its paces.
So far none of them have be useful enough at first glance with a local model for me to stick with them and dig in further.
It has been useful for education ("What does this Elixir code do? <Paste file> ..... <general explanation> "then What this line mean?")
as well as getting a few basic tests written when I'm unfamiliar with the syntax. ("In Elixir Phoenix, given <subject under test, paste entire module file> and <test helper module, paste entire file> and <existing tests, pasted in, used both for context and as examples> , what is one additional test you would write?")
This is useful in that I get a single test I can review, run, paste in, and I'm not using any quota. Generally I have to fix it, but that's just a matter of reading the actual test and throwing the test failure output to the LLM to propose a fix. Some human judgement is required but once I got going adding a test took 10 minutes despite being relatively unfamiliar with Elixir Phoenix .
It's a nice loop, I'm in the loop, and I'm learning Elixir and contributing a useful feature that has tests.