Better performance than TQ and better quality than FP16?
Am I reading this right??
qeternity 1 hours ago [-]
It's not better quality: 59.3% vs. 59.4% for FP16 on AIME 25.
thefox96 1 hours ago [-]
Faster than FP16, not better quality, I guess.
v3ss0n 2 hours ago [-]
Why is this not a PR for vLLM?
esafak 2 hours ago [-]
It's the output of a research paper; the authors are not trying to build up vLLM, and they probably have no incentive to do so. You can submit a PR, though! It's easier now while the divergence is low, so don't wait. Since there are six authors, I bet you could get help with the inevitable review chores if you just take the step of creating the PR.
And with the help of AI, pointing an AI at this paper and saying "make a vLLM PR from this paper" tends to work surprisingly well, even if you need to nudge it a little along the way.
thefox96 47 minutes ago [-]
it should be easy to do btw
edit: It might not be clear that it is based on vLLM 0.22, which is the current version: https://github.com/huawei-csl/KVarN/commit/d6290e99098d7426d.... All you have to do is create a diff off it; it's fairly straightforward.