SpikingBrain 7B – More efficient than classic LLMs

144 points by somethingsome a day ago

To me it sounds like sparse matrix multiplication repackaged as "event-driven spiking computation", where the spikes are simply the non-zero elements that sparse GPU kernels have always been designed to process.

The supposedly dynamic/temporal nature of the model does not seem to be applied in GPU execution, which collapses it into a single static computation equivalent to applying a pre-calculated sparsity mask.

Perhaps a bit cynical of me, but it feels like wrapping standard sparse computing and operator fusion in complex, biological jargon...
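
To make the equivalence concrete, here is a minimal sketch (my own toy example, not code from the SpikingBrain repo; the function names and the tiny matrix are made up). It treats every non-zero activation as a "spike event" and shows that accumulating events gives the same result as a plain dense matrix-vector product:

```python
# Hypothetical illustration: names and sizes are invented for this sketch.

def dense_matvec(weights, x):
    """Dense matrix-vector product: touches every weight."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def event_driven_matvec(weights, x):
    """Process only the non-zero inputs ("spikes"), one event at a time."""
    out = [0] * len(weights)
    events = [(j, xj) for j, xj in enumerate(x) if xj != 0]
    for j, xj in events:
        # Each event adds one scaled weight column to the output.
        for i, row in enumerate(weights):
            out[i] += row[j] * xj
    return out, len(events)

weights = [[1, -2, 3, 0], [0, 4, -1, 2], [5, 0, 0, -3]]
x = [0, 2, 0, 1]  # sparse integer activations: two non-zero "spikes"

dense = dense_matvec(weights, x)
sparse, n_events = event_driven_matvec(weights, x)
print(dense, sparse, n_events)
```

The event-driven version skips the zero columns, which is exactly the saving a sparse kernel is after.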

GregarianChild a day ago

The 'brain-inspired' community has always been doing this, since Carver Mead introduced the term 'neuromorphic' in the late 1980s. Reselling banalities as a new great insight. My favourite is "Neuromorphic computing breakthrough could enable blockchain on Mars" [1]. What else can they do? After all, that community now has multiple decades of failure under its belt. Not a single success. Failure to make progress in AI and failure to say anything of interest about the brain. To paraphrase Benjamin Franklin: In this world nothing can be said to be certain, except death, taxes and neuromorphicists exaggerating. (Aside: I was told by someone who applied to YC with a 'neuromorphic' startup that YC said, they don't fund 'neuromorphic'. I am not sure about details ...). The whole 'brain talk' malarkey goes back way longer. In particular psychology and related subjects, since their origins as a specialty in the 19th century, have heavily used brain-inspired metaphors that were intended to mislead. Already in the 19th century that was criticised. See [3] for an interesting discussion.
There is something interesting in this post, namely that it's based on non-Nvidia GPUs, in this case MetaX [2]. I don't know how competitive MetaX are today, but I would not bet against China in the longer term.
[1] https://cointelegraph.com/news/neuromorphic-computing-breakt...
[2] https://en.wikipedia.org/wiki/MetaX
[3] K. S. Kendler, A history of metaphorical brain talk in psychiatry. https://www.nature.com/articles/s41380-025-03053-6
- janalsncm 17 hours ago
  
  > I was told by someone who applied to YC with a 'neuromorphic' startup that YC said, they don't fund 'neuromorphic'.
  There is something refreshingly consistent in a VC that is laser focused on enterprise CRM dashboard for dashboards workflow optimization ChatGPT wrappers that also filters out the neuromorphicists.
  Reminds me of how the Samurai were so used to ritual dueling and reading their lineages before battle but when the Mongolians encountered them they just shot the samurai mid-speech.
  - nostrebored 17 hours ago
    
    There is — some of these things make money and the others don’t :)
    
- justthisguy8578 7 hours ago
  
  This is 100% true. It's just very effective and profitable PR. There IS a minority stream in the field that uses the branding to get funding and then builds real tech. But you'll never know it as neuromorphic as the label comes off once it works. Look up Synaptics (touch pad) history.
cpldcpu a day ago

I believe the argument is that you can also encode information in the time domain.
If we just look at spikes as a different numerical representation, then they are clearly inferior. For example, consider that encoding the number 7 will require seven consecutive pulses on a single spiking line. Encoding the number in binary will require one pulse on three parallel lines.
Binary encoding wins 7x in speed and 7/3=2.333x in power efficiency...
On the other hand, if we assume that we are able to encode information in the gaps between pulses, then things quickly change.
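A rough sketch of the arithmetic, for anyone who wants to play with it. The unary and binary functions follow the example above; the "gap" scheme (a start pulse and a stop pulse n slots later) is purely my assumption of what encoding in the gaps could look like, not something taken from the paper:

```python
def unary_pulses(n):
    """Count code: n pulses in consecutive time slots on one line."""
    return {"lines": 1, "time_slots": n, "pulses": n}

def binary_pulses(n):
    """Parallel binary code: one line per bit, one time slot, one pulse per set bit."""
    bits = max(n.bit_length(), 1)
    return {"lines": bits, "time_slots": 1, "pulses": bin(n).count("1")}

def gap_pulses(n):
    """Assumed gap code: a start pulse, then a stop pulse n slots later."""
    return {"lines": 1, "time_slots": n + 1, "pulses": 2}

for value in (7, 1000):
    print(value, unary_pulses(value), binary_pulses(value), gap_pulses(value))
```

For 7 this reproduces the 7x speed and 7/3 pulse-count advantage of binary over unary. With the gap scheme the pulse count stays at two no matter how large the value is, at the cost of latency.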
- HarHarVeryFunny 21 hours ago
  
  I think the main benefit of a neuromorphic design would be to make it dataflow driven (asynchronous event driven - don't update neuron outputs unless their inputs change) rather than synchronous, which is the big power efficiency unlock. This doesn't need to imply a spiking design though - that seems more of an implementation detail, at least as far as dataflow goes. Nature seems to use spike firing rates to encode activation strength.
  In the brain the relative timing/ordering of different neurons asynchronously activating (A before B, or B before A) is also used (spike-timing-dependent plasticity - STDP) as a learning signal to strengthen or weaken connection strengths, presumably to learn sequence prediction in this asynchronous environment.
  STDP also doesn't imply that spikes or single neuron spike train inter-spike timings are necessary - an activation event with a strength and timestamp would seem to be enough to implement a digital dataflow design, although ultimately a custom analog design may be more efficient.
  - GregarianChild 14 hours ago
    
    Can you explain the benefit of renaming dataflow as 'neuromorphic'?
    You do understand that dataflow architectures have been tried many many times? See [1] for a brief history. MIT had a big dataflow lab for many years (led by the recently deceased Arvind). What is the benefit of re-inventing dataflow architectures by complete amateurs who are not at all aware of the half-century research tradition on dataflow architecture, and the very clear and concrete reasons why this architecture has so far failed whenever it was tried for general purpose processors?
    We cannot even apply Santayana's "those who forget their history are condemned to repeat it" because the 'neuromorphic' milieu doesn't even bother understanding this history.
    [1] https://csg.csail.mit.edu/Dataflow/talks/DennisTalk.pdf
    
    HarHarVeryFunny 13 hours ago
    
    > Can you explain the benefit of renaming dataflow as 'neuromorphic'?
    Neuromorphic just means brain-like or brain inspired, and brains operate in asynchronous dataflow type fashion.
    I'm not sure how you read into what I wrote that I was "renaming dataflow as neuromorphic", which was certainly not what I meant.
    I wonder if you regard Steve Furber (who I knew from Acorn), designer of the ARM CPU, as a "complete amateur"? He also designed the AMULET async processors, as well as for that matter the SpiNNaker system for spiking neural network research.
    In any case, async (dataflow) processor design, while complex, clearly isn't an impossible task, and at some point in the future when the need arises (mobile robotics?) and there is sufficient financial incentive, I expect we'll see it utilized in commercial systems.
    I'm not sure why you focus on "general purpose processors" given that we're talking about ANNs and neuromorphic systems. A custom chip would seem a better bet if the goal is to minimize power usage.
    
    GregarianChild 13 hours ago
    
    I don't rate Furber as a "complete amateur", but he's the exception in this milieu.
    > Neuromorphic just means brain-like or brain inspired,
    I don't even see any evidence that 'neuromorphic' architecture is brain inspired in a non-trivial sense. Can you please provide evidence, for example, a non-trivial mapping between 'neuromorphic' architectures (say SpiNNaker) and the SOTA models of the brain that we have, e.g. the existing data-driven model simulating C. elegans brain (the MetaWorm project)?
    As Steve Furber also says (personal communication): we don't know enough of how the brain works to have computer architectures that can be meaningfully inspired by brains. The "neuro-" prefix is marketing. [1] documents this use and dates it back to the 19th century. See also
    • Neuroergonomics
    • Neurotypical
    • Neurodivergent
    • Neurodiverse
    • Neurosis
    • Neuroethics
    • Neuroeconomics
    • Neuromarketing
    • Neurolaw
    • Neurosecurity
    • Neuropsychology
    • Neuropsychoanalysis
    • Neurotheology
    • Neuro-Linguistic Programming
    • Neurogastronomy
    I have seen all the above used without irony.
    > brains operate in asynchronous dataflow type fashion.
    That's a questionable statement. To the best of my knowledge, there is no consensus as of 2025 of how to model even a single neuron. (Function of synapse is even less understood).
    When I asked Steve Furber what 'neuromorphic' meant, he said: "There are many things today described as neuromorphic. Mead would not call SpiNNaker as neuromorphic!"
    Furber also said: "Neuromorphic status: attracts no money, but works (in the sense of accelerate in niche domains)". Upon my asking what niche domains, he said: "brain simulation but nothing else". (He was referring to SpiNNaker). I asked him if SpiNNaker can accelerate back-propagation and he said: "no, because the brain does not do back-propagation".
    > async (dataflow) processor design, while complex, clearly isn't an impossible task
    I did not say it was impossible. It has been done many times, see my references to Arvind's lab at MIT (I spent some time there). The problem with async (dataflow) processor design is that it consistently fails to live up to its promises (PPA). There are specific technical reasons for that, and they are quite well understood.
    > why you focus on "general purpose processors" given that we're talking about ANNs and neuromorphic systems.
    Because the 'neuromorphic' marketing often reads like they want to build more efficient 'brain-inspired' general purpose computers. Certainly the dataflow architectures (a la Arvind/MIT) tried to. This is one of the many issues with the 'neuromorphic' milieu: they are really vague about their goals. If they would restrict their claims to certain classes of accelerators, then their claims would be less delusional.
    > the goal is to minimize power usage.
    If that is the goal, they are also not very successful. On CMOS silicon, transitions from 0 to 1 or 1 to 0 are what consume most of the power; this would make the constant spiking expensive, no?
    [1] K. S. Kendler, A history of metaphorical brain talk in psychiatry. https://www.nature.com/articles/s41380-025-03053-6
    
    HarHarVeryFunny 11 hours ago
    
    > I don't even see any evidence that 'neuromorphic' architecture is brain inspired in a non-trivial sense
    I'm just giving the definition of the word neuromorphic. Individual systems claiming to be neuromorphic can be judged on their own merits, but that does not change the definition of the word.
    >> brains operate in asynchronous dataflow type fashion. >That's a questionable statement. To the best of my knowledge, there is no consensus as of 2025 of how to model even a single neuron
    You're confusing two things - do we know absolutely everything there is to know about every type of neuron and synapse, in order to build a 100% accurate model of them? No. Do we know that neurons operate asynchronously to each other, only activating when their inputs change? Yes.
    > When I asked Steve Furber what 'neuromorphic' meant, he said: "There are many things today described as neuromorphic. Mead would not call SpiNNaker as neuromorphic!"
    Of course not - it was just an ARM-based message passing system, built to be able to efficiently simulate spiking neural networks (modelled as message passing), but with no particular specialization for that task.
    > I asked him if SpiNNaker can accelerate back-propagation and he said: "no, because the brain does not do back-propagation".
    That's a silly question, and you got the silly answer you deserved. SpiNNaker is basically just an ARM cluster with fast message passing. It can be used to accelerate anything relative to running on slower hardware. If you wanted to train an ANN on it, using backprop, you certainly could, but why would you WANT to mix spiking neural nets and backprop ?!
    > If that is the goal, they are also not very successful. On CMOS silicon, transitions from 0 to 1 or 1 to 0 are what consume most of the power; this would make the constant spiking expensive, no?
    Sure, but biological neurons don't spike constantly. They spike only when an input changes that causes them to activate - maybe on average only 10's of times per second. This is the whole point of an async (no clock)/dataflow chip design - for chip elements to only consume power when their inputs change - not to route a GHz clock to them that continuously draws power (by flipping between 0 and 1 billions of times a second) even if the element's inputs are only changing 10's of times a second (or whatever).
    
    GregarianChild an hour ago
    
    > definition of the word neuromorphic
    The definition of the term is so vague as to be useless for scientific progress. What exactly is excluded by "brain inspired"? If I paint my computer grey because the brain is grey, it's brain inspired? You know that regular expressions were invented to model neurons [1]? If nothing is excluded the term is useless. But in reality, the term is used explicitly to deceive, as was already done in the 19th century; see [2].
    Heraclitus says somewhere something to the effect that "good thinking is dry". Established terminology like
    • Asynchronous CPU
    • Delay Insensitive Circuit
    • Quasi-Delay-Insensitive Circuit
    • Clockless CPU
    • Asynchronous circuit
    • Self-timed circuit
    is dry and useful. 'Neuromorphic' suggests an understanding of how the brain works that is just not there in 2025.
    > Do we know that neurons operate asynchronously to each other, only activating when their inputs change? Yes.
    I strongly disagree.
    We do not know how exactly the brain codes information. If you have timed coding, it's not asynchronous. And that doesn't even address the chemical signals that abound in the brain. The Hodgkin-Huxley equations give a good approximation to how the neuron produces a "spike" - the signal that travels down the axon. But it's only an approximation. Every type of ion channel in the cell adds another 10 or so unknown parameters to the equations, which can only be approximately found experimentally. So we have a fairly good picture of how a single neuron behaves in most circumstances, but not a complete one. Ion channels are crucial to how the brain works. They are the nonlinearity that makes computation possible, like the transistor is to the computer. There are several hundred types of ion channel that are known about so far and probably thousands more that are not yet discovered.
    > That's a silly question
    Not in the context I asked: I asked him after he had suggested that 'neuromorphic' would be great to reduce the energy consumption of LLMs. I wanted to know why/how, given that LLMs are trained today with BP. I give him the benefit of the doubt here, since we were in a rush and could not go deeper into the subject.
    > mix spiking neural nets and backprop ?!
    It's an active area of research to do just this. Google SpikeProp, e.g. [3]. Lots of papers in this space at the moment. Why? I don't know as I don't follow this space. Potential reasons: (i) BP is natural, (ii) BP works super well for DeepLearning, (iii) the 'neuromorphic' community has failed so spectacularly and wants to try something that works, (iv) to get funding, (v) we really do not know how the brain works, and in that case, why not try out crazy things?
    [1] S. Kleene, Representation of events in nerve nets and finite automata. https://www.dlsi.ua.es/~mlf/nnafmc/papers/kleene56representa....
    [2] K. S. Kendler, A history of metaphorical brain talk in psychiatry. https://www.nature.com/articles/s41380-025-03053-6
    [3] https://homepages.cwi.nl/~sbohte/publication/esann.pdf
- CuriouslyC 21 hours ago
  
  https://en.wikipedia.org/wiki/Frequency-division_multiplexin...
  The brain is doing shit like this.
  - drob518 21 hours ago
    
    And more, I suspect.
- dist-epoch 21 hours ago
  
  > you can also encode information in the time domain.
  Also known as a serial interface. They are very successful: PCIe lane, SATA, USB.
  - cpldcpu 21 hours ago
    
    These interfaces use serialized binary encoding.
    SNNs are more similar to pulse density modulation (PDM), if you are looking for an electronic equivalent.
- nickpsecurity 17 hours ago
  
  "I believe the argument is that you can also encode information in the time domain."
  Brain research showed that's happening, too. You'll see many models like this if you DuckDuckGo for "spiking" "temporal" "encoding" or substitute "time" for "temporal". You can further use "neural" "network" or "brain" to focus it on sub-fields.
drob518 21 hours ago

Never underestimate the power of marketing.

ziofill 19 hours ago

https://github.com/BICLab/SpikingBrain-7B/blob/main/assets/t...

Shouldn’t one bold the better numbers?

rpunkfu 19 hours ago

Inspired by GPT-5 presentation :)
- doph 18 hours ago
  
  Did they ever address that? I have not been able to stop thinking about it, it was so bizarre.
daveguy 17 hours ago

Well, then none of their model's numbers would be bold and that's not what they/AIs usually see in publications!
- cubefox 15 hours ago
  
  They do look pretty good compared to the two other linear (non-Transformer) models. Conventional attention is hard to beat in benchmarks but it is quadratic in time and memory complexity.

cgadski 13 hours ago

The technical report says (page 7):

> Our architectural choices are closely aligned with principles observed in biological brains.

How? They point out three design choices: linear attention, MoE layers, and spike coding.

Apparently linear attention is brain-inspired because it can be viewed as a "simplified abstraction of dendritic dynamics with multi-branch morphology." Who knows what that means exactly [1]. They don't discuss it further. MoE layers apparently reflect "a principle of modular specialization." Fine, whatever.

Now, using a dozen attention variants + MoE is bog standard. The real novelty would be spike coding. Page 11 is dedicated to the different ways they could turn signals into spike trains, including such biologically-inspired mechanisms as using two's complement. However, they don't actually do spike coding in a time domain. In their implementation, "spike coding" apparently means to turn activations into integers. Section 3.3.3 claims that this lets us simulate an underlying spiking neural network, so we can validate the spiking approach without using special hardware. But if your SNN can be simulated faithfully on a GPU by turning things into integers, isn't that a bit of a depressing SNN?
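
To illustrate why the integer shortcut is exact, here is a toy sketch. It is not the report's code; the unrolling scheme (a signed count becomes that many +1 or -1 spikes in a fixed window) and all names are my own assumptions. Accumulating weights spike by spike over time gives the same number as one integer dot product:

```python
def int_to_spike_train(count, num_steps):
    """Expand a signed integer count into +1/-1 spikes padded with zeros (assumed scheme)."""
    if abs(count) > num_steps:
        raise ValueError("count does not fit in the time window")
    sign = 1 if count >= 0 else -1
    return [sign] * abs(count) + [0] * (num_steps - abs(count))

def spiking_dot(weights, counts, num_steps):
    """Time-stepped simulation: add a weight only when its input spikes."""
    trains = [int_to_spike_train(c, num_steps) for c in counts]
    total = 0
    for t in range(num_steps):
        for w, train in zip(weights, trains):
            if train[t] != 0:
                total += w * train[t]
    return total

def integer_dot(weights, counts):
    """What a GPU would do: one multiply-accumulate per input."""
    return sum(w * c for w, c in zip(weights, counts))

weights = [3, -1, 4]
counts = [2, 0, -3]  # integer activations standing in for spike counts

print(spiking_dot(weights, counts, num_steps=4), integer_dot(weights, counts))
```

Since the timing of the spikes inside the window never affects the sum in this scheme, nothing is lost by skipping the time loop, which is the depressing part.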

Either I'm missing something, or this is just dressing standard techniques with loads of meaningless jargon. Of course that’s a very popular way to operate in deep learning nowadays.

[1] Like, attention can draw from multiple tokens, sort of like how different spines of a dendrite can draw from multiple axons? Can’t make this stuff up.

asdfasdf1 a day ago

SpikingBrain Technical Report: Spiking Brain-inspired Large Models https://arxiv.org/abs/2509.05276

bob1029 a day ago

https://news.ycombinator.com/item?id=45206420

cpldcpu a day ago

Well, it would still allow deploying the trained model to SNN hardware, if it existed.

cpldcpu a day ago

>The current implementation adopts pseudo-spiking, where activations are approximated as spike-like signals at the tensor level, rather than true asynchronous event-driven spiking on neuromorphic hardware.

Isn't that in essence very similar to quantization-aware training (QAT)?
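
For reference, this is roughly what I mean: QAT commonly uses a straight-through estimator, where the forward pass rounds to integers and the backward pass pretends the rounding was the identity. The sketch below is a one-weight toy of that idea with hand-written gradients; it is my own illustration and not taken from the SpikingBrain code, and the names, clipping range and learning rate are arbitrary:

```python
def quantize(a, max_level=7):
    """Forward pass: round to the nearest integer and clip to [-max_level, max_level]."""
    return max(-max_level, min(max_level, round(a)))

def ste_grad(a, upstream, max_level=7):
    """Backward pass (straight-through): identity inside the clip range, zero outside."""
    return upstream if -max_level <= a <= max_level else 0.0

def train_step(w, x, y, lr=0.05):
    a = w * x                         # high-resolution pre-activation
    q = quantize(a)                   # integer value used in the forward pass
    loss = (q - y) ** 2
    dloss_dq = 2 * (q - y)
    dloss_da = ste_grad(a, dloss_dq)  # the true gradient of round() is 0 almost everywhere
    dloss_dw = dloss_da * x
    return w - lr * dloss_dw, loss

w = 0.5
for _ in range(20):
    w, loss = train_step(w, x=2.0, y=3)
print(w, loss)
```

The weight stays in floating point the whole time; only the forward activation is an integer, which is the same split as "integer spikes forward, continuous gradient backward".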

spwa4 a day ago

Can you explain more? Why would that be the case? What is being passed from one layer to the next is not a linear value but the delay until the next spike, which is very different.
- cpldcpu a day ago
  
  It was also a question from my side. :)
  But I understand that they simulate the spikes as integer events in the forward pass (as described here https://github.com/BICLab/Int2Spike) and calculate a continuous gradient based on high resolution weights for the backward pass.
  This seems to be very similar to the straight-through-estimator (STE) approach that is usually used for quantization aware training. I may be wrong though.

gunalx 16 hours ago

So significantly worse than qwen2.5, kinda useless in the current landscape. But always fun with more architectures.

janalsncm 17 hours ago

They compare to Llama3.1 which is 13 months old and qwen 2.5 which is 9 months old. And they don’t beat qwen.

torotoki 16 hours ago

They use MetaX GPUs instead of NVIDIA's...? This point is actually more surprising.

RLAIF a day ago

SpikingBrain treats 'spikes' as 1-bit quantization stickers. True neural-level sparsity should be input-dependent, time-resolved, and self-organized during learning. If a new circuit diagram cannot 'grow' with every forward pass, then don't blame everyone for treating it as Another Sparse Marketing - oh wait, Neuromorphic Marketing.

imtringued a day ago

In a few years China will be completely independent from Nvidia.

https://en.wikipedia.org/wiki/MetaX

They have GPU manufacturers that nobody in the west has ever heard of.

astrange 8 hours ago

They need TSMC for that.
weregiraffe 19 hours ago

Then they'll have no reason to conquer Taiwan.
/s

VeejayRampay 20 hours ago

it's funny to observe how picky and cynical the HN crowd suddenly becomes when the disruptive technology is from china

bastawhiz 14 hours ago

What part of this is disruptive? It kind of has to work well to be disruptive, doesn't it?
ramon156 18 hours ago

You can't be critical anymore?
izabera 11 hours ago

deepseek is from china and all their papers have been very well received