
The Taalas HC1 is an AI accelerator with Llama-3.1 8B hardwired into it (i.e., the model is implemented directly in hardware). It delivers close to 17,000 tokens/s with that model, outperforming datacenter accelerators such as the NVIDIA B200 or Cerebras chips. The Taalas HC1 is about 10x faster than the Cerebras chip, costs 20x less to build, and consumes 10x less power. The main downside is that it only works with the model hardwired into the chip, currently Llama-3.1 8B, although we’re told it “retains flexibility through configurable context window size and support for fine-tuning via low-rank adapters (LoRAs)”. Hardware accelerators usually come with memory on one side and compute on the other. The two operate at different speeds, and memory bandwidth is usually the bottleneck for Large Language Models. Taalas' technology unifies storage and compute on a single chip, at DRAM-level density, to massively increase performance and reduce power consumption. Ultra-fast inference can […]
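To illustrate why memory bandwidth tends to be the bottleneck, here is a minimal back-of-the-envelope sketch in Python. It assumes a conventional accelerator has to read every model weight from memory once per generated token, and it ignores batching, caches, and the KV cache. The function names, the 2 bytes per parameter (FP16), and the 1 TB/s bandwidth figure are illustrative assumptions, not numbers from Taalas or any vendor.

```python
def max_decode_tokens_per_s(params: float, bytes_per_param: float,
                            bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode speed when each token
    requires streaming all weights from memory once (assumption)."""
    model_bytes = params * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes


def required_bandwidth_bytes_per_s(params: float, bytes_per_param: float,
                                   tokens_per_s: float) -> float:
    """Memory bandwidth needed to sustain a given single-stream token
    rate under the same read-all-weights-per-token assumption."""
    return params * bytes_per_param * tokens_per_s


if __name__ == "__main__":
    PARAMS = 8e9           # Llama-3.1 8B, roughly 8 billion parameters
    BYTES_PER_PARAM = 2    # assumption: FP16 weights
    BANDWIDTH = 1e12       # assumption: hypothetical 1 TB/s memory system

    bound = max_decode_tokens_per_s(PARAMS, BYTES_PER_PARAM, BANDWIDTH)
    print(f"Bandwidth-bound decode rate: {bound:.1f} tokens/s")

    needed = required_bandwidth_bytes_per_s(PARAMS, BYTES_PER_PARAM, 17_000)
    print(f"Bandwidth for 17,000 tokens/s: {needed / 1e12:.0f} TB/s")
```

Under these assumptions, a separate memory system would need hundreds of terabytes per second to feed a single 17,000 tokens/s stream, which is the kind of gap that placing storage and compute on the same chip is meant to close. Real systems narrow it with batching and lower-precision weights, so treat the figures as an order-of-magnitude guide only.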
The post Taalas HC1 hardwired Llama-3.1 8B AI accelerator delivers up to 17,000 tokens/s appeared first on CNX Software – Embedded Systems News.


