In this paper, we explore the use of CPUs to accelerate the inference of large language models.
This method not only reduces traffic to the cloud LLM, thereby lowering costs, but also allows flexible control over response quality via the reward-score threshold.
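The routing idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual implementation: the function names (`local_model`, `cloud_model`, `reward_model`) and the threshold value are assumptions introduced for clarity.

```python
def route(query, local_model, cloud_model, reward_model, threshold=0.7):
    """Answer locally when the reward score clears the threshold;
    otherwise escalate the query to the cloud LLM."""
    draft = local_model(query)
    score = reward_model(query, draft)
    if score >= threshold:
        return draft           # local answer deemed good enough; no cloud call
    return cloud_model(query)  # escalate to the cloud LLM

# Toy stand-ins to demonstrate the control flow.
local = lambda q: "local answer to " + q
cloud = lambda q: "cloud answer to " + q
reward = lambda q, a: 0.9 if len(a) > 10 else 0.1

print(route("What is 2+2?", local, cloud, reward))
```

Raising the threshold trades higher cloud traffic (and cost) for more consistent response quality; lowering it does the opposite.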
Comprising 11.53 billion tokens, 8.01 billion of synthetic data and 3.52 billion of rich textbook data, Intellecta is crafted to foster advanced reasoning and comprehensive educational narrative generation.