LLM VRAM Calculator
This is the Hugging Face version of the calculator found at https://novaml.ai/vram/.
Inputs:
  HuggingFace Model Path (Load Model populates the specifications below)
  HF Token (Optional)
  Your Hardware (Optional)
Model Specifications (each field shows "-" until a model is loaded):
  Parameters
  Hidden Size
  Layers
  Attn Heads
Inference Configuration
🔒 = recalculate while keeping this parameter fixed
Quantization Method (see the sketch after this list for how bpw maps to weight memory):
  ✓ Optimal
  FP16 (16.0 bpw)
  Q8_0 (8.5 bpw)
  Q6_K (6.59 bpw)
  Q5_K_M (5.69 bpw)
  Q5_K_S (5.54 bpw)
  Q4_K_M (4.85 bpw)
  Q4_K_S (4.58 bpw)
  Q4_0 (4.55 bpw)
  Q3_K_M (3.91 bpw)
  Q3_K_S (3.5 bpw)
  Q2_K (3.35 bpw)
  IQ3_XXS (3.06 bpw)
  IQ2_XXS (2.06 bpw)
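As a rough guide, a bits-per-weight figure translates into weight memory as parameters × bpw / 8 bytes. A minimal Python sketch of that arithmetic, where the helper name and the 7B example are illustrative, not the calculator's actual code:

```python
# Weight-memory estimate: parameters * bits-per-weight / 8 = bytes.
# Illustrative sketch; the calculator's exact accounting may differ.
def weight_memory_gb(n_params: float, bpw: float) -> float:
    return n_params * bpw / 8 / 1e9  # bytes -> GB (decimal)

# Example: a 7B-parameter model at Q4_K_M (4.85 bpw)
print(f"{weight_memory_gb(7e9, 4.85):.2f} GB")  # ~4.24
```

The same 7B model at Q8_0 (8.5 bpw) would need roughly 7.44 GB, which is why stepping down one quantization tier is often what decides whether a model fits on a given card.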
KV Cache Precision (combined with context length and batch size in the sketch below):
  ✓ Optimal
  FP16 (Standard)
  Q8_0 (Compressed)
  Q4_0 (Highly Compressed)
Context Length (✓ Optimal by default)
Batch Size
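KV Cache Precision, Context Length, and Batch Size together determine the Context portion of the estimate. A minimal sketch, assuming plain multi-head attention where the cache width equals the hidden size (models with grouped-query attention cache proportionally less); the shapes in the example are Llama-2-7B-like and purely illustrative:

```python
# KV-cache estimate: 2 tensors (K and V) per layer, one entry per token.
# Assumes plain multi-head attention, i.e. cache width == hidden size.
def kv_cache_gb(n_layers: int, hidden_size: int, context: int,
                batch_size: int = 1, bytes_per_elem: float = 2.0) -> float:
    # bytes_per_elem: FP16 = 2.0; assuming the bpw figures above carry
    # over, Q8_0 ~ 8.5/8 = 1.06 and Q4_0 ~ 4.55/8 = 0.57.
    return 2 * n_layers * hidden_size * context * batch_size * bytes_per_elem / 1e9

# Example: 32 layers, 4096 hidden size, 8192-token context, FP16 cache
print(f"{kv_cache_gb(32, 4096, 8192):.2f} GB")  # ~4.29
```

Note that the cache grows linearly in both context length and batch size, so long contexts can easily cost more VRAM than the weights themselves.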
Framework (each adds its own overhead; see the sketch under Estimation Results):
  llama.cpp (Efficient)
  ExLlamaV2 (Very Efficient)
  vLLM (Production)
  HuggingFace Transformers (Heavy)
Flash Attention
Vision Adapter
Estimation Results
Estimated Usage: 0.0 GB
Breakdown: Model, Context, Overhead, Overload
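A sketch of how the breakdown terms might combine. The per-framework overhead figures here are invented placeholders that only preserve the ordering implied by the labels above (Very Efficient < Efficient < Production < Heavy); they are not the calculator's actual numbers:

```python
# Total-usage sketch: Model + Context + framework Overhead.
# Overhead values are hypothetical placeholders for illustration only.
FRAMEWORK_OVERHEAD_GB = {
    "ExLlamaV2": 0.5,                 # Very Efficient
    "llama.cpp": 0.8,                 # Efficient
    "vLLM": 1.5,                      # Production
    "HuggingFace Transformers": 3.0,  # Heavy
}

def estimated_usage_gb(model_gb: float, context_gb: float,
                       framework: str = "llama.cpp") -> float:
    return model_gb + context_gb + FRAMEWORK_OVERHEAD_GB[framework]

# Example: 4.24 GB of weights + 4.29 GB of KV cache under llama.cpp
print(f"{estimated_usage_gb(4.24, 4.29):.2f} GB")  # ~9.33
```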
Compatibility Matrix
Filters: All GPUs / Consumer / Datacenter
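The compatibility check itself reduces to comparing the estimate against each GPU's VRAM under the selected filter. A sketch with a tiny hand-picked GPU table; the VRAM figures are the cards' public specs, but the entries are examples, not the calculator's full list:

```python
# Compatibility-matrix sketch: which GPUs fit the estimate?
# Example entries only; the real matrix covers many more cards.
GPUS = {
    "RTX 3060": (12, "Consumer"),
    "RTX 4090": (24, "Consumer"),
    "A100 80GB": (80, "Datacenter"),
    "H100 80GB": (80, "Datacenter"),
}

def compatible(estimate_gb: float, filter_: str = "All GPUs") -> list[str]:
    return [name for name, (vram_gb, category) in GPUS.items()
            if estimate_gb <= vram_gb and filter_ in ("All GPUs", category)]

print(compatible(9.33, "Consumer"))  # ['RTX 3060', 'RTX 4090']
```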