It depends on your use case. Because,
➡️ Tokens: LLMs charge per token — both input and output. A “token” is roughly 3–4 characters or 0.75 words.
➡️ Calls: More users, more calls. One user prompt can lead to 2–3 model calls (retries, refinement, system prompts).
➡️ Model choice: GPT-4 is 15x more expensive than GPT-3.5. Anthropic and Claude are priced differently. Open-source models are free to use but come with infra and ops costs.
➡️ Context cost: Every time you inject a long prompt with system instructions or business data, it counts as tokens. It adds up.
Hence, plan for peak usage, average input size, and cost of retries and evaluations.