Henry costs are paid by the customer to their AI provider, not to Brand Atlas. The cost depends on usage volume, the model being used, and the provider’s current rates. This page covers what to expect and how to cap usage so the cost stays predictable.

How the cost is calculated

Both OpenAI and Gemini charge per token. A token is roughly a short word or a piece of a word; one English word averages about 1.3 tokens. A single Henry interaction has two token counts:
  • Input tokens. The prompt sent to the provider: the team member’s question plus the brand-record context plus the system prompt and conversation history. A typical interaction sends 1,000 to 5,000 input tokens.
  • Output tokens. The model’s response. A typical interaction generates 100 to 500 output tokens.
Both providers price input and output tokens separately, and output tokens cost more per token than input tokens. The published rates change over time; check the current rates on the providers’ pricing pages.
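As a rough illustration of the per-token arithmetic, the sketch below computes the cost of one interaction from its two token counts. The two rates are placeholder values chosen for the example, not either provider's published prices, and the function name `interaction_cost` is hypothetical.

```python
# Placeholder rates in USD per one million tokens. These are assumptions for
# illustration only; substitute the figures from your provider's pricing page.
INPUT_USD_PER_MILLION = 1.00
OUTPUT_USD_PER_MILLION = 4.00


def interaction_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of a single interaction."""
    input_cost = input_tokens * INPUT_USD_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION / 1_000_000
    return input_cost + output_cost


# Low and high ends of the typical token counts quoted above.
small = interaction_cost(input_tokens=1_000, output_tokens=100)
large = interaction_cost(input_tokens=5_000, output_tokens=500)
print(f"small interaction: ${small:.4f}")
print(f"large interaction: ${large:.4f}")
```

With these placeholder rates the two cases come to roughly $0.0014 and $0.007. Real figures depend on the configured provider and model.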

A working estimate

At current rates (as of mid-2026), a typical Henry interaction costs in the range of $0.001 to $0.01.
  • Light team (5 team members, ~10 questions per day combined). About 200-300 interactions per month. Expected cost: $0.50 to $3 per month.
  • Medium team (15 team members, ~40 questions per day combined). About 800-1,200 interactions per month. Expected cost: $2 to $15 per month.
  • Heavy team (50 team members, ~200 questions per day combined). About 4,000-6,000 interactions per month. Expected cost: $10 to $60 per month.
These are rough order-of-magnitude estimates. Actual costs vary with how complex the questions are, how much brand-record context is needed, and which provider and model are configured.
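The interaction counts above are consistent with a team being active 20 to 30 days per month. That is an inference from the numbers, not something Brand Atlas states. Under that assumption, the sketch below converts questions per day into monthly interactions and multiplies by the per-interaction cost range to get a naive cost band. The function names are hypothetical.

```python
# Per-interaction cost range quoted above, in USD.
COST_LOW_USD = 0.001
COST_HIGH_USD = 0.01

# Assumption: a team is active between 20 and 30 days per month. This is
# inferred from the interaction counts on this page, not stated by Brand Atlas.
ACTIVE_DAYS_LOW = 20
ACTIVE_DAYS_HIGH = 30


def monthly_interactions(questions_per_day: int) -> tuple[int, int]:
    """Return the (low, high) number of interactions per month."""
    return (questions_per_day * ACTIVE_DAYS_LOW,
            questions_per_day * ACTIVE_DAYS_HIGH)


def monthly_cost_band(questions_per_day: int) -> tuple[float, float]:
    """Return the widest plausible (low, high) monthly cost in USD."""
    low, high = monthly_interactions(questions_per_day)
    return low * COST_LOW_USD, high * COST_HIGH_USD


for name, per_day in [("light", 10), ("medium", 40), ("heavy", 200)]:
    low_n, high_n = monthly_interactions(per_day)
    low_usd, high_usd = monthly_cost_band(per_day)
    print(f"{name}: {low_n}-{high_n} interactions, "
          f"${low_usd:.2f} to ${high_usd:.2f}")
```

The naive band is wider than the expected costs quoted above and does not match them exactly, which fits their description as rounded order-of-magnitude estimates.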

Setting caps on the provider side

Both providers support usage limits. Setting one is recommended on day one.

OpenAI

Settings → Billing → Usage limits → Set monthly limit. A starting limit of $50/month is sensible for most teams. Adjust as you see real usage. OpenAI sends notifications at 50%, 75%, and 100% of the limit. At 100%, API calls are rejected; Henry stops responding until the limit is increased or the next month begins.
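To get a feel for what a monthly limit covers, the sketch below divides a limit by a per-interaction cost and lists the dollar amounts at which the 50%, 75%, and 100% notifications would fire. The function names are hypothetical, and the per-interaction costs are the rough range quoted earlier on this page rather than measured values.

```python
from decimal import Decimal


def interactions_covered(monthly_limit_usd: str,
                         cost_per_interaction_usd: str) -> int:
    """Whole interactions that fit under the limit at a given unit cost."""
    return int(Decimal(monthly_limit_usd) // Decimal(cost_per_interaction_usd))


def notification_amounts(monthly_limit_usd: str) -> dict[int, Decimal]:
    """Map each notification percentage to the spend that triggers it."""
    limit = Decimal(monthly_limit_usd)
    return {pct: limit * pct / 100 for pct in (50, 75, 100)}


# A $50 limit at the high and low ends of the per-interaction estimate.
print(interactions_covered("50", "0.01"))
print(interactions_covered("50", "0.001"))
print(notification_amounts("50"))
```

Decimal arithmetic is used here so that dividing by values such as 0.01 does not pick up binary floating-point rounding.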

Gemini

Google Cloud Console → Billing → Budgets & alerts → Create budget. Set a monthly amount and notification thresholds (e.g. notify at 50%, 90%, 100%). Unlike OpenAI’s hard limit, Google Cloud budgets are alerts by default; they do not stop API calls unless you separately configure a hard limit through quota management. Most teams treat the alert as sufficient; a few configure a hard quota to enforce a ceiling.

What drives cost up

Four factors increase token usage and therefore cost:
  1. Many long questions. A team that uses Henry like a search engine generates more cost than a team that uses it like a reference assistant. Both uses are legitimate; only the cost profile differs.
  2. Large brand records. A heavily populated atlas with dozens of active Horizons sends more grounding context per interaction. The cost-per-question is higher.
  3. Long conversations. Each turn in a conversation includes the prior turns. A long thread costs more than several short threads.
  4. Higher-tier models. Both providers offer multiple model tiers; the smartest models cost more per token. Brand Atlas uses the provider’s recommended model for Henry by default. You can change the model in Settings → AI → Henry → Model.
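The effect of long conversations can be sketched with a simple model. It assumes every turn resends a fixed block of context (system prompt plus brand-record grounding) along with all prior questions and answers. The token sizes below are hypothetical defaults chosen for the example, not measurements of Henry.

```python
def thread_input_tokens(turns: int,
                        base_context: int = 2_000,
                        question: int = 50,
                        answer: int = 250) -> int:
    """Total input tokens sent over a conversation of the given length.

    All sizes are hypothetical. Each turn sends the fixed context, the new
    question, and the questions and answers from every earlier turn.
    """
    total = 0
    for turn in range(turns):
        history = turn * (question + answer)  # prior turns resent as input
        total += base_context + question + history
    return total


one_long_thread = thread_input_tokens(10)
ten_short_threads = 10 * thread_input_tokens(1)
print(one_long_thread, ten_short_threads)
```

Under these assumptions one ten-turn thread sends 34,000 input tokens, while ten single-question threads send 20,500. The gap widens as threads get longer, because the resent history grows with every turn.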

What drives cost down

Conversely:
  1. A well-populated atlas. Counter-intuitively, a populated atlas reduces cost because Henry returns the answer directly from the record rather than searching widely.
  2. A trained team. Team members who phrase questions specifically generate cheaper interactions than team members who phrase them vaguely.
  3. A smaller model tier. For routine reference questions, the lower-tier models are usually sufficient. The higher-tier models matter for nuanced questions.
  4. Conversation hygiene. Closing a session when the question is answered, rather than leaving it open across the day, keeps individual conversations short.

Rate limits

Both providers have rate limits separate from cost limits:
  • Requests per minute. A maximum number of API calls in any minute.
  • Tokens per minute. A maximum number of tokens processed in any minute.
Default limits are usually generous. For very high-volume teams, the limits can be increased by request from the provider. Henry surfaces a “Rate limit reached, try again shortly” message when limits are hit; the limit usually resets within a minute.
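Which of the two limits is reached first depends on how many tokens each interaction uses. The sketch below works that out for a given pair of limits, assuming one API request per Henry interaction. The limit values in the example are hypothetical; the real ones are listed in the provider account.

```python
def max_interactions_per_minute(rpm_limit: int,
                                tpm_limit: int,
                                tokens_per_interaction: int) -> tuple[int, str]:
    """Return the sustainable interactions per minute and the binding limit.

    Assumes one API request per interaction and a constant token count
    (input plus output) per interaction.
    """
    by_requests = rpm_limit
    by_tokens = tpm_limit // tokens_per_interaction
    if by_tokens < by_requests:
        return by_tokens, "tokens per minute"
    return by_requests, "requests per minute"


# Hypothetical limits: 500 requests/min and 200,000 tokens/min, with
# 3,500 tokens per interaction (3,000 input plus 500 output).
print(max_interactions_per_minute(500, 200_000, 3_500))
```

With these example numbers the token limit binds first, at 57 interactions per minute.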

Splitting cost across teams

A customer with multiple atlases running across departments may want to attribute Henry cost per atlas. Two patterns:
  1. One API key per atlas, one project per atlas in the provider account. Each project’s usage is reported separately; bill per project. The cleanest pattern.
  2. One API key shared across atlases. Simpler to manage; aggregated cost. Suitable when the cost is small enough that attribution does not matter.
Brand Atlas does not enforce a pattern; the choice is the customer’s.
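With the first pattern, attribution reduces to summing usage per project. The sketch below assumes usage has already been exported as (project name, cost in cents) pairs. That row shape and the project names are hypothetical and do not reflect either provider's actual export format.

```python
from collections import defaultdict


def cost_per_project(usage_rows: list[tuple[str, int]]) -> dict[str, int]:
    """Sum cost in cents for each project (one project per atlas)."""
    totals: defaultdict[str, int] = defaultdict(int)
    for project, cents in usage_rows:
        totals[project] += cents
    return dict(totals)


# Hypothetical usage rows: (project name, cost in cents).
rows = [
    ("atlas-marketing", 120),
    ("atlas-retail", 45),
    ("atlas-marketing", 80),
]
print(cost_per_project(rows))
```

Keeping amounts in integer cents avoids floating-point drift when many small charges are added together.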

What happens when the limit is hit

If the provider’s hard limit is reached:
  • Henry stops responding to new questions until the limit is increased or the next billing period begins.
  • The brand owner is notified.
  • Existing Brand Atlas features unrelated to Henry continue working normally.
If only the soft alert is reached:
  • Henry continues to work.
  • The brand owner is notified.
  • The decision about whether to raise the alert level is the brand owner’s.

The Brand Atlas subscription is separate

Worth restating: the Keeper subscription ($79/mo or $799/yr) and the Guardian subscription ($199/mo or $1,999/yr) are flat. They do not change based on Henry usage. The provider bill is separate, paid directly to OpenAI or Gemini. A team using Henry heavily pays Brand Atlas the same as a team using Henry lightly. The provider bill is what scales with usage.

Setting up Henry

Initial setup, including caps.

What Henry does

Capabilities.

Switching providers

Changing providers.