How the cost is calculated
Both OpenAI and Gemini charge per token. A token is roughly a short word or a piece of a word; one English word averages about 1.3 tokens. A single Henry interaction has two token counts:
- Input tokens. The prompt sent to the provider: the team member’s question plus the brand-record context plus the system prompt and conversation history. A typical interaction sends 1,000 to 5,000 input tokens.
- Output tokens. The model’s response. A typical interaction generates 100 to 500 output tokens.
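The two token counts translate into cost with simple arithmetic. A minimal sketch in Python, assuming hypothetical per-million-token rates: `INPUT_RATE_PER_MILLION` and `OUTPUT_RATE_PER_MILLION` are placeholders, not either provider's published prices, and the function names are illustrative only.

```python
# Hypothetical rates in USD per one million tokens. These are placeholders
# for illustration only; check your provider's current price list.
INPUT_RATE_PER_MILLION = 2.50
OUTPUT_RATE_PER_MILLION = 10.00


def interaction_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one interaction."""
    input_cost = input_tokens * INPUT_RATE_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_RATE_PER_MILLION / 1_000_000
    return input_cost + output_cost


def monthly_cost(interactions: int, input_tokens: int, output_tokens: int) -> float:
    """Scale the per-interaction estimate to a month of usage."""
    return interactions * interaction_cost(input_tokens, output_tokens)


if __name__ == "__main__":
    per_question = interaction_cost(input_tokens=3_000, output_tokens=300)
    print(f"Per interaction: ${per_question:.4f}")
    print(f"250 interactions: ${monthly_cost(250, 3_000, 300):.2f}")
```

Input and output tokens are counted separately because providers typically price them at different rates, with output tokens usually the more expensive of the two.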
A working estimate
At current rates (as of mid-2026), a typical Henry interaction costs in the range of $0.01.
- Light team (5 team members, ~10 questions per day combined). About 200-300 interactions per month. Expected cost: $3.
- Medium team (15 team members, ~40 questions per day combined). About 800-1,200 interactions per month. Expected cost: $15.
- Heavy team (50 team members, ~200 questions per day combined). About 4,000-6,000 interactions per month. Expected cost: $60.
These are rough order-of-magnitude estimates. Actual costs vary with how complex the questions are, how much brand-record context is needed, and which provider and model are configured.
Setting caps on the provider side
Both providers support usage limits. Setting one is recommended on day one.
OpenAI
Settings → Billing → Usage limits → Set monthly limit. A starting limit of $50/month is sensible for most teams. Adjust as you see real usage. OpenAI sends notifications at 50%, 75%, and 100% of the limit. At 100%, API calls are rejected; Henry stops responding until the limit is increased or the next month begins.
Gemini
Google Cloud Console → Billing → Budgets & alerts → Create budget. Set a monthly amount and notification thresholds (e.g. notify at 50%, 90%, 100%). Unlike OpenAI’s hard limit, Google Cloud budgets are alerts by default; they do not stop API calls unless you separately configure a hard limit through quota management. Most teams treat the alert as sufficient; a few configure a hard quota to enforce the cap.
What drives cost up
Four behaviours increase token usage and therefore cost:
- Many long questions. A team that uses Henry like a search engine generates more cost than a team that uses it like a reference assistant. Both uses are legitimate; the cost shape differs.
- Large brand records. A heavily populated atlas with dozens of active Horizons sends more grounding context per interaction. The cost-per-question is higher.
- Long conversations. Each turn in a conversation includes the prior turns. A long thread costs more than several short threads.
- Higher-tier models. Both providers offer multiple model tiers; the smartest models cost more per token. Brand Atlas uses the provider’s recommended model for Henry by default. You can change the model in Settings → AI → Henry → Model.
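The effect of long conversations can be made concrete with a simplified model. The sketch below assumes every turn resends a fixed context (system prompt plus brand-record grounding) and that each turn adds the same number of tokens to the history; both are simplifications, and the token figures are hypothetical.

```python
def thread_input_tokens(turns: int, base_context: int, tokens_per_turn: int) -> int:
    """Total input tokens across one thread where each turn resends all prior turns.

    Simplified model: turn k sends the fixed context plus k turns' worth of
    conversation (the history so far and the new question).
    """
    total = 0
    for turn in range(1, turns + 1):
        total += base_context + turn * tokens_per_turn
    return total


if __name__ == "__main__":
    # Hypothetical figures: 2,000 tokens of fixed context, 300 tokens per turn.
    one_long_thread = thread_input_tokens(12, 2_000, 300)
    four_short_threads = 4 * thread_input_tokens(3, 2_000, 300)
    print(one_long_thread, four_short_threads)
```

Under these assumptions, twelve questions in one thread send 47,400 input tokens, while the same twelve questions spread over four three-turn threads send 31,200, because the resent history grows with every turn.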
What drives cost down
Conversely:
- A well-populated atlas. Counter-intuitively, a populated atlas reduces cost because Henry returns the answer directly from the record rather than searching widely.
- A trained team. Team members who phrase questions specifically generate cheaper interactions than team members who phrase them vaguely.
- A smaller model tier. For routine reference questions, the lower-tier models are usually sufficient. The higher-tier models matter for nuanced questions.
- Conversation hygiene. Closing a session when the question is answered, rather than leaving it open across the day, keeps individual conversations short.
Rate limits
Both providers have rate limits separate from cost limits:
- Requests per minute. A maximum number of API calls in any minute.
- Tokens per minute. A maximum number of tokens processed in any minute.
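Providers enforce these limits on their side. The sketch below only models how the two limits interact within a sliding 60-second window. The class name, the limit values, and the sliding-window approach are illustrative assumptions, not either provider's actual figures or algorithm.

```python
from collections import deque


class MinuteLimiter:
    """Illustrative model of a requests-per-minute and tokens-per-minute limit."""

    def __init__(self, max_requests: int, max_tokens: int) -> None:
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window = deque()  # entries are (timestamp_seconds, tokens)

    def allow(self, now: float, tokens: int) -> bool:
        """Return True and record the call if it fits within both limits."""
        # Drop calls that are 60 seconds old or older.
        while self.window and now - self.window[0][0] >= 60:
            self.window.popleft()
        tokens_in_window = sum(t for _, t in self.window)
        if len(self.window) + 1 > self.max_requests:
            return False  # request limit would be exceeded
        if tokens_in_window + tokens > self.max_tokens:
            return False  # token limit would be exceeded
        self.window.append((now, tokens))
        return True


if __name__ == "__main__":
    limiter = MinuteLimiter(max_requests=2, max_tokens=5_000)
    for second, tokens in [(0, 2_000), (10, 2_000), (20, 500), (61, 2_000)]:
        print(second, limiter.allow(second, tokens))
```

Either limit can reject a call on its own: a burst of small requests can hit the request limit, while a few very large prompts can hit the token limit first.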
Splitting cost across teams
A customer with multiple atlases running across departments may want to attribute Henry cost per atlas. Two patterns:
- One API key per atlas, one project per atlas in the provider account. Each project’s usage is reported separately; bill per project. The cleanest pattern.
- One API key shared across atlases. Simpler to manage; aggregated cost. Suitable when the cost is small enough that attribution does not matter.
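With the one-project-per-atlas pattern, attribution is a matter of summing the provider's reported usage by project. A minimal sketch, assuming usage has already been exported as `(project_name, cost_in_cents)` rows: this row format and the project names are hypothetical, not either provider's export format.

```python
from collections import defaultdict


def cost_by_atlas(usage_rows):
    """Sum cost in cents per project, given (project_name, cost_in_cents) rows."""
    totals = defaultdict(int)
    for project, cents in usage_rows:
        totals[project] += cents
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical project names and amounts.
    rows = [
        ("atlas-marketing", 120),
        ("atlas-retail", 45),
        ("atlas-marketing", 80),
    ]
    for project, cents in cost_by_atlas(rows).items():
        print(f"{project}: ${cents / 100:.2f}")
```

With a single shared key, the provider reports no project dimension to group by, which is why that pattern only yields an aggregated figure.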
What happens when the limit is hit
If the provider’s hard limit is reached:
- Henry stops responding to new questions until the limit is increased or the next billing period begins.
- The brand owner is notified.
- Existing Brand Atlas features unrelated to Henry continue working normally.
If only an alert threshold is reached (for example, a Google Cloud budget alert with no hard quota configured):
- Henry continues to work.
- The brand owner is notified.
- The decision about whether to raise the alert level is the brand owner’s.
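The difference between a hard limit and an alert-only budget can be sketched as a small function. This is a model of the behaviour described on this page, not provider or Brand Atlas code; the function name, the return shape, and the default thresholds are assumptions for illustration.

```python
def budget_status(spend, limit, thresholds=(0.5, 0.75, 1.0), hard_limit=True):
    """Return which alert thresholds have been crossed and whether Henry still responds.

    spend and limit are in the same currency unit; thresholds are fractions of
    the limit. With hard_limit=False the budget is alert-only.
    """
    crossed = [t for t in thresholds if spend >= t * limit]
    blocked = hard_limit and spend >= limit
    return {"alerts_crossed": crossed, "henry_responding": not blocked}


if __name__ == "__main__":
    # Hard limit of 50 per month: calls are rejected once spend reaches it.
    print(budget_status(40, 50))
    print(budget_status(50, 50))
    # Alert-only budget: notifications fire, but calls keep going through.
    print(budget_status(55, 50, thresholds=(0.5, 0.9, 1.0), hard_limit=False))
```

The only difference between the two modes is the `hard_limit` flag: alerts fire in both, but only a hard limit stops Henry from responding.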
The Brand Atlas subscription is separate
Worth restating: the Keeper subscription ($799/yr) and the Guardian subscription ($1,999/yr) are flat. They do not change based on Henry usage. The provider bill is separate, paid directly to OpenAI or Gemini. A team using Henry heavily pays Brand Atlas the same as a team using Henry lightly. The provider bill is what scales with usage.
Related pages
Setting up Henry
Initial setup, including caps.
What Henry does
Capabilities.
Switching providers
Changing providers.