Coinbase CEO outlines five steps to cut AI costs while maintaining token use
Coinbase CEO Brian Armstrong has posted five strategies for keeping AI spending low while still encouraging engineers to maximize token use. He attached a graph showing that token usage recently reached one of the company’s highest levels even as AI spending fell to nearly half its peak; the timeline for that trend was not specified.
His first tactic is to change default models: Coinbase is experimenting with cheaper Chinese LLMs as defaults, naming GLM 5.2 and Kimi 2.7, and routing requests through an internal LLM gateway, while still urging engineers to pick the right model for each task. The middle three measures focus on routing and efficiency: prompts are sent to the most appropriate model based on difficulty, caching is improved to lower inference costs, and context is kept lean by starting a new session when switching tasks.
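As a rough illustration of how difficulty-based routing and caching can combine in a gateway, here is a minimal Python sketch. It is not Coinbase's implementation: the model tier names, the keyword-and-length difficulty heuristic, and the `Gateway` class are all hypothetical, and the cache shown is a simple exact-match response cache, which is a simplification of the provider-side prompt caching that production systems typically rely on.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical tier names; the article does not describe the real routing table.
CHEAP_MODEL = "cheap-default"
STRONG_MODEL = "frontier"


def pick_model(prompt: str, hard_keywords=("prove", "refactor", "architecture")) -> str:
    """Crude difficulty heuristic (an assumption): long prompts or
    flagged keywords go to the stronger, more expensive model."""
    text = prompt.lower()
    is_hard = len(prompt) > 2000 or any(k in text for k in hard_keywords)
    return STRONG_MODEL if is_hard else CHEAP_MODEL


@dataclass
class Gateway:
    """Toy gateway: routes each prompt, then serves repeats from a cache."""
    cache: dict = field(default_factory=dict)
    calls: int = 0  # number of requests that actually reached a backend

    def complete(self, prompt: str, backend) -> str:
        model = pick_model(prompt)
        # Key on model and prompt so tiers never share cached answers.
        key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
        if key in self.cache:
            return self.cache[key]
        self.calls += 1
        result = backend(model, prompt)  # backend is any callable(model, prompt)
        self.cache[key] = result
        return result
```

Passing a stub such as `lambda model, prompt: f"{model} answered"` as `backend` is enough to try it out: a repeated prompt is served from the cache, so `calls` counts only the requests that would incur inference cost.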
The final piece increases spending visibility across the company. Engineers may use as many tokens as they need, but usage will be transparent, and heavier users will be expected to deliver more impact.
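One way to picture that kind of visibility is a per-engineer usage ledger. The Python sketch below is purely illustrative: the `UsageLedger` class, the engineer names, and the per-1,000-token prices are assumptions, not details from Armstrong's post.

```python
from collections import defaultdict


class UsageLedger:
    """Hypothetical ledger that totals tokens and cost per engineer."""

    def __init__(self):
        self.tokens = defaultdict(int)
        self.cost = defaultdict(float)

    def record(self, engineer: str, tokens: int, price_per_1k: float) -> None:
        # Cost is tokens scaled by an assumed price per 1,000 tokens.
        self.tokens[engineer] += tokens
        self.cost[engineer] += tokens / 1000 * price_per_1k

    def report(self):
        """Return (engineer, cost) pairs, highest spender first."""
        return sorted(self.cost.items(), key=lambda kv: kv[1], reverse=True)


ledger = UsageLedger()
ledger.record("alice", 2000, 0.01)  # made-up usage and price
ledger.record("bob", 1000, 0.01)
for engineer, cost in ledger.report():
    print(f"{engineer}: ${cost:.2f}")
```

A report like this only shows who spends what; judging whether heavier usage delivered more impact would still require separate measures that the post does not detail.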
coinbase, brian armstrong, ai spending, token usage, glm 5.2, kimi 2.7, llm gateway, model routing, caching, inference costs