Measuring What Matters in the Age of AI

As enterprises race to integrate generative AI, a critical question emerges: how do we measure success beyond simple token consumption?

Amazon recently encountered this challenge with its internal MeshClaw tool, where employees competed to maximize token usage rather than focusing on business outcomes, a phenomenon dubbed “tokenmaxxing.” This highlights the risk of optimizing for metrics that don’t reflect value.

Salesforce has taken a different approach with its Agentic Work Unit (AWU) framework. An AWU represents one discrete task completed by an AI agent, such as processing a prompt or executing a workflow, so the framework measures outputs rather than inputs.

From Volume to Value

The shift reflects a broader industry trend toward measuring AI impact through meaningful work done rather than just resources consumed:

  • Salesforce’s AWU model credits agents for completed actions like resolving customer inquiries or updating records
  • Routine tasks tend to use fewer tokens over time while complex reasoning requires more, so token counts alone say little about the work done
  • The goal is to maximize the “inference-to-work ratio”: how much inference turns into output that generates actual results

This approach addresses concerns from CFOs who need predictable AI spending and boards seeking tangible ROI.
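The article does not say exactly how the inference-to-work ratio is calculated. As a minimal sketch, assume an AWU is simply one completed agent task, and take the ratio to be completed tasks per 1,000 tokens consumed. Both choices are assumptions, and the names `AgentTask`, `count_awus` and `work_per_kilotoken` are hypothetical. Under those assumptions, the Python below shows why burning more tokens does not raise the score:

```python
from dataclasses import dataclass


@dataclass
class AgentTask:
    """One task attempted by an AI agent (hypothetical record format)."""
    name: str
    tokens_used: int   # input + output tokens consumed by the task
    completed: bool    # did the task produce a finished result?


def count_awus(tasks: list[AgentTask]) -> int:
    """Assumed AWU count: one unit per completed task, regardless of token cost."""
    return sum(1 for t in tasks if t.completed)


def work_per_kilotoken(tasks: list[AgentTask]) -> float:
    """Hypothetical inference-to-work metric: completed tasks per 1,000 tokens."""
    total_tokens = sum(t.tokens_used for t in tasks)
    if total_tokens == 0:
        return 0.0  # no inference consumed, so no ratio to report
    return count_awus(tasks) / (total_tokens / 1000)


if __name__ == "__main__":
    focused = [
        AgentTask("resolve inquiry", 800, True),
        AgentTask("update record", 200, True),
    ]
    tokenmaxxed = [
        AgentTask("resolve inquiry", 9000, True),
        AgentTask("long exploratory chat", 11000, False),
    ]
    print(count_awus(focused), work_per_kilotoken(focused))          # 2 2.0
    print(count_awus(tokenmaxxed), work_per_kilotoken(tokenmaxxed))  # 1 0.05
```

In this toy comparison the second workload consumes twenty times as many tokens yet completes fewer tasks, so an output-based metric ranks it lower even though a token-volume leaderboard would rank it higher.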

Looking Ahead

Gartner forecasts that agentic AI will account for $450 billion in enterprise application revenue by 2035, a significant leap from its roughly 2% share of that revenue today. As AI moves beyond experimentation into production workflows, businesses are demanding more meaningful metrics to track progress and ensure accountability.

The open question is whether these new frameworks can withstand real-world usage patterns, or whether they will become just another number to optimize rather than a true measure of value.