
# @trace_llm

```python
@trace_llm(model=None)
```

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `str` | `None` | Overrides the auto-detected model name |
Used as a bare decorator, `@trace_llm` records the call with auto-detected settings:

```python
from openjck import trace_llm

@trace_llm
def call_llm(messages):
    # LLM logic here
    pass
```
Pass `model` to override the auto-detected model name:

```python
from openjck import trace_llm

@trace_llm(model="qwen2.5:7b")
def call_llm(messages):
    # LLM logic here
    pass
```
The decorator also works on `async` functions:

```python
import asyncio

from openjck import trace_llm

@trace_llm
async def call_llm(messages):
    # async LLM logic here
    pass
```
Each traced call records the following fields:

| Field | Type | Description |
| --- | --- | --- |
| `messages` | `list` | Input messages to the LLM |
| `model` | `str` | Model name used for the call |
| `content` | `str` | Response content from the LLM |
| `tokens_in` | `int` | Number of input tokens |
| `tokens_out` | `int` | Number of output tokens |
| `latency_ms` | `int` | Call duration in milliseconds |
| `cost_usd` | `float` | Estimated cost of the call |
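The recorded fields map naturally onto a small record type. The following dataclass is purely illustrative, to show the shape of the data; it is not a type exported by OpenJCK:

```python
from dataclasses import dataclass

@dataclass
class LLMCallRecord:
    """Illustrative mirror of the per-call fields OpenJCK records
    (not part of the library's API)."""
    messages: list   # input messages to the LLM
    model: str       # model name used for the call
    content: str     # response content from the LLM
    tokens_in: int   # number of input tokens
    tokens_out: int  # number of output tokens
    latency_ms: int  # call duration in milliseconds
    cost_usd: float  # estimated cost of the call

record = LLMCallRecord(
    messages=[{"role": "user", "content": "Hi"}],
    model="qwen2.5:7b",
    content="Hello!",
    tokens_in=1,
    tokens_out=2,
    latency_ms=50,
    cost_usd=0.0,
)
```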

OpenJCK automatically extracts token usage from various LLM providers:

- **Ollama**: Reads `prompt_eval_count` and `eval_count` from the response
- **OpenAI**: Uses `usage.prompt_tokens` and `usage.completion_tokens`
- **Anthropic**: Reads `usage.input_tokens` and `usage.output_tokens`

If token information isn't available, OpenJCK estimates token counts from character length (4 chars ≈ 1 token).
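Taken together, the provider-specific extraction and the character-count fallback can be sketched roughly as follows. `extract_usage` is a hypothetical helper, not part of the OpenJCK API, and the responses here are plain dicts standing in for provider response objects:

```python
def extract_usage(response: dict, prompt: str = "", completion: str = "") -> tuple[int, int]:
    """Return (tokens_in, tokens_out) from a provider response dict,
    falling back to a ~4-characters-per-token estimate."""
    # Ollama-style top-level fields
    if "prompt_eval_count" in response:
        return response["prompt_eval_count"], response.get("eval_count", 0)
    usage = response.get("usage") or {}
    # OpenAI-style usage block
    if "prompt_tokens" in usage:
        return usage["prompt_tokens"], usage["completion_tokens"]
    # Anthropic-style usage block
    if "input_tokens" in usage:
        return usage["input_tokens"], usage["output_tokens"]
    # Fallback: estimate from character count (4 chars ≈ 1 token)
    return len(prompt) // 4, len(completion) // 4

ollama_resp = {"prompt_eval_count": 420, "eval_count": 89}
tokens_in, tokens_out = extract_usage(ollama_resp)  # (420, 89)
```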

Here’s how an llm_call step appears in the trace JSON:

```json
{
  "step_id": 1,
  "type": "llm_call",
  "name": "call_llm",
  "duration_ms": 1100,
  "input": {
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  },
  "output": {
    "content": "The capital of France is Paris."
  },
  "error": null,
  "tokens_in": 420,
  "tokens_out": 89,
  "model": "qwen2.5:7b",
  "cost_usd": 0.0012
}
```
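Given a step shaped like the JSON above, downstream metrics fall out of the recorded fields directly. This snippet is an illustrative sketch, not an OpenJCK utility:

```python
import json

# A trimmed version of the trace step shown above.
step = json.loads('''{
  "step_id": 1, "type": "llm_call", "name": "call_llm",
  "duration_ms": 1100, "tokens_in": 420, "tokens_out": 89,
  "model": "qwen2.5:7b", "cost_usd": 0.0012
}''')

total_tokens = step["tokens_in"] + step["tokens_out"]
# duration_ms is milliseconds, so divide by 1000 for tokens/second.
tokens_per_second = total_tokens / (step["duration_ms"] / 1000)
cost_per_1k_tokens = step["cost_usd"] / total_tokens * 1000
```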