Distributed Tracing
rag_control integrates with OpenTelemetry for distributed tracing, providing visibility into request execution flow and performance bottlenecks.
Overview
Distributed tracing tracks:
- Complete request lifecycle
- Stage-by-stage execution
- Latency at each stage
- External service calls
- Errors and exceptions
Span Hierarchy
Each request creates a nested span hierarchy. The root span is named rag_control.request.<mode>, where <mode> is either run or stream. Child spans follow the pattern rag_control.request.<mode>.stage.<stage_name>. The tree below shows the hierarchy for run mode:
rag_control.request.run (or .stream)
├── rag_control.request.run.stage.org_lookup
├── rag_control.request.run.stage.embedding
├── rag_control.request.run.stage.retrieval
├── rag_control.request.run.stage.policy.resolve
├── rag_control.request.run.stage.prompt.build
├── rag_control.request.run.stage.llm.generate (or .stage.llm.stream in stream mode)
└── rag_control.request.run.stage.enforcement
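The naming pattern above can be sketched as a pair of small helpers. This is only an illustration of the pattern: root_span_name and stage_span_name are hypothetical names, not part of rag_control's API, and the check on mode assumes that run and stream are the only valid values, as stated above.

```python
# Hypothetical helpers that reproduce the span naming pattern described above.
# They are NOT part of rag_control; they only illustrate how names compose.
VALID_MODES = ("run", "stream")


def root_span_name(mode: str) -> str:
    """Return the root span name for a request mode ("run" or "stream")."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return f"rag_control.request.{mode}"


def stage_span_name(mode: str, stage_name: str) -> str:
    """Return the child span name for a stage under the given mode."""
    return f"{root_span_name(mode)}.stage.{stage_name}"


print(root_span_name("stream"))
# rag_control.request.stream
print(stage_span_name("run", "policy.resolve"))
# rag_control.request.run.stage.policy.resolve
```

Helpers like these can be handy when filtering spans by name in a tracing backend, since every stage span shares its root span's name as a prefix.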
Span Attributes
Each span carries attributes specific to its stage. The values shown are illustrative:
Root Request Span
rag_control.request.run (or .stream)
├── request_id: "550e8400-e29b-41d4-a716-446655440000"
├── mode: "run" | "stream"
├── org_id: "acme_corp"
├── user_id: "user-123"
└── status: "ok" | "error"
Organization Lookup Span
rag_control.request.run.stage.org_lookup
├── filter_name: "enterprise_filter" (or null)
├── retrieval_top_k: 5
└── stage_latency_ms: 2
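Every stage span reports a stage_latency_ms attribute. How rag_control measures it is not described here, so the following is a minimal sketch under the assumption that it is wall-clock time in whole milliseconds. The timed_stage helper and the plain dict standing in for span attributes are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_stage(attributes: dict):
    """Time the enclosed block and record the result as stage_latency_ms.

    Assumption: latency is wall-clock time rounded to whole milliseconds.
    `attributes` is a plain dict standing in for a span's attribute set.
    """
    start = time.perf_counter()
    try:
        yield attributes
    finally:
        # Recorded in `finally` so a failing stage still reports its latency.
        elapsed_ms = (time.perf_counter() - start) * 1000
        attributes["stage_latency_ms"] = round(elapsed_ms)


attrs = {"filter_name": None, "retrieval_top_k": 5}
with timed_stage(attrs):
    time.sleep(0.002)  # stand-in for the real organization lookup

print(attrs)
```

Recording the latency in the finally clause means the attribute is present even when the stage raises, which keeps error traces comparable with successful ones.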
Query Embedding Span
rag_control.request.run.stage.embedding
├── embedding_model: "text-embedding-ada-002"
├── embedding_dimensions: 1536
└── stage_latency_ms: 350
Document Retrieval Span
rag_control.request.run.stage.retrieval
├── vector_index: "qdrant" (or other vector store)
├── returned: 5
└── stage_latency_ms: 75
Policy Resolution Span
rag_control.request.run.stage.policy.resolve
├── policy_name: "strict_citations"
└── stage_latency_ms: 2
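The per-stage latencies make it straightforward to spot a bottleneck. As a rough sketch, the snippet below takes the example stage_latency_ms values from the four stage spans listed so far and finds the slowest stage and each stage's share of the total. The function names are hypothetical, and a real analysis would read these values from exported spans rather than a hard-coded dict.

```python
# Example latencies (milliseconds) copied from the span listings above.
stage_latency_ms = {
    "org_lookup": 2,
    "embedding": 350,
    "retrieval": 75,
    "policy.resolve": 2,
}


def slowest_stage(latencies: dict) -> tuple:
    """Return (stage_name, latency_ms) for the stage with the highest latency."""
    name = max(latencies, key=latencies.get)
    return name, latencies[name]


def latency_share(latencies: dict) -> dict:
    """Return each stage's fraction of the summed stage latency."""
    total = sum(latencies.values())
    if total == 0:
        return {}
    return {name: ms / total for name, ms in latencies.items()}


name, ms = slowest_stage(stage_latency_ms)
print(f"{name}: {ms} ms")
# embedding: 350 ms
print(f"{latency_share(stage_latency_ms)[name]:.0%}")
# 82%
```

With these example numbers, query embedding accounts for most of the time spent in the stages shown, so it would be the first place to look when optimizing.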
LLM Generation Span
For run mode:
rag_control.request.run.stage.llm.generate
├── llm_model: "gpt-4"
├── temperature: 0.0