
Getting Started with Observability

This guide provides a quick start to using observability features in the Starter App.


Prerequisites

  1. Docker Compose - Running OTEL Collector, Tempo, and Prometheus
  2. Poetry - Python dependency management
  3. Running Application - Starter App must be running
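
To confirm the prerequisites quickly, the following checks can be run from the project root (the application port 8000 matches the curl examples later in this guide):

# Check tooling versions
docker-compose --version
poetry --version

# Check that the Starter App answers HTTP requests (prints the status code)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/api/tasks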

Quick Start

1. Start Observability Stack

# Start all services including OTEL Collector, Tempo, and Prometheus
make up

# Or start specific services
docker-compose up -d otel-collector tempo prometheus grafana

2. Verify Services

# Check OTEL Collector logs
make logs-otel

# Check if services are running
docker-compose ps

Expected Output:

NAME                STATUS    PORTS
otel-collector      Up        0.0.0.0:4317->4317/tcp
tempo               Up
prometheus          Up        0.0.0.0:9090->9090/tcp
grafana             Up        0.0.0.0:3000->3000/tcp
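
The backends also expose health endpoints that can be probed directly. Prometheus serves /-/ready and Grafana serves /api/health; the OTEL Collector's OTLP gRPC port can be checked with a simple TCP test (ports assume the defaults shown above):

# Prometheus readiness
curl -s http://localhost:9090/-/ready

# Grafana health
curl -s http://localhost:3000/api/health

# OTLP gRPC port is accepting connections (requires netcat)
nc -z localhost 4317 && echo "otel-collector: port 4317 open"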

3. Access Web UIs

Open the following URLs in your browser:

  • Grafana: http://localhost:3000
    • Default credentials: admin / admin
    • Used for viewing traces (Tempo) and metrics (Prometheus)

  • Prometheus: http://localhost:9090
    • Direct access to metrics
    • Execute PromQL queries
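
If the UIs load but show no data, the Grafana HTTP API offers a quick check that the Tempo and Prometheus data sources are provisioned (this uses the default admin / admin credentials noted above):

# List the data sources configured in Grafana
curl -s -u admin:admin http://localhost:3000/api/datasources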

4. Generate Telemetry

Make API requests to generate traces and metrics:

# Create a task (generates trace and metrics)
curl -X POST http://localhost:8000/api/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "title": "Test Task",
    "description": "Testing observability",
    "priority": "high"
  }'

# List tasks
curl -X GET http://localhost:8000/api/tasks \
  -H "Authorization: Bearer YOUR_TOKEN"

# Update a task
curl -X PUT http://localhost:8000/api/tasks/{task_id} \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "title": "Updated Task",
    "status": "in_progress"
  }'
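
For the update request, {task_id} must be a real ID. Assuming the create response includes an id field (adjust the jq path to the actual response shape), the ID can be captured and reused like this:

# Create a task and capture its ID for follow-up requests (requires jq)
TASK_ID=$(curl -s -X POST http://localhost:8000/api/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"title": "Test Task", "priority": "high"}' | jq -r '.id')

curl -X PUT "http://localhost:8000/api/tasks/$TASK_ID" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"status": "in_progress"}'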

Viewing Traces

  1. Open Grafana: http://localhost:3000

  2. Navigate to Explore:
    • Click the Explore icon (compass) in the left sidebar

  3. Select the Tempo Data Source:
    • In the dropdown at the top, select Tempo

  4. Search for Traces:
    • Search tab: search by service name
      • Service Name: starter-app
      • Click Run Query
    • TraceQL tab: use a TraceQL query

    { resource.service.name = "starter-app" && span.http.status_code = 200 }

  5. View Trace Details:
    • Click on any trace in the results
    • Explore the span hierarchy
    • Check span attributes and timing

Example: Task Creation Trace

When you create a task, you'll see a trace like this:

starter-app: POST /api/tasks (200ms)
├─ create_task_entity (15ms)
│  ├─ span: validate input (3ms)
│  └─ span: create domain object (12ms)
├─ motor_task_repository.add (150ms)
│  ├─ pymongo.insert_one (140ms)
│  │  Attributes:
│  │  ├─ db.system: mongodb
│  │  ├─ db.operation: insert
│  │  └─ db.collection: tasks
│  └─ publish_domain_events (10ms)
└─ record_metrics (5ms)

What to Look For:

  • Duration: Total and per-span timing
  • Attributes: Task ID, priority, status, user ID
  • Events: Milestones within spans
  • Errors: Exceptions and error status
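
To jump straight to problem traces, TraceQL's intrinsic fields can be combined with resource and span attributes. Assuming the service records the http.status_code attribute used in the search example above, filters like these narrow the results to failures:

# Traces where a span reported an error status
{ resource.service.name = "starter-app" && status = error }

# Traces containing a server error response
{ resource.service.name = "starter-app" && span.http.status_code >= 500 }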

Viewing Metrics

In Prometheus

  1. Open Prometheus: http://localhost:9090

  2. Execute Queries:
    • Click the Graph tab
    • Enter a PromQL query
    • Click Execute

Example Queries:

# Task creation rate (per second)
rate(starter_app_tasks_created_total[5m])

# Total tasks created
starter_app_tasks_created_total

# Task processing time (95th percentile)
histogram_quantile(0.95, rate(starter_app_task_processing_time_bucket[5m]))

# Tasks by priority
sum by (priority) (starter_app_tasks_created_total)
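
The same queries can also be run from the command line through the Prometheus HTTP API, which is convenient for scripted checks:

# Run a PromQL query via the Prometheus HTTP API
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(starter_app_tasks_created_total[5m])'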

In Grafana

  1. Open Grafana: http://localhost:3000

  2. Navigate to Explore:
    • Select the Prometheus data source

  3. Build Queries:
    • Use the Metrics Browser or write PromQL
    • Visualize as Table or Graph

  4. Create Dashboards (Optional):
    • Click + Dashboard
    • Add panels with metric queries
    • Save the dashboard
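
Dashboards can also be created through Grafana's dashboard API, which makes it easy to keep them in version control. A minimal sketch using the default admin / admin credentials (panel definitions omitted):

# Create an empty dashboard via the Grafana API
curl -s -u admin:admin -X POST http://localhost:3000/api/dashboards/db \
  -H "Content-Type: application/json" \
  -d '{
    "dashboard": { "id": null, "title": "Starter App Observability", "panels": [] },
    "overwrite": false
  }'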

Example Workflows

Workflow 1: Debug a Slow Request

Scenario: API endpoint is slow

Steps:

  1. Identify slow traces in Grafana/Tempo:

    { resource.service.name = "starter-app" && duration > 500ms }

  2. Analyze the trace:
    • Find the slowest span
    • Check database queries
    • Look for N+1 queries

  3. Check metrics for patterns:

    # Response time trend
    histogram_quantile(0.95, rate(http_server_duration_bucket[5m]))

  4. Fix the issue:
    • Add database indexes (a sketch follows this list)
    • Optimize queries
    • Implement caching
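
If the trace points at the MongoDB spans (for example pymongo.insert_one or a slow query against the tasks collection), an index is often the first fix. A hypothetical sketch using mongosh; the mongodb service name, starter_app database name, and field names are assumptions to adapt to the actual schema:

# Add an index on fields commonly used to filter tasks (hypothetical names)
docker-compose exec mongodb mongosh starter_app \
  --eval 'db.tasks.createIndex({ user_id: 1, status: 1 })'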

Workflow 2: Monitor Task Creation

Scenario: Track task creation rate

Steps:

  1. Query the task creation rate:

    rate(starter_app_tasks_created_total[5m])

  2. Visualize in Grafana:
    • Create a graph panel
    • Group by priority:

    sum by (priority) (rate(starter_app_tasks_created_total[5m]))

  3. Set up alerts (the matching expression follows this list):
    • Create an alert rule
    • Threshold: < 1 (less than 1 task/sec)
    • Configure a notification channel
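
The alert condition in step 3 corresponds to a PromQL expression that fires when the creation rate drops below one task per second:

# Fires when task creation falls below 1/sec, averaged over 5 minutes
rate(starter_app_tasks_created_total[5m]) < 1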

Workflow 3: Root Cause Analysis

Scenario: Production error reported

Steps:

  1. Check application logs:

    make logs-app

    • Find the error message
    • Copy the trace ID from the log (a grep shortcut follows this list)

  2. Find the trace in Grafana/Tempo:
    • Paste the trace ID into the search
    • View the full trace

  3. Analyze the failure:
    • Identify the failed span
    • Check error attributes
    • Review span events

  4. Correlate with metrics:

    # Error rate spike?
    rate(starter_app_tasks_failed_total[5m])

  5. Check context:
    • Span attributes: user ID, task ID, priority
    • Timing: when did it start failing?
    • Pattern: specific user/task type?
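
To locate the error line and its trace ID faster, the log output from step 1 can be filtered on the command line (if the make target follows the log stream, stop it with Ctrl-C):

# Show only error lines from the application logs
make logs-app 2>&1 | grep -i error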

Workflow 4: Performance Optimization

Scenario: Optimize task processing

Steps:

  1. Record baseline metrics:

    histogram_quantile(0.95, rate(starter_app_task_processing_time_bucket[5m]))

  2. Generate test load:

    # Create multiple tasks
    for i in {1..100}; do
      curl -X POST http://localhost:8000/api/tasks \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $TOKEN" \
        -d "{\"title\":\"Task $i\",\"priority\":\"high\"}"
    done

  3. Analyze traces:
    • Find common slow patterns
    • Identify bottlenecks

  4. Implement optimizations:
    • Add indexes
    • Use batch operations
    • Optimize queries

  5. Measure improvement:

    # Before vs after
    histogram_quantile(0.95, rate(starter_app_task_processing_time_bucket[5m]))

Tips and Tricks

Enable Verbose Logging

For debugging observability issues:

# Add to .env
OTEL_LOG_LEVEL=debug

View Raw Telemetry

Check OTEL Collector logs:

make logs-otel

# Or filter for specific data
docker-compose logs otel-collector | grep "Trace"
docker-compose logs otel-collector | grep "Metric"

Test Without Backend

Use console exporter for development:

# In .env
OTEL_TRACES_EXPORTER=console
OTEL_METRICS_EXPORTER=console

Telemetry will be printed to application logs:

make logs-app
