Verification
This section covers the verification of AIOStack for AI agents and ML workloads in Kubernetes.
1
Check Installation Status
Verify pods are running:
kubectl get pods -n ai-observability
kubectl get daemonset -n ai-observability
Expected output:
NAME READY STATUS RESTARTS AGE
ai-observability-stack-xxxxx 1/1 Running 0 2m
ai-observability-stack-yyyyy 1/1 Running 0 2m
Check logs for any errors:
kubectl logs -n ai-observability -l app=ai-observability-stack --tail=50
2
Test Metrics Endpoint
Method 1: Port-forward to test locally
# Get a pod name
POD_NAME=$(kubectl get pods -n ai-observability -l app=ai-observability-stack -o jsonpath='{.items[0].metadata.name}')
# Port forward to the metrics port
kubectl port-forward -n ai-observability pod/$POD_NAME 7429:7429
In another terminal, test the metrics endpoint:
curl http://localhost:7470/metrics
Expected output should include metrics like:
# HELP ai_llm_requests_total Total number of LLM API requests
# TYPE ai_llm_requests_total counter
ai_llm_requests_total{provider="openai",model="gpt-4"} 0
# HELP ai_ml_library_calls_total Total ML library function calls
# TYPE ai_ml_library_calls_total counter
ai_ml_library_calls_total{library="pytorch",function="forward"} 0
Method 2: Test via Service
# Create a test pod
kubectl run test-pod --rm -i --tty --image=curlimages/curl -- sh
# Then from inside the test pod:
curl http://ai-observability-stack.ai-observability.svc.cluster.local:7429/metrics
3
Health Check
# Port forward health check port
kubectl port-forward -n ai-observability pod/$POD_NAME 8080:8080
curl http://localhost:8080/health
Expected response:
{
"status": "healthy",
"version": "v1.0.0",
"monitored_libraries": ["pytorch", "tensorflow", "transformers"],
"active_providers": ["openai", "anthropic"]
}
4
Validate eBPF Program Loading
Check if eBPF programs are loaded correctly:
# Execute on a node to check eBPF programs
kubectl debug node/NODE_NAME -it --image=ubuntu -- chroot /host bash
# Inside the debug container:
bpftool prog list | grep ai_observability
ls /sys/fs/bpf/ai_observability/
Check kernel logs for eBPF-related messages:
kubectl logs -n ai-observability pod/$POD_NAME | grep -i ebpf