Verification

This section describes how to verify an AIOStack deployment that monitors AI agents and ML workloads in Kubernetes.

1

Check Installation Status

Verify that the pods and the DaemonSet are running:

kubectl get pods -n ai-observability
kubectl get daemonset -n ai-observability

Expected output of the first command (pod name suffixes will differ):

NAME                                    READY   STATUS    RESTARTS   AGE
ai-observability-stack-xxxxx            1/1     Running   0          2m
ai-observability-stack-yyyyy            1/1     Running   0          2m
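
The check above can also be scripted. The following is a minimal sketch, assuming the column layout shown in the expected output (NAME, READY, STATUS); `check_pods_ready` is a hypothetical helper name, not part of AIOStack.

```shell
# Reads `kubectl get pods --no-headers` output on stdin.
# Succeeds only if at least one pod is listed and every pod is
# Running with all of its containers ready.
# check_pods_ready is a hypothetical helper, not an AIOStack command.
check_pods_ready() {
  awk '
    {
      split($2, ready, "/")   # READY column, e.g. "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) {
        print "not ready: " $1 " (" $2 ", " $3 ")"
        bad++
      }
      total++
    }
    END { exit ((total == 0 || bad > 0) ? 1 : 0) }
  '
}
```

One possible invocation is `kubectl get pods -n ai-observability --no-headers | check_pods_ready`, which exits non-zero and names the offending pods if any of them is not ready.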

Check logs for any errors:

kubectl logs -n ai-observability -l app=ai-observability-stack --tail=50

2

Test Metrics Endpoint

Method 1: Port-forward to test locally

# Get a pod name
POD_NAME=$(kubectl get pods -n ai-observability -l app=ai-observability-stack -o jsonpath='{.items[0].metadata.name}')

# Port forward to the metrics port
kubectl port-forward -n ai-observability pod/$POD_NAME 7429:7429

In another terminal, test the metrics endpoint:

curl http://localhost:7429/metrics

The output should include metrics such as:

# HELP ai_llm_requests_total Total number of LLM API requests
# TYPE ai_llm_requests_total counter
ai_llm_requests_total{provider="openai",model="gpt-4"} 0

# HELP ai_ml_library_calls_total Total ML library function calls
# TYPE ai_ml_library_calls_total counter
ai_ml_library_calls_total{library="pytorch",function="forward"} 0
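
To avoid scanning the output by eye, the presence of these metrics can be checked with a small filter. This is a sketch only: `has_expected_metrics` is a hypothetical helper, and the two metric names are simply the ones from the sample output above.

```shell
# Reads Prometheus text-format metrics on stdin and reports every
# expected metric that has no sample line.
# has_expected_metrics is a hypothetical helper; the metric names are
# taken from the sample output, not from an authoritative list.
has_expected_metrics() {
  body=$(cat)
  missing=0
  for name in ai_llm_requests_total ai_ml_library_calls_total; do
    # A sample line starts with the metric name followed by "{" or a space;
    # "# HELP" and "# TYPE" comment lines do not count.
    if ! printf '%s\n' "$body" | grep -Eq "^${name}[{ ]"; then
      echo "missing metric: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

With the port-forward from Method 1 still running on port 7429, it could be used as `curl -s http://localhost:7429/metrics | has_expected_metrics`.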

Method 2: Test via Service

# Start a temporary test pod with an interactive shell (deleted on exit)
kubectl run test-pod --rm -i --tty --image=curlimages/curl -- sh

# Then from inside the test pod:
curl http://ai-observability-stack.ai-observability.svc.cluster.local:7429/metrics
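
The same Service check can be run without an interactive shell. The sketch below wraps it in a function; `fetch_metrics_via_service` and the pod name `metrics-check` are hypothetical, and the Service URL is the one used above.

```shell
# Runs a one-off curl pod against the Service and prints the metrics body.
# fetch_metrics_via_service and the pod name "metrics-check" are
# hypothetical; the URL is the Service address from Method 2.
fetch_metrics_via_service() {
  kubectl run metrics-check --rm -i --restart=Never --quiet \
    --image=curlimages/curl --command -- \
    curl -sf "http://ai-observability-stack.ai-observability.svc.cluster.local:7429/metrics"
}
```

Calling `fetch_metrics_via_service` prints the metrics; because of `curl -f`, the curl command itself fails on an HTTP error status rather than printing an error page.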

3

Health Check

# Port-forward the health check port (reuses $POD_NAME from step 2)
kubectl port-forward -n ai-observability pod/$POD_NAME 8080:8080

# In another terminal, query the health endpoint
curl http://localhost:8080/health

Expected response:

{
  "status": "healthy",
  "version": "v1.0.0",
  "monitored_libraries": ["pytorch", "tensorflow", "transformers"],
  "active_providers": ["openai", "anthropic"]
}
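
For scripted checks, the status field can be tested directly. This is a grep-based sketch that assumes the flat JSON shape shown above; `is_healthy` is a hypothetical helper name.

```shell
# Reads the /health response body on stdin and succeeds only when the
# "status" field is "healthy".
# is_healthy is a hypothetical helper; it assumes the JSON shape shown
# in the expected response and is not a general JSON parser.
is_healthy() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}
```

For example, `curl -s http://localhost:8080/health | is_healthy && echo "health check passed"`. If `jq` is installed, `jq -e '.status == "healthy"'` is a stricter alternative.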

4

Validate eBPF Program Loading

Check if eBPF programs are loaded correctly:

# Open a debug shell on a node (replace NODE_NAME with a name from `kubectl get nodes`)
kubectl debug node/NODE_NAME -it --image=ubuntu -- chroot /host bash

# Inside the debug container:
bpftool prog list | grep ai_observability
ls /sys/fs/bpf/ai_observability/

Check the pod logs for eBPF-related messages:

kubectl logs -n ai-observability pod/$POD_NAME | grep -i ebpf
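
When the logs are long, it can help to narrow the eBPF lines down to likely failures. The filter below is a sketch: `ebpf_errors` is a hypothetical helper, and the keywords are assumptions about the log wording rather than a documented AIOStack log format.

```shell
# Reads pod logs on stdin and prints only eBPF-related lines that
# also mention a failure.
# ebpf_errors is a hypothetical helper; the failure keywords are
# assumptions about how errors are worded in the logs.
ebpf_errors() {
  # "|| true" keeps the exit status zero when nothing matches,
  # so an empty result is not treated as an error.
  grep -i 'ebpf' | grep -Ei 'error|fail|denied|not permitted' || true
}
```

For example: `kubectl logs -n ai-observability pod/$POD_NAME | ebpf_errors`. Empty output means no eBPF line matched any of the assumed failure keywords.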