Verification
This section covers how to verify that AIOStack is correctly deployed and collecting telemetry for AI agents and ML workloads in Kubernetes.
1. Check Installation Status
Verify pods are running:
kubectl get pods -n ai-observability
kubectl get daemonset -n ai-observability
Expected output:
NAME READY STATUS RESTARTS AGE
ai-observability-stack-xxxxx 1/1 Running 0 2m
ai-observability-stack-yyyyy 1/1 Running 0 2m
Check logs for any errors:
kubectl logs -n ai-observability -l app=ai-observability-stack --tail=50
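For a scripted check, kubectl rollout status blocks until the DaemonSet reports ready. This sketch assumes the DaemonSet is named ai-observability-stack, matching the pod name prefix shown above:
# Blocks until all DaemonSet pods are ready, or fails after the timeout
# (assumes the DaemonSet name ai-observability-stack)
kubectl rollout status daemonset/ai-observability-stack \
  -n ai-observability --timeout=120s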
2. Test Metrics Endpoint
Method 1: Port-forward to test locally
# Get a pod name
POD_NAME=$(kubectl get pods -n ai-observability -l app=ai-observability-stack -o jsonpath='{.items[0].metadata.name}')
# Port forward to the metrics port
kubectl port-forward -n ai-observability pod/$POD_NAME 7470:7470
In another terminal, test the metrics endpoint:
curl http://localhost:7470/metrics
Expected output should include metrics like:
# HELP ai_llm_requests_total Total number of LLM API requests
# TYPE ai_llm_requests_total counter
ai_llm_requests_total{provider="openai",model="gpt-4"} 0
# HELP ai_ml_library_calls_total Total ML library function calls
# TYPE ai_ml_library_calls_total counter
ai_ml_library_calls_total{library="pytorch",function="forward"} 0
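To turn this into a pass/fail check, a small script can assert that the expected metric families are present (metric names taken from the sample output above):
# Fails fast if the endpoint is unreachable, then reports each expected metric family
curl -sf http://localhost:7470/metrics -o /tmp/aiostack-metrics.txt || exit 1
for metric in ai_llm_requests_total ai_ml_library_calls_total; do
  grep -q "^# TYPE ${metric} counter" /tmp/aiostack-metrics.txt \
    && echo "OK: ${metric}" || echo "MISSING: ${metric}"
done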
Method 2: Test via Service
# Create a test pod
kubectl run test-pod --rm -i --tty --image=curlimages/curl -- sh
# Inside the test pod:
curl http://ai-observability-stack.ai-observability.svc.cluster.local:7470/metrics
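If you'd rather not keep an interactive pod around, a one-shot pod can probe the Service and exit (same Service DNS name as above):
# One-shot check: prints the first few metric lines, then the pod is removed
kubectl run curl-check --rm -i --restart=Never --image=curlimages/curl -- \
  sh -c 'curl -sf http://ai-observability-stack.ai-observability.svc.cluster.local:7470/metrics | head -5'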
3. Test Health Check
Test the health endpoint:
# Port forward health check port
kubectl port-forward -n ai-observability pod/$POD_NAME 8080:8080
# Test health endpoint
curl http://localhost:8080/health
Expected response:
{
  "status": "healthy",
  "version": "v1.0.0",
  "ebpf_programs": {
    "http_tracer": "loaded",
    "syscall_tracer": "loaded",
    "ssl_tracer": "loaded"
  },
  "monitored_libraries": ["pytorch", "tensorflow", "transformers"],
  "active_providers": ["openai", "anthropic"]
}
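For scripted verification, jq can assert on the status field (assuming the response shape shown above; jq -e exits non-zero if the check fails):
# Exit code 0 only if the agent reports itself healthy
curl -s http://localhost:8080/health | jq -e '.status == "healthy"' > /dev/null \
  && echo "healthy" || echo "unhealthy or unreachable"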
4. Validate eBPF Program Loading
Check if eBPF programs are loaded correctly:
# Execute on a node to check eBPF programs
kubectl debug node/NODE_NAME -it --image=ubuntu -- chroot /host bash
# Inside the debug container:
bpftool prog list | grep ai_observability
ls /sys/fs/bpf/ai_observability/
Check kernel logs for eBPF-related messages:
kubectl logs -n ai-observability pod/$POD_NAME | grep -i ebpf
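Since the stack runs as a DaemonSet, it is worth checking every pod, not just one. A sketch that scans each pod's recent logs for eBPF load failures (the error strings are assumptions; adjust them to match AIOStack's actual log format):
# Scan every DaemonSet pod for eBPF-related errors in recent logs
for pod in $(kubectl get pods -n ai-observability \
    -l app=ai-observability-stack -o jsonpath='{.items[*].metadata.name}'); do
  echo "--- ${pod}"
  kubectl logs -n ai-observability "${pod}" --tail=200 \
    | grep -iE 'bpf' | grep -iE 'error|fail' || echo "no eBPF errors found"
done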
5. Test with Sample AI Application
Deploy a test application to generate some metrics:
cat > test-app.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-test-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ai-test-app
  template:
    metadata:
      labels:
        app: ai-test-app
    spec:
      containers:
      - name: test-app
        image: python:3.9-slim
        command: ["/bin/bash", "-c"]
        args:
        - |
          pip install requests
          python - << 'PY'
          import requests
          import time

          while True:
              try:
                  # Simulate an OpenAI API call (will be traced by eBPF)
                  response = requests.post(
                      'https://api.openai.com/v1/models',
                      headers={'Authorization': 'Bearer fake-key'},
                      json={}
                  )
                  print(f'API call made: {response.status_code}')
              except Exception as e:
                  print(f'Expected error: {e}')
              time.sleep(30)
          PY
EOF
kubectl apply -f test-app.yaml
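Before checking metrics, you can confirm the test app is actually generating traffic by tailing its logs (it prints a line for each simulated API call):
kubectl logs -f deployment/ai-test-app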
Wait a few minutes, then check the metrics again (with the port-forward from step 2 still running):
curl http://localhost:7470/metrics | grep ai_llm_requests_total
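A scripted version of the same check, which succeeds only once at least one request has been counted (assumes the metric name and exposition format shown in step 2):
# Succeeds only if some ai_llm_requests_total series has a value > 0
curl -s http://localhost:7470/metrics \
  | awk '/^ai_llm_requests_total/ && $NF+0 > 0 {found=1} END {exit !found}' \
  && echo "LLM requests observed" || echo "no LLM requests counted yet"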