Part 3: Observability – From NetFlow & SNMP to eBPF-Powered Insights

Anyone who has operated a network knows the importance of visibility. In traditional networks, you might use SNMP counters, NetFlow records, or packet captures (SPAN ports) to understand what's happening. In a Kubernetes environment with Cilium, we have powerful new ways to observe traffic, often leveraging eBPF in the Linux kernel for deep insight with low overhead. Let's relate these new tools to the old ones:

Flow Logging and Traffic Visibility

Cisco's NetFlow is a widely used protocol that provides IP flow records (who talked to whom, when, and how much). Kubernetes doesn't natively produce NetFlow, but Cilium's Hubble component fulfills a similar role – and then some. 

[Image: Grafana dashboard built on Hubble network metrics]
source: https://docs.cilium.io/en/stable/observability/grafana/

Hubble is a built-in observability platform that records detailed flow events for your cluster. Think of Hubble as a distributed packet monitor and flow logger on every node, courtesy of eBPF. It can tell you, for each connection or packet:

  • Source pod and destination pod (including their names/labels, not just IPs).
  • Which port/protocol is used, and how much data is flowing.
  • Whether it was allowed or blocked by a specific policy.
  • Even layer 7 info, such as HTTP methods and URLs or gRPC methods, if L7 visibility is enabled.

This goes beyond traditional NetFlow, which would only list IPs, ports, and bytes. With Hubble, you effectively get a real-time service dependency graph and flow audit. It can answer questions like: "What services are talking to each other, and how often?" or "Which flows failed and why (was it DNS unresolved, or blocked by policy, or no listener)?" This is similar to using a tool like Cisco Stealthwatch (which analyzes NetFlow for anomalies) combined with an application performance monitor (e.g. Splunk). The difference is Hubble's data is rich with Kubernetes context. For example, instead of an IP, it will say pod X in namespace Y called pod Z on HTTP path /api.
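
For a feel of what that looks like in practice, here is a minimal sketch using the Hubble CLI (the flag names reflect recent Hubble releases and may differ slightly in older versions):

# Show the 20 most recent HTTP flows in the default namespace, with pod names, verdicts, and URLs
hubble observe --namespace default --protocol http --last 20

# The same flows as structured JSON, handy for scripting or feeding other tools
hubble observe --namespace default --protocol http --last 20 --output json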

Both Grafana and Prometheus are useful for visibility. Think of Prometheus as a network traffic analyzer that continuously captures and stores telemetry data from various routers and switches, much like how it scrapes metrics from Cilium agents and components. Meanwhile, Grafana serves as the network operations center (NOC) dashboard, providing real-time visualizations and historical analysis of traffic patterns, latency, and potential bottlenecks. Just as network engineers rely on flow monitoring tools like NetFlow or sFlow to diagnose congestion and optimize routing, Cilium uses Prometheus to collect metrics on service-to-service communication, packet drops, and policy enforcement. Grafana then acts as the intuitive interface that transforms this raw data into meaningful insights, similar to how an NOC uses dashboards to detect anomalies and ensure smooth network operations.
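
As a rough sketch of how those pieces connect, assuming a Helm-based Cilium install (chart values and metric names here match recent Cilium releases and may vary in yours), you can enable Hubble's Prometheus metrics and then query them like any other telemetry source:

# Turn on a set of Hubble metrics (DNS, drops, TCP, flows, HTTP) via the Cilium Helm chart
helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,http}"

# Once Prometheus is scraping them, a query like this shows drop rates by reason,
# ready to be graphed on a Grafana dashboard:
#   sum(rate(hubble_drop_total[5m])) by (reason)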

Troubleshooting and Alerting

So how do we practically troubleshoot issues? Network engineers often begin with ping and traceroute. In Kubernetes, you can still use ping to test basic connectivity between pods (assuming ICMP isn't blocked by policy). You also have tools like kubectl exec to run diagnostics inside containers and kubectl port-forward to test service endpoints from outside. The real superpower, however, is the eBPF-based tooling. For instance, Hubble's CLI can show recent flows (hubble observe) filtered by pod or namespace, giving a live view of traffic similar to tailing firewall logs. If an application can't reach a database, you could list the recent HTTP flows involving the database pod with hubble observe --pod SuperSecretStash --protocol http, and you might see flows with a DROPPED verdict if a NetworkPolicy is misconfigured.

May 12 13:23:40.501: default/King_of_Kingdom:42690 -> default/SuperSecretStash-c74d84667-cx5kp:80 http-request FORWARDED (HTTP/1.1 POST http://SuperSecretStash.default.svc.cluster.local/v1/Hidden-Bookcase_entrance)
May 12 13:23:40.502: default/King_of_Kingdom:42690 <- default/SuperSecretStash-c74d84667-cx5kp:80 http-response FORWARDED (HTTP/1.1 200 0ms (POST http://SuperSecretStash.default.svc.cluster.local/v1/Hidden-Bookcase_entrance))
May 12 13:23:43.791: default/King_of_Kingdom:42742 -> default/SuperSecretStash-c74d84667-cx5kp:80 http-request DROPPED (HTTP/1.1 PUT http://SuperSecretStash.default.svc.cluster.local/v1/DummyFrontDoor)

hubble observe --pod SuperSecretStash --verdict DROPPED

May 12 13:23:43.791: default/King_of_Kingdom:42742 -> default/SuperSecretStash-c74d84667-cx5kp:80 http-request DROPPED (HTTP/1.1 PUT http://SuperSecretStash.default.svc.cluster.local/v1/DummyFrontDoor)
May 12 13:23:47.852: default/Sniveling_Sidekick:42818 <> default/SuperSecretStash-c74d84667-cx5kp:80 SecretAccessPolicy denied DROPPED (TCP Flags: SYN)
May 12 13:23:47.852: default/Sniveling_Sidekick:42818 <> default/SuperSecretStash-c74d84667-cx5kp:80 SecretAccessPolicy denied DROPPED (TCP Flags: SYN)
May 12 13:23:48.854: default/Sniveling_Sidekick:42818 <> default/SuperSecretStash-c74d84667-cx5kp:80 SecretAccessPolicy denied DROPPED (TCP Flags: SYN)

Another example: if DNS lookups are failing, Hubble will flag DNS flow errors explicitly. In a traditional network, you might only see a UDP query with no response and have to deduce a DNS issue; Hubble knows the traffic is DNS and can tally failures. It even tracks things like HTTP error codes: for example, you can measure that service A is getting 5% HTTP 500 errors from service B. This blurs the line between pure network monitoring and application monitoring – something network engineers are increasingly asked to do.
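
A hedged example of chasing a DNS problem this way (assuming DNS visibility is enabled, using Hubble CLI flags from recent releases, and reusing the fictional pod name from the example above):

# Follow DNS traffic live; failed lookups appear alongside their DNS response codes
hubble observe --protocol dns --follow

# Narrow it down to flows originating from a single workload
hubble observe --protocol dns --from-pod default/King_of_Kingdom --follow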

Of course, we shouldn't forget the role of logging. Kubernetes nodes still have system logs, and containers log application-level info. From a security observability standpoint, you might export Hubble flow logs to a SIEM, much as you would export firewall logs or NetFlow to a SIEM in a traditional setup. Indeed, tools exist to persist Hubble flows (which are otherwise in-memory) and feed them into Elasticsearch or Splunk for long-term analysis. This is akin to enabling NetFlow collection or sFlow on all your switches and analyzing trends over time.
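
As a quick, ad-hoc sketch of getting flows out of Hubble's in-memory ring buffer and into a log pipeline (the Cilium Helm chart also offers dedicated Hubble export options and integrations for doing this continuously):

# Dump the most recent flows as JSON, ready to ship to Elasticsearch, Splunk, or another SIEM
hubble observe --last 1000 --output json > hubble-flows.json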

Key takeaways (Observability):

Hubble (Cilium) provides rich flow logs and a real-time topology of communications. It's comparable to NetFlow+Wireshark+APM combined, tailored for Kubernetes.

Deep context: Unlike traditional NetFlow, these observability tools know about pod names, namespaces, HTTP endpoints, etc. Troubleshooting is more intuitive since you see high-level service names rather than just IPs.

Distributed monitoring: Every node contributes to observability via eBPF. It's like having a continuous packet sniffer and flow analyzer on each server – but optimized and safe to run in production.

Traditional analogies: NetFlow gave 5-tuple flows – Hubble gives that plus Kubernetes metadata. SNMP gave interface stats – Kubernetes can expose per-pod network stats via metrics.

Proactive insight: You can easily set up alerts for things like "no traffic seen from service X in last 5 min" or "packet drop rate > 1% in namespace Y," similar to how you'd use syslog or SNMP traps on a Cisco device to catch issues.
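
A minimal sketch of that last point, assuming Hubble metrics are already flowing into Prometheus; the metric name and query are illustrative and may vary by Cilium version, and the Prometheus address is a placeholder:

# Ask Prometheus directly for the recent drop rate; the same expression can back an alert rule,
# much like a syslog trigger or SNMP trap threshold on a traditional device
curl -s 'http://prometheus.monitoring.svc:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(hubble_drop_total[5m])) by (reason)'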

Check out part four, where we start to dive deeper into services and their importance to the container environment. 
