KFlow: fast IPFIX flows collector with Kafka export
Downloading and running
KFlow requires Java 1.8 or later. It is known to work with Apache Kafka 2.3.0, but any recent version should be fine. Linux is required to achieve high performance; macOS can be used for testing.
KFlow is distributed as a self-contained JAR (also known as a fat JAR). Simply download kflow-VERSION-all.jar
from the GitHub releases page and run the following
(assuming java is on your PATH):
java -jar kflow-VERSION-all.jar
You can also build the project yourself (see Building section).
Configuration
By default, KFlow listens for IPFIX on port 4739/udp and pushes decoded flows to the local Kafka broker at localhost:9092,
using the topic ipfix. It also exposes Prometheus metrics on port 8080/tcp. This behavior can be customized by overriding
the default configuration.
Configuration values are resolved from multiple sources with the following precedence, from highest to lowest:
- Java system properties prefixed with `app.`, usually passed as one or more `-Dapp.name=value` command line arguments,
- custom `.properties` file specified with the `-c` or `--config` command line argument,
- default configuration values.
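
The precedence order above can be illustrated with a minimal sketch. This is not KFlow's actual configuration code: the `ConfigResolver` class and its `resolve` method are hypothetical, the override values are made up, and the sketch assumes that keys in the `.properties` file are written without the `app.` prefix.

```java
import java.util.Properties;

// Hypothetical illustration of the lookup order; not part of KFlow.
public class ConfigResolver {
    private static final String PREFIX = "app.";

    // Looks a key up in order: prefixed system property, then file, then defaults.
    static String resolve(String key, Properties system, Properties file, Properties defaults) {
        String value = system.getProperty(PREFIX + key);
        if (value != null) {
            return value;
        }
        value = file.getProperty(key);
        if (value != null) {
            return value;
        }
        return defaults.getProperty(key);
    }

    public static void main(String[] args) {
        // Defaults taken from the values described in this README.
        Properties defaults = new Properties();
        defaults.setProperty("server.port", "4739");
        defaults.setProperty("kafka.topic", "ipfix");
        defaults.setProperty("metrics.port", "8080");

        // Stands in for a file passed with -c (illustrative values).
        Properties file = new Properties();
        file.setProperty("server.port", "2055");
        file.setProperty("kafka.topic", "flows");

        // Stands in for -Dapp.kafka.topic=netflow (illustrative value).
        Properties system = new Properties();
        system.setProperty("app.kafka.topic", "netflow");

        System.out.println(resolve("kafka.topic", system, file, defaults));  // netflow
        System.out.println(resolve("server.port", system, file, defaults));  // 2055
        System.out.println(resolve("metrics.port", system, file, defaults)); // 8080
    }
}
```

In this sketch the system property wins for `kafka.topic`, the file wins for `server.port`, and `metrics.port` falls through to its default.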
Important properties
| Property | Description |
|---|---|
| `server.port` | Port number to listen for IPFIX flows |
| `server.threads` | Number of processing threads |
| `server.buffer.size` | SO_RCVBUF buffer size in bytes (allocated per processing thread) |
| `kafka.topic` | Kafka topic to write decoded flows to |
| `kafka.producers` | Number of Kafka producers (see Performance notes) |
| `kafka.props.bootstrap.servers` | Kafka brokers to set up the initial connection with |
| `kafka.props.*` | Various Kafka producer configuration properties (see producer docs) |
| `metrics.port` | Port number to expose Prometheus metrics |
Output format
KFlow produces JSON records with the following fields:
| Field | Type | Description |
|---|---|---|
| `dvc` | string (IPv4) | IPv4 address of host that sent IPFIX packet |
| `src` | string (IPv4) | sourceIPv4Address (8) |
| `srcp` | number | sourceTransportPort (7) |
| `dst` | string (IPv4) | destinationIPv4Address (12) |
| `dstp` | number | destinationTransportPort (11) |
| `proto` | number | protocolIdentifier (4) |
| `flags` | number | tcpControlBits (6) |
| `bytes` | number | octetDeltaCount (1) |
| `pkts` | number | packetDeltaCount (2) |
| `time` | number | Export time in milliseconds since epoch |
The number in parentheses is the IPFIX information element ID. For a description of the fields other than dvc and time, please refer to the
IPFIX entities list.
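
Since `flags` carries tcpControlBits as a plain number, a consumer usually has to unpack it into individual TCP flags. The sketch below shows one way to do that using the standard TCP header bit positions (FIN is the lowest bit). The `TcpFlags` class is a hypothetical helper for illustration and is not part of KFlow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical consumer-side helper; not part of KFlow.
public class TcpFlags {
    // Flag names ordered by bit position, lowest bit first.
    private static final String[] NAMES = {"FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"};

    // Returns the names of all flags set in the low 8 bits of the value.
    static List<String> decode(int flags) {
        List<String> result = new ArrayList<>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((flags & (1 << bit)) != 0) {
                result.add(NAMES[bit]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x12)); // [SYN, ACK]
        System.out.println(decode(0x00)); // []
    }
}
```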
Performance notes
- Because of the way Netty handles UDP channels, KFlow requires the `epoll()` call to be available to achieve high performance. It can still run without `epoll()` support, though all processing will be done in a single thread regardless of the `server.threads` setting.

- At very high volumes (about 500,000 flows per second in our setup), a shared lock inside the Kafka producer code becomes a bottleneck. To overcome this limitation, KFlow can create multiple producers (as specified by the `kafka.producers` config value) and distribute load between them.

- Monitoring the receive queue and detecting packet drops are crucial for handling large volumes of traffic in production. KFlow exposes the `server_packet_drops` and `server_receive_queue` Prometheus metrics. Both are parsed from `/proc/net/udp` and are therefore available only on Linux.

- When the exposed metrics are not enough for performance troubleshooting, we recommend using profiling tools such as the excellent async-profiler.
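
To make the `/proc/net/udp` point more concrete, here is a rough sketch of how the receive queue and drop counter can be read from one line of that file. It is not KFlow's actual parser. It assumes the column layout used by recent Linux kernels, where the fifth column is `tx_queue:rx_queue` in hexadecimal and the last column is the decimal drop counter. The `UdpSocketStats` class and the sample line are made up for illustration.

```java
// Hypothetical parser for a single /proc/net/udp line; not part of KFlow.
public class UdpSocketStats {
    final int port;
    final long receiveQueue;
    final long drops;

    UdpSocketStats(int port, long receiveQueue, long drops) {
        this.port = port;
        this.receiveQueue = receiveQueue;
        this.drops = drops;
    }

    // Parses one socket line; returns null for the header or a malformed line.
    static UdpSocketStats parseLine(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 13 || !f[0].endsWith(":")) {
            return null;
        }
        // local_address is HEXIP:HEXPORT
        int port = Integer.parseInt(f[1].substring(f[1].indexOf(':') + 1), 16);
        // fifth column is tx_queue:rx_queue, both hexadecimal
        long rxQueue = Long.parseLong(f[4].substring(f[4].indexOf(':') + 1), 16);
        // last column is the drop counter, decimal
        long drops = Long.parseLong(f[f.length - 1]);
        return new UdpSocketStats(port, rxQueue, drops);
    }

    public static void main(String[] args) {
        // Made-up sample line for a socket bound to port 4739 (0x1283).
        String sample = "  412: 00000000:1283 00000000:0000 07 00000000:00000300 "
                + "00:00000000 00000000     0        0 28754 2 0000000000000000 17";
        UdpSocketStats s = parseLine(sample);
        System.out.println(s.port + " rx_queue=" + s.receiveQueue + " drops=" + s.drops);
    }
}
```

A steadily growing drop counter, or a receive queue that stays close to the configured `server.buffer.size`, suggests that the processing threads are not keeping up with incoming traffic.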
Building
KFlow uses Gradle as its build system.
`./gradlew build` builds the project, `./gradlew run` runs it.
The fat JAR is produced at build/libs/kflow-VERSION-all.jar. The build also assembles redistributable application
archives in the build/distributions folder.
Extensibility
KFlow was built with extensibility in mind.
Support for a new flow export format, such as NetFlow or sFlow, can be added by implementing the PacketDecoder
interface and passing it as a constructor parameter to the Server class. Handling multiple formats in a single
application is possible by creating multiple Server instances with different decoders and ports.
Similarly, Kafka output can be replaced by a custom implementation of the Sink interface, which is passed
as a parameter to the Server class constructor.
Limitations
- KFlow supports IPv4 addresses only. There are no plans for adding IPv6 support.

- The exported field set is limited to basic fields only because of performance considerations.

If you need a more feature-rich solution and don't care about performance that much, you should probably
take a look at Cloudflare's goflow. In our setup, a single goflow
node was able to handle about 250,000 flows per second.

