Skip to content

ewhauser/flume-kafka-plugin

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Flume plugin for Kafka

Fair warning - this is an experiment and is not used anywhere in production.

Sink

To use the KafkaSink, add the following to your flume-site.xml file:

<property>
  <name>flume.plugin.classes</name>
  <value>org.apache.flume.kafka.KafkaSink</value>
</property>

To configure the sink:

flume shell -c localhost:35873 -e exec config localhost 'tail("/var/log/messages")' 'kafka("localhost:2181", "topic")'

If you want your events to go to a particular partition, you can set a property on the event called "kafka.partition.key" to provide a key for the Kafka client to partition on.

Source

To use the KafkaSource, add the following to your flume-site.xml file:

<property>
  <name>flume.plugin.classes</name>
  <value>org.apache.flume.kafka.KafkaSource</value>
</property>

To configure the source:

flume shell -c localhost:35873 -e exec config localhost 'kafka("localhost:2181", "topic", "my_group_id", "1")' 'console'

The parameter list is (zookeeper_string, topic, group_id, num_consumer_threads).

If your downstream logic is not idempotent, you should not currently be using the Kafka source. The Kafka client commits the offsets for the messages it has received to ZooKeeper asyncronously (configurable in Kafka) as a background operation. Because the Flume source is not committing the message offsets on every event processed, you will end up getting duplicates.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published