
We will set up Flume to write data from /var/log/messages to HDFS. The setup has two tiers: an agent on the log source server tails the file and forwards events over Avro, and a collector receives those events and writes them to HDFS.

1. Create the agent configuration on the log source server as below

====================

## Flume NG Apache Log Collection

## Refer to https://cwiki.apache.org/confluence/display/FLUME/Getting+Started
##
# http://flume.apache.org/FlumeUserGuide.html#exec-source
agent.sources = apache
agent.sources.apache.type = exec
agent.sources.apache.command = tail -F /var/log/messages
agent.sources.apache.batchSize = 1
agent.sources.apache.channels = memoryChannel
agent.sources.apache.interceptors = itime ihost itype
# http://flume.apache.org/FlumeUserGuide.html#timestamp-interceptor
agent.sources.apache.interceptors.itime.type = timestamp
# http://flume.apache.org/FlumeUserGuide.html#host-interceptor
agent.sources.apache.interceptors.ihost.type = host
agent.sources.apache.interceptors.ihost.useIP = false
agent.sources.apache.interceptors.ihost.hostHeader = host
# http://flume.apache.org/FlumeUserGuide.html#static-interceptor
agent.sources.apache.interceptors.itype.type = static
agent.sources.apache.interceptors.itype.key = log_type
agent.sources.apache.interceptors.itype.value = apache_access_combined

# http://flume.apache.org/FlumeUserGuide.html#memory-channel
agent.channels = memoryChannel
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100

## Send to the Flume collector at 192.68.10.2 (Hadoop slave node)
# http://flume.apache.org/FlumeUserGuide.html#avro-sink
agent.sinks = AvroSink
agent.sinks.AvroSink.type = avro
agent.sinks.AvroSink.channel = memoryChannel
agent.sinks.AvroSink.hostname = 192.68.10.2
agent.sinks.AvroSink.port = 4545

## Debugging Sink, Comment out AvroSink if you use this one
# http://flume.apache.org/FlumeUserGuide.html#file-roll-sink
#agent.sinks = localout
#agent.sinks.localout.type = file_roll
#agent.sinks.localout.sink.directory = /var/log/flume
#agent.sinks.localout.sink.rollInterval = 0
#agent.sinks.localout.channel = memoryChannel

## Kerberos settings apply only to an HDFS sink; this agent has only an
## Avro sink, so they are shown commented out for reference.
#agent.sinks.hdfs-sink.hdfs.kerberosPrincipal = flume/hostname@DOMAIN.COM
#agent.sinks.hdfs-sink.hdfs.kerberosKeytab = /etc/security/hadoop/keytab/flume.service.keytab

=========================
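The memory channel above keeps events only in RAM, and with a capacity of just 100 events anything buffered is lost if the agent dies or restarts. Where durability matters, a file channel can be swapped in; a minimal sketch, assuming the /var/lib/flume paths below exist and are writable by the user running Flume:

====================

# http://flume.apache.org/FlumeUserGuide.html#file-channel
agent.channels = fileChannel
agent.channels.fileChannel.type = file
agent.channels.fileChannel.checkpointDir = /var/lib/flume/checkpoint
agent.channels.fileChannel.dataDirs = /var/lib/flume/data

# Point the source and sink at the new channel
agent.sources.apache.channels = fileChannel
agent.sinks.AvroSink.channel = fileChannel

====================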

2. Start the agent as below

#flume-ng agent -f flume.conf -n agent
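While setting things up, it helps to run the agent in the foreground with logging sent to the console, so interceptor and sink errors are visible immediately (the --conf directory is an example; point it at your Flume conf directory):

#flume-ng agent --conf /etc/flume/conf -f flume.conf -n agent -Dflume.root.logger=INFO,console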

3. Create the collector configuration.

This machine can be any server that has access to HDFS, and it must be reachable from the agent on the Avro port (4545 below).

====================

##Sources########################################################
## Accept Avro data In from the Edge Agents
# http://flume.apache.org/FlumeUserGuide.html#avro-source
collector.sources = AvroIn
collector.sources.AvroIn.type = avro
collector.sources.AvroIn.bind = 0.0.0.0
collector.sources.AvroIn.port = 4545
collector.sources.AvroIn.channels = mc1 mc2

## Channels ########################################################
## Source writes to 2 channels, one for each sink (Fan Out)
collector.channels = mc1 mc2

# http://flume.apache.org/FlumeUserGuide.html#memory-channel
collector.channels.mc1.type = memory
collector.channels.mc1.capacity = 100

collector.channels.mc2.type = memory
collector.channels.mc2.capacity = 100

## Sinks ###########################################################
collector.sinks = LocalOut HadoopOut

## Write copy to Local Filesystem (Debugging)
# http://flume.apache.org/FlumeUserGuide.html#file-roll-sink
collector.sinks.LocalOut.type = file_roll
collector.sinks.LocalOut.sink.directory = /var/log/flume
collector.sinks.LocalOut.sink.rollInterval = 0
collector.sinks.LocalOut.channel = mc1

## Write to HDFS
# http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
collector.sinks.HadoopOut.type = hdfs
collector.sinks.HadoopOut.channel = mc2
collector.sinks.HadoopOut.hdfs.path = /flume/events/%{log_type}/%{host}/%y-%m-%d
collector.sinks.HadoopOut.hdfs.fileType = DataStream
collector.sinks.HadoopOut.hdfs.writeFormat = Text
collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 10000
collector.sinks.HadoopOut.hdfs.rollInterval = 600

## Kerberos credentials for the HDFS sink (needed only on secured clusters)
collector.sinks.HadoopOut.hdfs.kerberosPrincipal = flume/hostname@DOMAIN.COM
collector.sinks.HadoopOut.hdfs.kerberosKeytab = /etc/security/hadoop/keytab/flume.service.keytab

=====================
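Because AvroIn lists two channels, the source fans each incoming event out to both of them. Flume's default channel selector is replicating, so the configuration above works as-is; the selector can also be declared explicitly:

=====================

# http://flume.apache.org/FlumeUserGuide.html#replicating-channel-selector-default
collector.sources.AvroIn.selector.type = replicating

=====================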

4. Start the collector as below

#flume-ng agent -f flume-collector.conf -n collector

The contents of /var/log/messages will then start arriving in HDFS under a path like /flume/events/apache_access_combined/sjc1ddpp05.crd.ge.com/13-11-06/FlumeData.1383741508563.tmp (the .tmp suffix marks the file currently being written; it is renamed when the file rolls). Flume will create the directories if they do not exist.
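The pieces of that path come from the HDFS sink's escape sequences: %{log_type} and %{host} are filled from the headers added by the agent's static and host interceptors, and %y-%m-%d is expanded from the timestamp header added by the timestamp interceptor. A rough sketch of the expansion for an event stamped today (the hostname is whatever the agent machine reports):

```shell
# Mimic the HDFS sink's path expansion by hand.
log_type="apache_access_combined"              # static interceptor header
host="$(hostname -f 2>/dev/null || hostname)"  # host interceptor (useIP = false)
day="$(date +%y-%m-%d)"                        # %y-%m-%d from the timestamp header
echo "/flume/events/${log_type}/${host}/${day}"
```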
