  1. Flume collector example from Cloudera's UserGuide does not work as expected
    <p>The bit in the UserGuide that shows you how to set up a collector and write to it (<a href="http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_tiering_flume_nodes_agents_and_collectors" rel="nofollow">http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_tiering_flume_nodes_agents_and_collectors</a>) has this configuration:</p>
<pre><code>host : console | agentSink("localhost",35853) ;
collector : collectorSource(35853) | console ;
</code></pre>
<p>I changed this to:</p>
<pre><code>dataSource : console | agentSink("localhost") ;
dataCollector : collectorSource() | console ;
</code></pre>
<p>I spawned the nodes as:</p>
<pre><code>flume node_nowatch -n dataSource
flume node_nowatch -n dataCollector
</code></pre>
<p>I have tried this on two systems:</p>
<ol>
<li><p>Cloudera's own demo VM running inside VirtualBox with 2GB RAM. It comes with Flume 0.9.4-cdh3u2.</p></li>
<li><p>Ubuntu LTS (Lucid) with the Debian package and OpenJDK (without any Hadoop packages installed), as a VM running inside VirtualBox with 2GB RAM. I followed the steps here: <a href="https://ccp.cloudera.com/display/CDHDOC/Flume+Installation#FlumeInstallation-InstallingtheFlumeRPMorDebianPackages" rel="nofollow">https://ccp.cloudera.com/display/CDHDOC/Flume+Installation#FlumeInstallation-InstallingtheFlumeRPMorDebianPackages</a></p></li>
</ol>
<p>Here is what I did:</p>
<p><code>flume dump 'collectorSource()'</code> leads to</p>
<pre><code>$ sudo netstat -anp | grep 35853
tcp6 0 0 :::35853 :::* LISTEN 3520/java
$ ps aux | grep java | grep 3520
1000 3520 0.8 2.3 1050508 44676 pts/0 Sl+ 15:38 0:02 java -Dflume.log.dir=/usr/lib/flume/logs -Dflume.log.file=flume.log -Dflume.root.logger=INFO,console -Dzookeeper.root.logger=ERROR,console -Dwatchdog.root.logger=INFO,console -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64 com.cloudera.flume.agent.FlumeNode -1 -s -n dump -c dump: collectorSource() | console;
</code></pre>
<p>My assumption is that:</p>
<pre><code>flume dump 'collectorSource()'
</code></pre>
<p>is the same as running the config:</p>
<pre><code>dump : collectorSource() | console ;
</code></pre>
<p>and starting the node with</p>
<pre><code>flume node -1 -n dump -c "dump: collectorSource() | console;" -s
</code></pre>
<p><code>dataSource : console | agentSink("localhost")</code> leads to</p>
<pre><code>$ sudo netstat -anp | grep 35853
tcp6 0 0 :::35853 :::* LISTEN 3520/java
tcp6 0 0 127.0.0.1:44878 127.0.0.1:35853 ESTABLISHED 3593/java
tcp6 0 0 127.0.0.1:35853 127.0.0.1:44878 ESTABLISHED 3520/java
$ ps aux | grep java | grep 3593
1000 3593 1.2 3.0 1130956 57644 pts/1 Sl+ 15:41 0:07 java -Dflume.log.dir=/usr/lib/flume/logs -Dflume.log.file=flume.log -Dflume.root.logger=INFO,console -Dzookeeper.root.logger=ERROR,console -Dwatchdog.root.logger=INFO,console -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64 com.cloudera.flume.agent.FlumeNode -n dataSource
</code></pre>
<p>The observed behaviour <strong>is exactly the same in both</strong> VirtualBox VMs:</p>
<p>An unending flow of this at <strong>dataSource</strong>:</p>
<pre><code>2011-12-15 15:27:58,253 [Roll-TriggerThread-1] INFO durability.NaiveFileWALManager: File lives in /tmp/flume-cloudera/agent/dataSource/writing/20111215-152748172-0500.1116926245855.00000034
2011-12-15 15:27:58,253 [Roll-TriggerThread-1] INFO hdfs.SeqfileEventSink: constructed new seqfile event sink: file=/tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:27:58,254 [naive file wal consumer-35] INFO durability.NaiveFileWALManager: opening log file 20111215-152748172-0500.1116926245855.00000034
2011-12-15 15:27:58,254 [Roll-TriggerThread-1] INFO endtoend.AckListener$Empty: Empty Ack Listener began 20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:27:58,256 [naive file wal consumer-35] INFO agent.WALAckManager: Ack for 20111215-152748172-0500.1116926245855.00000034 is queued to be checked
2011-12-15 15:27:58,257 [naive file wal consumer-35] INFO durability.WALSource: end of file NaiveFileWALManager (dir=/tmp/flume-cloudera/agent/dataSource )
2011-12-15 15:28:07,874 [Heartbeat] INFO agent.WALAckManager: Retransmitting 20111215-152657736-0500.1066489868855.00000034 after being stale for 60048ms
2011-12-15 15:28:07,875 [naive file wal consumer-35] INFO durability.NaiveFileWALManager: opening log file 20111215-152657736-0500.1066489868855.00000034
2011-12-15 15:28:07,877 [naive file wal consumer-35] INFO agent.WALAckManager: Ack for 20111215-152657736-0500.1066489868855.00000034 is queued to be checked
2011-12-15 15:28:07,877 [naive file wal consumer-35] INFO durability.WALSource: end of file NaiveFileWALManager (dir=/tmp/flume-cloudera/agent/dataSource )
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO hdfs.SeqfileEventSink: closed /tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO endtoend.AckListener$Empty: Empty Ack Listener ended 20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO durability.NaiveFileWALManager: File lives in /tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO hdfs.SeqfileEventSink: constructed new seqfile event sink: file=/tmp/flume-cloudera/agent/dataSource/writing/20111215-152808335-0500.1137089135855.00000034
2011-12-15 15:28:08,336 [naive file wal consumer-35] INFO durability.NaiveFileWALManager: opening log file 20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,337 [Roll-TriggerThread-1] INFO endtoend.AckListener$Empty: Empty Ack Listener began 20111215-152808335-0500.1137089135855.00000034
2011-12-15 15:28:08,339 [naive file wal consumer-35] INFO agent.WALAckManager: Ack for 20111215-152758253-0500.1127006668855.00000034 is queued to be checked
2011-12-15 15:28:08,339 [naive file wal consumer-35] INFO durability.WALSource: end of file NaiveFileWALManager (dir=/tmp/flume-cloudera/agent/dataSource )
2011-12-15 15:28:18,421 [Roll-TriggerThread-1] INFO hdfs.SeqfileEventSink: closed /tmp/flume-cloudera/agent/dataSource/writing/20111215-152808335-0500.1137089135855.00000034
2011-12-15 15:28:18,421 [Roll-TriggerThread-1] INFO endtoend.AckListener$Empty: Empty Ack Listener ended 20111215-152808335-0500.1137089135855.00000034
..
2011-12-15 15:35:24,763 [Heartbeat] INFO agent.WALAckManager: Retransmitting 20111215-152707823-0500.1076576334855.00000034 after being stale for 60277ms
2011-12-15 15:35:24,763 [Heartbeat] INFO durability.NaiveFileWALManager: Attempt to retry chunk '20111215-152707823-0500.1076576334855.00000034' in LOGGED state. There is no need for state transition.
</code></pre>
<p>An unending flow of this at <strong>dataCollector</strong>:</p>
<pre><code>localhost [INFO Thu Dec 15 15:31:09 EST 2011] { AckChecksum : (long)1323981059821 (string) ' 4Ck��' (double)6.54133557402E-312 } { AckTag : 20111215-153059819-0500.1308572847855.00000034 } { AckType : end }
</code></pre>
<p>How do I get the console &lt;-> console communication via collectors working again correctly?</p>
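<p>To make the retransmission pattern in the dataSource log easier to inspect, here is a minimal Python sketch that counts how often each WAL chunk is retransmitted. It assumes only the <code>agent.WALAckManager: Retransmitting ... after being stale for ...ms</code> line format shown in the log above; the function name <code>stale_chunks</code> is hypothetical and not part of Flume.</p>

```python
import re
import sys
from collections import Counter

# Matches the WALAckManager retransmit lines from the dataSource log above.
# Assumption: the chunk tag contains no whitespace, as in the pasted log.
RETRANSMIT = re.compile(
    r"agent\.WALAckManager: Retransmitting (?P<chunk>\S+) "
    r"after being stale for (?P<ms>\d+)ms"
)


def stale_chunks(log_text):
    """Return {chunk tag: number of retransmissions} found in log_text."""
    counts = Counter()
    for match in RETRANSMIT.finditer(log_text):
        counts[match.group("chunk")] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical usage: pipe the agent's console output or log file in.
    for chunk, times in sorted(stale_chunks(sys.stdin.read()).items()):
        print(f"{chunk} retransmitted {times} time(s)")
```

<p>A chunk tag that keeps reappearing in this count is one the agent never considered acknowledged, which is the behaviour described above. The script only summarises the log; it does not change any Flume setting.</p>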