How do you programmatically configure Hazelcast for the multicast discovery mechanism?
<p>How do you programmatically configure Hazelcast for the multicast discovery mechanism?</p> <hr> <p>Details:</p> <p>The <a href="http://www.hazelcast.com/docs/1.9.4/manual/multi_html/ch11.html#ConfigFullTcpIp">documentation</a> only supplies an example for TCP/IP and is out of date: it uses <code>Config.setPort()</code>, which no longer exists.</p> <p>My configuration looks like this, but discovery does not work (i.e. I get the output <code>"Members: 1"</code>):</p> <pre><code>Config cfg = new Config();
NetworkConfig network = cfg.getNetworkConfig();
network.setPort(PORT_NUMBER);

// Use multicast discovery only; disable the other join mechanisms.
JoinConfig join = network.getJoin();
join.getTcpIpConfig().setEnabled(false);
join.getAwsConfig().setEnabled(false);
join.getMulticastConfig().setEnabled(true);
join.getMulticastConfig().setMulticastGroup(MULTICAST_ADDRESS);
join.getMulticastConfig().setMulticastPort(PORT_NUMBER);
join.getMulticastConfig().setMulticastTimeoutSeconds(200);

HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
System.out.println("Members: " + instance.getCluster().getMembers().size());
</code></pre> <hr> <h3>Update 1, taking asimarslan's answer into account</h3> <p>If I fiddle with the multicast timeout, I get either <code>"Members: 1"</code> or</p> <blockquote> <p>Dec 05, 2013 8:50:42 PM com.hazelcast.nio.ReadHandler WARNING: [192.168.0.9]:4446 [dev] hz._hzInstance_1_dev.IO.thread-in-0 Closing socket to endpoint Address[192.168.0.7]:4446, Cause:java.io.EOFException: Remote socket closed! Dec 05, 2013 8:57:24 PM com.hazelcast.instance.Node SEVERE: [192.168.0.9]:4446 [dev] Could not join cluster, shutting down!
com.hazelcast.core.HazelcastException: Failed to join in 300 seconds!</p> </blockquote> <hr> <h3>Update 2, taking pveentjer's answer about using TCP/IP into account</h3> <p>If I change the configuration to the following, I still only get 1 member:</p> <pre><code>Config cfg = new Config();
NetworkConfig network = cfg.getNetworkConfig();
network.setPort(PORT_NUMBER);

JoinConfig join = network.getJoin();
join.getMulticastConfig().setEnabled(false);
join.getTcpIpConfig()
    .addMember("192.168.0.1").addMember("192.168.0.2")
    .addMember("192.168.0.3").addMember("192.168.0.4")
    .addMember("192.168.0.5").addMember("192.168.0.6")
    .addMember("192.168.0.7").addMember("192.168.0.8")
    .addMember("192.168.0.9").addMember("192.168.0.10")
    .addMember("192.168.0.11")
    .setRequiredMember(null)
    .setEnabled(true);

// this sets the allowed connections to the cluster? necessary for multicast, too?
network.getInterfaces().setEnabled(true).addInterface("192.168.0.*");

HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
System.out.println("debug: joined via " + join + " with "
    + instance.getCluster().getMembers().size() + " members.");
</code></pre> <p>More precisely, this run produces the output</p> <blockquote> <p>debug: joined via JoinConfig{multicastConfig=MulticastConfig [enabled=false, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=true, connectionTimeoutSeconds=5, members=[192.168.0.1, 192.168.0.2, 192.168.0.3, 192.168.0.4, 192.168.0.5, 192.168.0.6, 192.168.0.7, 192.168.0.8, 192.168.0.9, 192.168.0.10, 192.168.0.11], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}} with 1 members.</p> </blockquote> <p>My non-Hazelcast implementation uses UDP multicast and works fine. So can a firewall really be the problem?
</p> <hr> <h3>Update 3, taking pveentjer's answer about checking the network into account</h3> <p>Since I do not have permissions for iptables or to install iperf, I am using <code>com.hazelcast.examples.TestApp</code> to check whether my network is working, as described in <a href="http://books.google.de/books?id=tzAa_MxjNrsC&amp;pg=PT12&amp;lpg=PT12&amp;dq=%22getting+started+with+hazelcast%22">Getting Started With Hazelcast</a> in Chapter 2, Section "Showing Off Straight Away":</p> <p>I call <code>java -cp hazelcast-3.1.2.jar com.hazelcast.examples.TestApp</code> on 192.168.0.1 and get the output</p> <pre><code>...
Dec 10, 2013 11:31:21 PM com.hazelcast.instance.DefaultAddressPicker INFO: Prefer IPv4 stack is true.
Dec 10, 2013 11:31:21 PM com.hazelcast.instance.DefaultAddressPicker INFO: Picked Address[192.168.0.1]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 10, 2013 11:31:22 PM com.hazelcast.system INFO: [192.168.0.1]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.0.1]:5701
Dec 10, 2013 11:31:22 PM com.hazelcast.system INFO: [192.168.0.1]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 10, 2013 11:31:22 PM com.hazelcast.instance.Node INFO: [192.168.0.1]:5701 [dev] Creating MulticastJoiner
Dec 10, 2013 11:31:22 PM com.hazelcast.core.LifecycleService INFO: [192.168.0.1]:5701 [dev] Address[192.168.0.1]:5701 is STARTING
Dec 10, 2013 11:31:24 PM com.hazelcast.cluster.MulticastJoiner INFO: [192.168.0.1]:5701 [dev] Members [1] { Member [192.168.0.1]:5701 this }
Dec 10, 2013 11:31:24 PM com.hazelcast.core.LifecycleService INFO: [192.168.0.1]:5701 [dev] Address[192.168.0.1]:5701 is STARTED
</code></pre> <p>I then call <code>java -cp hazelcast-3.1.2.jar com.hazelcast.examples.TestApp</code> on 192.168.0.2 and get the output</p> <pre><code>...
Dec 10, 2013 9:50:22 PM com.hazelcast.instance.DefaultAddressPicker INFO: Prefer IPv4 stack is true.
Dec 10, 2013 9:50:22 PM com.hazelcast.instance.DefaultAddressPicker INFO: Picked Address[192.168.0.2]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 10, 2013 9:50:23 PM com.hazelcast.system INFO: [192.168.0.2]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.0.2]:5701
Dec 10, 2013 9:50:23 PM com.hazelcast.system INFO: [192.168.0.2]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 10, 2013 9:50:23 PM com.hazelcast.instance.Node INFO: [192.168.0.2]:5701 [dev] Creating MulticastJoiner
Dec 10, 2013 9:50:23 PM com.hazelcast.core.LifecycleService INFO: [192.168.0.2]:5701 [dev] Address[192.168.0.2]:5701 is STARTING
Dec 10, 2013 9:50:23 PM com.hazelcast.nio.SocketConnector INFO: [192.168.0.2]:5701 [dev] Connecting to /192.168.0.1:5701, timeout: 0, bind-any: true
Dec 10, 2013 9:50:23 PM com.hazelcast.nio.TcpIpConnectionManager INFO: [192.168.0.2]:5701 [dev] 38476 accepted socket connection from /192.168.0.1:5701
Dec 10, 2013 9:50:28 PM com.hazelcast.cluster.ClusterService INFO: [192.168.0.2]:5701 [dev] Members [2] { Member [192.168.0.1]:5701 Member [192.168.0.2]:5701 this }
Dec 10, 2013 9:50:30 PM com.hazelcast.core.LifecycleService INFO: [192.168.0.2]:5701 [dev] Address[192.168.0.2]:5701 is STARTED
</code></pre> <p>So multicast discovery is generally working on my cluster, right? Is 5701 also the port for discovery? Is <code>38476</code> in the last output an ID or a port?
</p> <p>Joining still does not work for my own code with programmatic configuration :(</p> <hr> <h3>Update 4, taking pveentjer's answer about using the default configuration into account</h3> <p>The modified TestApp gives the output</p> <pre><code>joinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}}
</code></pre> <p>and does detect other members after a couple of seconds (if all instances are started at the same time, each one first lists only itself as a member), whereas</p> <p>myProgram gives the output</p> <pre><code>joined via JoinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multica\
stTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSecond\
s=5, members=[], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='nu\
ll', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}} with 1 members.
</code></pre> <p>and does not detect members within its runtime of about 1 minute (I am counting the members about every 5 seconds).</p> <p>BUT if at least one instance of TestApp runs concurrently on the cluster, all TestApp instances and all myProgram instances are detected and my program works fine. If I start TestApp once and then myProgram twice in parallel, TestApp gives the following output:</p> <pre><code>java -cp ~/CaseStudy/jtorx-1.10.0-beta8/lib/hazelcast-3.1.2.jar:. TestApp
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.DefaultAddressPicker INFO: Prefer IPv4 stack is true.
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.DefaultAddressPicker INFO: Picked Address[192.168.180.240]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 12, 2013 12:02:15 PM com.hazelcast.system INFO: [192.168.180.240]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.180.240]:5701
Dec 12, 2013 12:02:15 PM com.hazelcast.system INFO: [192.168.180.240]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.Node INFO: [192.168.180.240]:5701 [dev] Creating MulticastJoiner
Dec 12, 2013 12:02:15 PM com.hazelcast.core.LifecycleService INFO: [192.168.180.240]:5701 [dev] Address[192.168.180.240]:5701 is STARTING
Dec 12, 2013 12:02:21 PM com.hazelcast.cluster.MulticastJoiner INFO: [192.168.180.240]:5701 [dev] Members [1] { Member [192.168.180.240]:5701 this }
Dec 12, 2013 12:02:22 PM com.hazelcast.core.LifecycleService INFO: [192.168.180.240]:5701 [dev] Address[192.168.180.240]:5701 is STARTED
Dec 12, 2013 12:02:22 PM com.hazelcast.management.ManagementCenterService INFO: [192.168.180.240]:5701 [dev] Hazelcast will connect to Management Center on address: http://localhost:8080/mancenter-3.1.2/
Join: JoinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}}
Dec 12, 2013 12:02:22 PM com.hazelcast.partition.PartitionService INFO: [192.168.180.240]:5701 [dev] Initializing cluster partition table first arrangement...
hazelcast[default] &gt; Dec 12, 2013 12:03:27 PM com.hazelcast.nio.SocketAcceptor INFO: [192.168.180.240]:5701 [dev] Accepting socket connection from /192.168.0.8:38764
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.TcpIpConnectionManager INFO: [192.168.180.240]:5701 [dev] 5701 accepted socket connection from /192.168.0.8:38764
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.SocketAcceptor INFO: [192.168.180.240]:5701 [dev] Accepting socket connection from /192.168.0.7:54436
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.TcpIpConnectionManager INFO: [192.168.180.240]:5701 [dev] 5701 accepted socket connection from /192.168.0.7:54436
Dec 12, 2013 12:03:32 PM com.hazelcast.partition.PartitionService INFO: [192.168.180.240]:5701 [dev] Re-partitioning cluster data... Migration queue size: 181
Dec 12, 2013 12:03:32 PM com.hazelcast.cluster.ClusterService INFO: [192.168.180.240]:5701 [dev] Members [3] { Member [192.168.180.240]:5701 this Member [192.168.0.8]:5701 Member [192.168.0.7]:5701 }
Dec 12, 2013 12:03:43 PM com.hazelcast.partition.PartitionService INFO: [192.168.180.240]:5701 [dev] Re-partitioning cluster data... Migration queue size: 181
Dec 12, 2013 12:03:45 PM com.hazelcast.partition.PartitionService INFO: [192.168.180.240]:5701 [dev] All migration tasks has been completed, queues are empty.
Dec 12, 2013 12:03:46 PM com.hazelcast.nio.TcpIpConnection INFO: [192.168.180.240]:5701 [dev] Connection [Address[192.168.0.8]:5701] lost. Reason: Socket explicitly closed
Dec 12, 2013 12:03:46 PM com.hazelcast.cluster.ClusterService INFO: [192.168.180.240]:5701 [dev] Removing Member [192.168.0.8]:5701
Dec 12, 2013 12:03:46 PM com.hazelcast.cluster.ClusterService INFO: [192.168.180.240]:5701 [dev] Members [2] { Member [192.168.180.240]:5701 this Member [192.168.0.7]:5701 }
Dec 12, 2013 12:03:48 PM com.hazelcast.partition.PartitionService INFO: [192.168.180.240]:5701 [dev] Partition balance is ok, no need to re-partition cluster data...
Dec 12, 2013 12:03:48 PM com.hazelcast.nio.TcpIpConnection INFO: [192.168.180.240]:5701 [dev] Connection [Address[192.168.0.7]:5701] lost. Reason: Socket explicitly closed
Dec 12, 2013 12:03:48 PM com.hazelcast.cluster.ClusterService INFO: [192.168.180.240]:5701 [dev] Removing Member [192.168.0.7]:5701
Dec 12, 2013 12:03:48 PM com.hazelcast.cluster.ClusterService INFO: [192.168.180.240]:5701 [dev] Members [1] { Member [192.168.180.240]:5701 this }
Dec 12, 2013 12:03:48 PM com.hazelcast.partition.PartitionService INFO: [192.168.180.240]:5701 [dev] Partition balance is ok, no need to re-partition cluster data...
</code></pre> <p>The only difference I see in TestApp's configuration is</p> <pre><code>config.getManagementCenterConfig().setEnabled(true);
config.getManagementCenterConfig().setUrl("http://localhost:8080/mancenter-" + version);
for (int k = 1; k &lt;= LOAD_EXECUTORS_COUNT; k++) {
    config.addExecutorConfig(new ExecutorConfig("e" + k).setPoolSize(k));
}
</code></pre> <p>so, in a desperate attempt, I added it to myProgram, too. But it does not solve the problem: each instance still detects only itself as a member during the whole run.</p> <hr> <h3>Update about how long myProgram runs</h3> <p>Could it be that the program is not running long enough (as pveentjer put it)?</p> <p>My experiments seem to confirm this: If the time <em>t</em> between <code>Hazelcast.newHazelcastInstance(cfg);</code> and initializing <code>cleanUp()</code> (i.e. no longer communicating via Hazelcast and no longer checking the number of members) is</p> <ul> <li>less than 30 seconds: no communication and <code>members: 1</code></li> <li>more than 30 seconds: all members are found and communication happens (which weirdly seems to be happening for much longer than <em>t</em> - 30 seconds).</li> </ul> <p>Is 30 seconds a realistic time span that a Hazelcast cluster needs, or is there something strange going on?
Here is a log from 4 instances of myProgram running concurrently (the periods in which instance 1 and instance 3 look for Hazelcast members overlap by about 30 seconds):</p> <pre><code>instance 1:
2013-12-19T12:39:16.553+0100 LOG 0 (START) engine started
looking for members between 2013-12-19T12:39:21.973+0100 and 2013-12-19T12:40:27.863+0100
2013-12-19T12:40:28.205+0100 LOG 35 (Torx-Explorer) Model SymToSim is about to\
exit

instance 2:
2013-12-19T12:39:16.592+0100 LOG 0 (START) engine started
looking for members between 2013-12-19T12:39:22.192+0100 and 2013-12-19T12:39:28.429+0100
2013-12-19T12:39:28.711+0100 LOG 52 (Torx-Explorer) Model SymToSim is about to\
exit

instance 3:
2013-12-19T12:39:16.593+0100 LOG 0 (START) engine started
looking for members between 2013-12-19T12:39:22.145+0100 and 2013-12-19T12:39:52.425+0100
2013-12-19T12:39:52.639+0100 LOG 54 (Torx-Explorer) Model SymToSim is about to\
exit

instance 4:
2013-12-19T12:39:16.885+0100 LOG 0 (START) engine started
looking for members between 2013-12-19T12:39:21.478+0100 and 2013-12-19T12:39:35.980+0100
2013-12-19T12:39:36.024+0100 LOG 34 (Torx-Explorer) Model SymToSim is about to\
exit
</code></pre> <p>How do I best start my actual distributed algorithm only after enough members are present in the Hazelcast cluster? Can I set <code>hazelcast.initial.min.cluster.size</code> programmatically? <a href="https://groups.google.com/forum/#!topic/hazelcast/sa-lmpEDa6A">https://groups.google.com/forum/#!topic/hazelcast/sa-lmpEDa6A</a> sounds like this would block <code>Hazelcast.newHazelcastInstance(cfg);</code> until the <code>initial.min.cluster.size</code> is reached. Correct? How synchronously (within which time span) will the different instances unblock?</p>
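<p>To make the last question concrete: one application-level way to hold back the distributed algorithm until enough members are present would be to poll the member count until it reaches a minimum or a timeout expires. The following is only a minimal sketch of that idea, not Hazelcast's own mechanism. The class <code>ClusterWait</code>, the method <code>awaitMinMembers</code> and all of its parameters are hypothetical names made up for illustration; the supplier stands in for <code>instance.getCluster().getMembers().size()</code> so that the sketch runs without a cluster.</p>

```java
import java.util.function.IntSupplier;

/** Hypothetical helper (not part of Hazelcast): polls a member count until it is large enough. */
final class ClusterWait {

    private ClusterWait() {
    }

    /**
     * Polls memberCount until it reports at least minMembers or timeoutMillis has elapsed.
     * Returns true if the minimum was reached, false on timeout or interruption.
     */
    static boolean awaitMinMembers(IntSupplier memberCount, int minMembers,
                                   long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (memberCount.getAsInt() >= minMembers) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Sleep for one poll interval, but never past the deadline.
                Thread.sleep(Math.max(1L, Math.min(pollMillis, remainingMillis)));
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

<p>With a real instance, the call might look like <code>ClusterWait.awaitMinMembers(() -&gt; instance.getCluster().getMembers().size(), 4, 120_000, 1_000)</code>; given the roughly 30 seconds observed above, a timeout well beyond that seems advisable. Whether <code>hazelcast.initial.min.cluster.size</code> can be set in code instead, presumably via <code>cfg.setProperty(...)</code>, is an assumption that is untested here.</p>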