Services not being re-deployed on topology change #12315

@coolkarniomkar

Hey guys!

We discovered that when a node has to re-deploy services because several other nodes left the cluster at the same time, some services are occasionally not re-deployed. This can be reproduced by running the following two classes together, in separate processes. I was able to reproduce the issue roughly once every 2-3 tries.

Node1And2.java
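import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.stream.Collectors;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteAtomicLong;
import org.apache.ignite.IgniteCluster;
import org.apache.ignite.IgniteServices;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.services.ServiceDescriptor;

public class Node1And2 {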
    public static void main(String[] args) throws InterruptedException {
        Ignite node1 = Ignition.start();
        Ignite node2 = Ignition.start(new IgniteConfiguration().setIgniteInstanceName("node2"));

        IgniteCluster cluster = node1.cluster();
        UUID node1ID = cluster.localNode().id();
        UUID node2ID = node2.cluster().localNode().id();

        while (cluster.topology(cluster.topologyVersion()).size() != 3) {
            Thread.sleep(10);
        }

        IgniteServices services = node1.services();
        // Names match the "myServiceN" entries in the output below.
        for (int i = 0; i < 10; i++)
            services.deployClusterSingleton("myService" + i, new ServiceImpl());

        // Group service names by the node that currently hosts each singleton.
        Map<UUID, List<String>> servicesDeployedOnNodes = services.serviceDescriptors().stream()
            .collect(Collectors.groupingBy(
                serviceDescriptor -> serviceDescriptor.topologySnapshot().entrySet().stream()
                    .filter(entry -> entry.getValue() != 0)
                    .findFirst().get().getKey(),
                Collectors.mapping(ServiceDescriptor::name, Collectors.toList())));
        System.out.println("Services deployed on node1: " + servicesDeployedOnNodes.get(node1ID));
        System.out.println("Services deployed on node2: " + servicesDeployedOnNodes.get(node2ID));

        IgniteAtomicLong isServiceDeploymentComplete = node1.atomicLong("isServiceDeploymentComplete", 0, true);
        isServiceDeploymentComplete.getAndSet(1);

        System.out.println("Now sleeping for 10s.");
        Thread.sleep(10 * 1000);

        // Exiting the JVM takes both node1 and node2 out of the cluster.
        System.exit(0);
    }
}
Node3.java
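import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteAtomicLong;
import org.apache.ignite.IgniteServices;
import org.apache.ignite.Ignition;
import org.apache.ignite.services.ServiceDescriptor;

public class Node3 {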
    public static void main(String[] args) throws InterruptedException {
        Ignite ignite = Ignition.start();
        UUID localNodeID = ignite.cluster().localNode().id();

        IgniteAtomicLong isServiceDeploymentComplete = ignite.atomicLong("isServiceDeploymentComplete", 0, true);
        while (isServiceDeploymentComplete.get() != 1) {
            Thread.sleep(10);
        }

        IgniteServices services = ignite.services();
        List<String> servicesDeployedOnLocalNode = services.serviceDescriptors().stream()
            .filter(serviceDescriptor -> serviceDescriptor.topologySnapshot().containsKey(localNodeID)
                && serviceDescriptor.topologySnapshot().get(localNodeID) != 0)
            .map(ServiceDescriptor::name)
            .collect(Collectors.toList());
        System.out.println("Services deployed on node3: " + servicesDeployedOnLocalNode);

        System.out.println("Now sleeping for 20s.");
        Thread.sleep(20 * 1000);

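        // getOrDefault guards against a NullPointerException when a snapshot has no entry for this node.
        servicesDeployedOnLocalNode = services.serviceDescriptors().stream()
            .filter(serviceDescriptor -> serviceDescriptor.topologySnapshot().getOrDefault(localNodeID, 0) != 0)
            .map(ServiceDescriptor::name)
            .collect(Collectors.toList());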
        System.out.println("Services deployed on node3: " + servicesDeployedOnLocalNode + " (" + servicesDeployedOnLocalNode.size() + " total)");
    }
}
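
The per-node filter in Node3 reads an instance count out of each descriptor's topology snapshot, which is a `Map<UUID, Integer>`. Here is a minimal sketch of a null-safe version of that lookup. It uses plain maps as stand-ins for `ServiceDescriptor` so it runs without a cluster; the class and method names (`SnapshotFilter`, `deployedOn`) are made up for illustration and are not part of Ignite.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.stream.Collectors;

public class SnapshotFilter {
    /** Names of services whose snapshot reports a non-zero instance count for nodeId. */
    static List<String> deployedOn(Map<String, Map<UUID, Integer>> snapshots, UUID nodeId) {
        return snapshots.entrySet().stream()
            // getOrDefault avoids unboxing null when the node has no entry in the snapshot.
            .filter(e -> e.getValue().getOrDefault(nodeId, 0) != 0)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        UUID node3 = UUID.randomUUID();
        UUID other = UUID.randomUUID();

        // Hypothetical snapshots: service name -> (node id -> instance count).
        Map<String, Map<UUID, Integer>> snapshots = new LinkedHashMap<>();
        snapshots.put("myService1", Map.of(node3, 1));
        snapshots.put("myService0", Map.of(other, 1)); // no entry for node3
        snapshots.put("myService5", Map.of(node3, 0)); // entry present, zero instances

        System.out.println(deployedOn(snapshots, node3)); // prints [myService1]
    }
}
```

A plain `get(nodeId) != 0` would throw on the `myService0` entry, because the missing key yields `null` and the comparison unboxes it.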

ServiceImpl is a bare minimum implementation of Service.

ServiceImpl.java
import org.apache.ignite.services.Service;

public class ServiceImpl implements Service {
}
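
Node3 waits a fixed 20s before checking its services. To tell "re-deployment is slow" apart from "re-deployment never happens", the check could instead poll with a deadline. Below is a rough sketch of such a helper, not something from the reproducer itself: `AwaitDeployment` and `awaitCount` are hypothetical names, and the supplier stands in for the `serviceDescriptors()` filter used in Node3.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.function.Supplier;

public class AwaitDeployment {
    /**
     * Polls the supplier until it reports at least {@code expected} names or the timeout elapses.
     * Returns true if the expected count was reached, false on timeout or interruption.
     */
    static boolean awaitCount(Supplier<? extends Collection<String>> deployed, int expected,
                              Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (deployed.get().size() < expected) {
            if (System.nanoTime() - deadline >= 0) return false;
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for the list of services currently deployed on the local node.
        List<String> fake = List.of("myService1", "myService2");

        System.out.println(awaitCount(() -> fake, 2, Duration.ofSeconds(1), Duration.ofMillis(10)));    // true
        System.out.println(awaitCount(() -> fake, 10, Duration.ofMillis(100), Duration.ofMillis(10)));  // false
    }
}
```

With a generous timeout, a `false` result would point at services that are never re-deployed rather than ones that are merely late.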

After the process running Node1And2 stops and its two nodes leave the cluster, we expected all 10 services to be re-deployed on the third node. Occasionally, however, only some of them are. In the run below, node3 ends up with 5 of the 10 services: it picks up myService1 from node2, but none of the five services that were running on node1. Here's the output of both processes in such a case.

Output for Node1And2.java
[12:52:18] (wrn) Default Spring XML file not found (is IGNITE_HOME set?): config/default-config.xml
Sep 03, 2025 12:52:19 PM org.apache.ignite.logger.java.JavaLogger warning
WARNING: Failed to resolve default logging config file: config/java.util.logging.properties
[12:52:19]    __________  ________________ 
[12:52:19]   /  _/ ___/ |/ /  _/_  __/ __/ 
[12:52:19]  _/ // (7 7    // /  / / / _/   
[12:52:19] /___/\___/_/|_/___/ /_/ /x___/  
[12:52:19] 
[12:52:19] ver. 2.17.0#20250209-sha1:d53d4540
[12:52:19] 2025 Copyright(C) Apache Software Foundation
[12:52:19] 
[12:52:19] Ignite documentation: https://ignite.apache.org
[12:52:19] 
[12:52:19] Quiet mode.
[12:52:19]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
[12:52:19]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
[12:52:19] 
[12:52:19] OS: Windows 11 10.0 amd64
[12:52:19] VM information: OpenJDK Runtime Environment 11.0.22+7-adhoc..jdk11u OpenLogic OpenJDK 64-Bit Server VM 11.0.22+7-adhoc..jdk11u
[12:52:19] Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
[12:52:20] Configured plugins:
[12:52:20]   ^-- None
[12:52:20] 
[12:52:20] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
[12:52:20] Initial heap size is 384MB (should be no less than 512MB, use -Xms512m -Xmx512m).
[12:52:20] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[12:52:25] Data Regions Started: 3
[12:52:25] 
[12:52:25]     ^--   sysMemPlc region [type=internal, persistence=false, lazyAlloc=false,
[12:52:25]       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=40MB]
[12:52:25]     ^--   default region [type=default, persistence=false, lazyAlloc=true,
[12:52:25]       ...  initCfg=256MB, maxCfg=4898MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
[12:52:25]     ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,
[12:52:25]       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
[12:52:25] Security status [authentication=off, sandbox=off, tls/ssl=off]
[12:52:25] Performance suggestions for grid  (fix if possible)
[12:52:25] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
[12:52:25]   ^-- Specify JVM heap max size (add '-Xmx<size>[g|G|m|M|k|K]' to JVM options)
[12:52:25]   ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options)
[12:52:25]   ^-- Disable assertions (remove '-ea' from JVM options)
[12:52:25] Refer to this page for more performance suggestions: https://ignite.apache.org/docs/latest/perf-and-troubleshooting/memory-tuning
[12:52:25] 
[12:52:25] 
[12:52:25] Ignite node started OK (id=a6e3015d)
[12:52:25] Topology snapshot [ver=1, locNode=a6e3015d, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=4.8GB, heap=6.0GB]
[12:52:25]   ^-- Baseline [id=0, size=1, online=1, offline=0]
[12:52:25]    __________  ________________ 
[12:52:25]   /  _/ ___/ |/ /  _/_  __/ __/ 
[12:52:25]  _/ // (7 7    // /  / / / _/   
[12:52:25] /___/\___/_/|_/___/ /_/ /x___/  
[12:52:25] 
[12:52:25] ver. 2.17.0#20250209-sha1:d53d4540
[12:52:25] 2025 Copyright(C) Apache Software Foundation
[12:52:25] 
[12:52:25] Ignite documentation: https://ignite.apache.org
[12:52:25] 
[12:52:25] Quiet mode.
[12:52:25]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
[12:52:25]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
[12:52:25] 
[12:52:25] OS: Windows 11 10.0 amd64
[12:52:25] VM information: OpenJDK Runtime Environment 11.0.22+7-adhoc..jdk11u OpenLogic OpenJDK 64-Bit Server VM 11.0.22+7-adhoc..jdk11u
[12:52:25] Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
[12:52:25] Configured plugins:
[12:52:25]   ^-- None
[12:52:25] 
[12:52:25] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
[12:52:25] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[12:52:26] Initial heap size is 384MB (should be no less than 512MB, use -Xms512m -Xmx512m).
[12:52:29] Joining node doesn't have stored group keys [node=00dc7f4a-d0e4-4391-bd04-28b9ef934050]
[12:52:29] Topology snapshot [ver=2, locNode=a6e3015d, servers=2, clients=0, state=ACTIVE, CPUs=8, offheap=4.8GB, heap=6.0GB]
[12:52:29]   ^-- Baseline [id=0, size=2, online=2, offline=0]
[12:52:29] Nodes started on local machine require more than 80% of physical RAM what can lead to significant slowdown due to swapping (please decrease JVM heap size, data region size or checkpoint buffer size) [required=22244MB, available=24492MB]
[12:52:29] Data Regions Started: 3
[12:52:29] 
[12:52:29]     ^--   sysMemPlc region [type=internal, persistence=false, lazyAlloc=false,
[12:52:29]       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=40MB]
[12:52:29]     ^--   default region [type=default, persistence=false, lazyAlloc=true,
[12:52:29]       ...  initCfg=256MB, maxCfg=4898MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
[12:52:29]     ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,
[12:52:29]       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
[12:52:30] Security status [authentication=off, sandbox=off, tls/ssl=off]
[12:52:30] Performance suggestions for grid 'node2' (fix if possible)
[12:52:30] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
[12:52:30]   ^-- Specify JVM heap max size (add '-Xmx<size>[g|G|m|M|k|K]' to JVM options)
[12:52:30]   ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options)
[12:52:30]   ^-- Disable assertions (remove '-ea' from JVM options)
[12:52:30] Refer to this page for more performance suggestions: https://ignite.apache.org/docs/latest/perf-and-troubleshooting/memory-tuning
[12:52:30] 
[12:52:30] 
[12:52:30] Ignite node started OK (id=00dc7f4a, instance name=node2)
[12:52:30] Topology snapshot [ver=2, locNode=00dc7f4a, servers=2, clients=0, state=ACTIVE, CPUs=8, offheap=4.8GB, heap=6.0GB]
[12:52:30]   ^-- Baseline [id=0, size=2, online=2, offline=0]
[12:52:31] Joining node doesn't have stored group keys [node=0ddda712-bb41-4814-9124-86e010ce60c9]
[12:52:31] Topology snapshot [ver=3, locNode=00dc7f4a, servers=3, clients=0, state=ACTIVE, CPUs=8, offheap=9.6GB, heap=12.0GB]
[12:52:31]   ^-- Baseline [id=0, size=3, online=3, offline=0]
[12:52:31] Topology snapshot [ver=3, locNode=a6e3015d, servers=3, clients=0, state=ACTIVE, CPUs=8, offheap=9.6GB, heap=12.0GB]
[12:52:31]   ^-- Baseline [id=0, size=3, online=3, offline=0]
Services deployed on node1: [myService6, myService5, myService3, myService0, myService9]
Services deployed on node2: [myService1]
Now sleeping for 10s.
[12:52:43] Ignite node stopped OK [name=node2, uptime=00:00:12.957]
[12:52:43] Ignite node stopped OK [uptime=00:00:17.336]

Process finished with exit code 0
Output for Node3.java
[12:52:24] (wrn) Default Spring XML file not found (is IGNITE_HOME set?): config/default-config.xml
Sep 03, 2025 12:52:24 PM org.apache.ignite.logger.java.JavaLogger warning
WARNING: Failed to resolve default logging config file: config/java.util.logging.properties
[12:52:25]    __________  ________________ 
[12:52:25]   /  _/ ___/ |/ /  _/_  __/ __/ 
[12:52:25]  _/ // (7 7    // /  / / / _/   
[12:52:25] /___/\___/_/|_/___/ /_/ /x___/  
[12:52:25] 
[12:52:25] ver. 2.17.0#20250209-sha1:d53d4540
[12:52:25] 2025 Copyright(C) Apache Software Foundation
[12:52:25] 
[12:52:25] Ignite documentation: https://ignite.apache.org
[12:52:25] 
[12:52:25] Quiet mode.
[12:52:25]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
[12:52:25]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
[12:52:25] 
[12:52:25] OS: Windows 11 10.0 amd64
[12:52:25] VM information: OpenJDK Runtime Environment 11.0.22+7-adhoc..jdk11u OpenLogic OpenJDK 64-Bit Server VM 11.0.22+7-adhoc..jdk11u
[12:52:25] Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
[12:52:26] Configured plugins:
[12:52:26]   ^-- None
[12:52:26] 
[12:52:26] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
[12:52:26] Initial heap size is 384MB (should be no less than 512MB, use -Xms512m -Xmx512m).
[12:52:26] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[12:52:31] Nodes started on local machine require more than 80% of physical RAM what can lead to significant slowdown due to swapping (please decrease JVM heap size, data region size or checkpoint buffer size) [required=33367MB, available=24492MB]
[12:52:31] Data Regions Started: 3
[12:52:31] 
[12:52:31]     ^--   sysMemPlc region [type=internal, persistence=false, lazyAlloc=false,
[12:52:31]       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=40MB]
[12:52:31]     ^--   default region [type=default, persistence=false, lazyAlloc=true,
[12:52:31]       ...  initCfg=256MB, maxCfg=4898MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
[12:52:31]     ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,
[12:52:31]       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
[12:52:32] Security status [authentication=off, sandbox=off, tls/ssl=off]
[12:52:32] Performance suggestions for grid  (fix if possible)
[12:52:32] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
[12:52:32]   ^-- Specify JVM heap max size (add '-Xmx<size>[g|G|m|M|k|K]' to JVM options)
[12:52:32]   ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options)
[12:52:32] Refer to this page for more performance suggestions: https://ignite.apache.org/docs/latest/perf-and-troubleshooting/memory-tuning
[12:52:32] 
[12:52:32] 
[12:52:32] Ignite node started OK (id=0ddda712)
[12:52:32] Topology snapshot [ver=3, locNode=0ddda712, servers=3, clients=0, state=ACTIVE, CPUs=8, offheap=9.6GB, heap=12.0GB]
[12:52:32]   ^-- Baseline [id=0, size=3, online=3, offline=0]
Services deployed on node3: [myService8, myService2, myService7, myService4]
Now sleeping for 20s.
[12:52:42] Topology snapshot [ver=4, locNode=0ddda712, servers=2, clients=0, state=ACTIVE, CPUs=8, offheap=9.6GB, heap=12.0GB]
[12:52:42] Coordinator changed [prev=TcpDiscoveryNode [id=a6e3015d-6abd-4bf5-b352-bd291372cae4, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.161.44:47500, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.161.44], sockAddrs=HashSet [/0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500, /192.168.161.44:47500], discPort=47500, order=1, intOrder=1, loc=false, ver=2.17.0#20250209-sha1:d53d4540, isClient=false], cur=TcpDiscoveryNode [id=00dc7f4a-d0e4-4391-bd04-28b9ef934050, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.161.44:47501, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.161.44], sockAddrs=HashSet [/192.168.161.44:47501, /0:0:0:0:0:0:0:1:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, loc=false, ver=2.17.0#20250209-sha1:d53d4540, isClient=false]]
[12:52:42]   ^-- Baseline [id=0, size=2, online=2, offline=0]
[12:52:45] Topology snapshot [ver=5, locNode=0ddda712, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=4.8GB, heap=6.0GB]
[12:52:45] Coordinator changed [prev=TcpDiscoveryNode [id=00dc7f4a-d0e4-4391-bd04-28b9ef934050, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.161.44:47501, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.161.44], sockAddrs=HashSet [/192.168.161.44:47501, /0:0:0:0:0:0:0:1:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, loc=false, ver=2.17.0#20250209-sha1:d53d4540, isClient=false], cur=TcpDiscoveryNode [id=0ddda712-bb41-4814-9124-86e010ce60c9, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.161.44:47502, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.161.44], sockAddrs=HashSet [/0:0:0:0:0:0:0:1:47502, /127.0.0.1:47502, /192.168.161.44:47502], discPort=47502, order=3, intOrder=3, loc=true, ver=2.17.0#20250209-sha1:d53d4540, isClient=false]]
[12:52:45]   ^-- Baseline [id=0, size=1, online=1, offline=0]
Sep 03, 2025 12:52:45 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Failed to send message to remote node [node=TcpDiscoveryNode [id=00dc7f4a-d0e4-4391-bd04-28b9ef934050, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.161.44:47501, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.161.44], sockAddrs=HashSet [/192.168.161.44:47501, /0:0:0:0:0:0:0:1:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, loc=false, ver=2.17.0#20250209-sha1:d53d4540, isClient=false], msg=GridIoMessage [plc=2, topic=TOPIC_EXCHANGE, topicOrd=31, ordered=false, timeout=0, skipOnTimeout=false, msg=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.LatchAckMessage@63e33ef7]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException: Failed to send message (node left topology): TcpDiscoveryNode [id=00dc7f4a-d0e4-4391-bd04-28b9ef934050, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.161.44:47501, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.161.44], sockAddrs=HashSet [/192.168.161.44:47501, /0:0:0:0:0:0:0:1:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, loc=false, ver=2.17.0#20250209-sha1:d53d4540, isClient=false]
	at org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createNioSession(GridNioServerWrapper.java:416)
	at org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createTcpClient(GridNioServerWrapper.java:692)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:1181)
	at org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createTcpClient(GridNioServerWrapper.java:690)
	at org.apache.ignite.spi.communication.tcp.internal.ConnectionClientPool.createCommunicationClient(ConnectionClientPool.java:442)
	at org.apache.ignite.spi.communication.tcp.internal.ConnectionClientPool.reserveClient(ConnectionClientPool.java:231)
	at org.apache.ignite.spi.communication.tcp.internal.CommunicationWorker.processDisconnect(CommunicationWorker.java:376)
	at org.apache.ignite.spi.communication.tcp.internal.CommunicationWorker.body(CommunicationWorker.java:174)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$3.body(TcpCommunicationSpi.java:848)
	at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)

Sep 03, 2025 12:52:45 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Failed to send message to remote node [node=TcpDiscoveryNode [id=00dc7f4a-d0e4-4391-bd04-28b9ef934050, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.161.44:47501, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.161.44], sockAddrs=HashSet [/192.168.161.44:47501, /0:0:0:0:0:0:0:1:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, loc=false, ver=2.17.0#20250209-sha1:d53d4540, isClient=false], msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtPartitionsSingleMessage [parts=HashMap {-2100569601=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=3, minorTopVer=2], updateSeq=110, size=100], -1365813811=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=3, minorTopVer=2], updateSeq=4, size=713]}, partCntrs=HashMap {-2100569601=CachePartitionPartialCountersMap {}, -1365813811=CachePartitionPartialCountersMap {656=(0,2)}}, partsSizes=HashMap {-1365813811=HashMap {656=1}}, partHistCntrs=null, err=null, client=false, exchangeStartTime=1756884163004, finishMsg=null, super=GridDhtPartitionsAbstractMessage [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=a6e3015d-6abd-4bf5-b352-bd291372cae4, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.161.44:47500, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.161.44], sockAddrs=HashSet [/0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500, /192.168.161.44:47500], discPort=47500, order=1, intOrder=1, loc=false, ver=2.17.0#20250209-sha1:d53d4540, isClient=false], topVer=4, msgTemplate=null, span=org.apache.ignite.internal.processors.tracing.NoopSpan@b7c450d, nodeId8=0ddda712, msg=Node left, type=NODE_LEFT, tstamp=1756884162968], nodeId=a6e3015d, evt=NODE_LEFT], lastVer=GridCacheVersion [topVer=368364147, order=1756884147363, nodeOrder=3, dataCenterId=0], super=GridCacheMessage [msgId=12, depInfo=null, 
lastAffChangedTopVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], err=null, skipPrepare=false]]]]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException: Failed to send message (node left topology): TcpDiscoveryNode [id=00dc7f4a-d0e4-4391-bd04-28b9ef934050, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.161.44:47501, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.161.44], sockAddrs=HashSet [/192.168.161.44:47501, /0:0:0:0:0:0:0:1:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, loc=false, ver=2.17.0#20250209-sha1:d53d4540, isClient=false]
	at org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createNioSession(GridNioServerWrapper.java:416)
	at org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createTcpClient(GridNioServerWrapper.java:692)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:1181)
	at org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createTcpClient(GridNioServerWrapper.java:690)
	at org.apache.ignite.spi.communication.tcp.internal.ConnectionClientPool.createCommunicationClient(ConnectionClientPool.java:442)
	at org.apache.ignite.spi.communication.tcp.internal.ConnectionClientPool.reserveClient(ConnectionClientPool.java:231)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1105)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1052)
	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2059)
	at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2152)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1209)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendLocalPartitions(GridDhtPartitionsExchangeFuture.java:2166)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendPartitions(GridDhtPartitionsExchangeFuture.java:2302)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1785)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:1055)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3321)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3155)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
	at java.base/java.lang.Thread.run(Thread.java:829)

Services deployed on node3: [myService8, myService1, myService2, myService7, myService4] (5 total)
[12:53:32] 
[12:53:32]     ^--   sysMemPlc region [type=internal, persistence=false, lazyAlloc=false,
[12:53:32]       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.21%, allocRam=40MB]
[12:53:32]     ^--   default region [type=default, persistence=false, lazyAlloc=true,
[12:53:32]       ...  initCfg=256MB, maxCfg=4898MB, usedRam=8MB, freeRam=99.84%, allocRam=256MB]
[12:53:32]     ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,
[12:53:32]       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
[12:53:32] 
[12:53:32] 
[12:53:32] Data storage metrics for local node (to disable set 'metricsLogFrequency' to 0)
[12:53:32]     ^-- Off-heap memory [used=8MB, free=99.83%, allocated=296MB]
[12:53:32]     ^-- Page memory [pages=2250]
[12:53:42] Ignite node stopped OK [uptime=00:01:09.829]

Process finished with exit code 130

Reproduced with Ignite 2.17.0, OpenLogic OpenJDK 11.0.22.
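
For reference, the final placement line in the Node3 output can be compared against the expected set mechanically. This is just a small stdlib-only sketch for reading the output above; `MissingServices` and `missing` are illustrative names, and the list is copied from the last "Services deployed on node3" line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class MissingServices {
    /** Returns the expected names myService0..myService(n-1) that are absent from the survivor's list, sorted. */
    static List<String> missing(int expectedCount, List<String> onSurvivor) {
        Set<String> expected = new TreeSet<>();
        for (int i = 0; i < expectedCount; i++) expected.add("myService" + i);
        expected.removeAll(onSurvivor);
        return new ArrayList<>(expected);
    }

    public static void main(String[] args) {
        // Final placement line copied from the Node3 output above.
        List<String> onNode3 = List.of("myService8", "myService1", "myService2", "myService7", "myService4");

        // prints: Not re-deployed: [myService0, myService3, myService5, myService6, myService9]
        System.out.println("Not re-deployed: " + missing(10, onNode3));
    }
}
```

The five missing names are exactly the ones listed under "Services deployed on node1" in the Node1And2 output.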
