Skip to content

The "Found an invalid row for peer" error is obscure #303

Closed
@bhalevy

Description

@bhalevy

I'm seeing these error in a dtest that adds 2 new nodes (in parallel) to a 3-nodes cluster.

08:56:37,665 1736154 ccm                            DEBUG    cluster.py          :762  | test_tablets_add_two_nodes_in_parallel: node4: Starting scylla: args=['/home/bhalevy/.dtest/dtest-u7u3ay78/test/node4/bin/scylla', '--options-file', '/home/bhalevy/.dtest/dtest-u7u3ay78/test/no
de4/conf/scylla.yaml', '--log-to-stdout', '1', '--api-address', '127.0.73.4', '--smp', '2', '--memory', '1024M', '--developer-mode', 'true', '--default-log-level', 'info', '--overprovisioned', '--prometheus-address', '127.0.73.4', '--unsafe-bypass-fsync', '1', '--kernel-page-cache'
, '1', '--commitlog-use-o-dsync', '0', '--max-networking-io-control-blocks', '1000'] wait_other_notice=False wait_for_binary_proto=False
08:56:37,667 1736154 ccm                            DEBUG    cluster.py          :762  | test_tablets_add_two_nodes_in_parallel: node4: Starting scylla-jmx: args=['/home/bhalevy/.dtest/dtest-u7u3ay78/test/node4/bin/symlinks/scylla-jmx', '-Dapiaddress=127.0.73.4', '-Djavax.managemen
t.builder.initial=com.scylladb.jmx.utils.APIBuilder', '-Djava.rmi.server.hostname=127.0.73.4', '-Dcom.sun.management.jmxremote', '-Dcom.sun.management.jmxremote.host=127.0.73.4', '-Dcom.sun.management.jmxremote.port=7199', '-Dcom.sun.management.jmxremote.rmi.port=7199', '-Dcom.sun.
management.jmxremote.local.only=false', '-Xmx256m', '-XX:+UseSerialGC', '-Dcom.sun.management.jmxremote.authenticate=false', '-Dcom.sun.management.jmxremote.ssl=false', '-jar', '/home/bhalevy/.dtest/dtest-u7u3ay78/test/node4/bin/scylla-jmx-1.0.jar']
08:56:37,972 1736154 ccm                            DEBUG    cluster.py          :762  | test_tablets_add_two_nodes_in_parallel: node5: Starting scylla: args=['/home/bhalevy/.dtest/dtest-u7u3ay78/test/node5/bin/scylla', '--options-file', '/home/bhalevy/.dtest/dtest-u7u3ay78/test/no
de5/conf/scylla.yaml', '--log-to-stdout', '1', '--api-address', '127.0.73.5', '--smp', '2', '--memory', '1024M', '--developer-mode', 'true', '--default-log-level', 'info', '--overprovisioned', '--prometheus-address', '127.0.73.5', '--unsafe-bypass-fsync', '1', '--kernel-page-cache'
, '1', '--commitlog-use-o-dsync', '0', '--max-networking-io-control-blocks', '1000'] wait_other_notice=False wait_for_binary_proto=False
08:56:37,973 1736154 ccm                            DEBUG    cluster.py          :762  | test_tablets_add_two_nodes_in_parallel: node5: Starting scylla-jmx: args=['/home/bhalevy/.dtest/dtest-u7u3ay78/test/node5/bin/symlinks/scylla-jmx', '-Dapiaddress=127.0.73.5', '-Djavax.managemen
t.builder.initial=com.scylladb.jmx.utils.APIBuilder', '-Djava.rmi.server.hostname=127.0.73.5', '-Dcom.sun.management.jmxremote', '-Dcom.sun.management.jmxremote.host=127.0.73.5', '-Dcom.sun.management.jmxremote.port=7199', '-Dcom.sun.management.jmxremote.rmi.port=7199', '-Dcom.sun.
management.jmxremote.local.only=false', '-Xmx256m', '-XX:+UseSerialGC', '-Dcom.sun.management.jmxremote.authenticate=false', '-Dcom.sun.management.jmxremote.ssl=false', '-jar', '/home/bhalevy/.dtest/dtest-u7u3ay78/test/node5/bin/scylla-jmx-1.0.jar']
08:56:43,620 1736154 cassandra.cluster              WARNING  thread.py           :58   | test_tablets_add_two_nodes_in_parallel: Found an invalid row for peer (127.0.73.5). Ignoring host.
08:56:43,620 1736154 cassandra.cluster              INFO     thread.py           :58   | test_tablets_add_two_nodes_in_parallel: New Cassandra host <Host: 127.0.73.4:9042 datacenter1> discovered
08:56:45,521 1736154 cassandra.cluster              WARNING  thread.py           :58   | test_tablets_add_two_nodes_in_parallel: Found an invalid row for peer (127.0.73.5). Ignoring host.
08:56:46,585 1736154 update_cluster_layout_tests    DEBUG    update_cluster_layout_tests.py:436  | test_tablets_add_two_nodes_in_parallel: Check that nodes started successfully
08:56:46,586 1736154 update_cluster_layout_tests    DEBUG    update_cluster_layout_tests.py:441  | test_tablets_add_two_nodes_in_parallel: Verify tablet load-balancing after 10 seconds
08:56:48,206 1736154 cassandra.cluster              INFO     thread.py           :58   | test_tablets_add_two_nodes_in_parallel: New Cassandra host <Host: 127.0.73.5:9042 datacenter1> discovered
08:56:56,590 1736154 cassandra.cluster              INFO     dtest_setup.py      :493  | test_tablets_add_two_nodes_in_parallel: New Cassandra host <Host: 127.0.73.5:9042 datacenter1> discovered
08:56:56,591 1736154 cassandra.cluster              INFO     dtest_setup.py      :493  | test_tablets_add_two_nodes_in_parallel: New Cassandra host <Host: 127.0.73.3:9042 datacenter1> discovered
08:56:56,591 1736154 cassandra.cluster              INFO     dtest_setup.py      :493  | test_tablets_add_two_nodes_in_parallel: New Cassandra host <Host: 127.0.73.2:9042 datacenter1> discovered
08:56:56,591 1736154 cassandra.cluster              INFO     dtest_setup.py      :493  | test_tablets_add_two_nodes_in_parallel: New Cassandra host <Host: 127.0.73.4:9042 datacenter1> discovered

It looks like the error heals itself eventually but it's unclear why the the row in system.peers was considered as invalid, and on which node it was read from.
It would help if we printed more information to facilitate debugging in case something is wrong in the scylla server side.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions