Skip to content

bank-lua fails with 1.8% probability by construction #94

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Totktonada opened this issue Oct 30, 2021 · 0 comments · Fixed by #99
Closed

bank-lua fails with 1.8% probability by construction #94

Totktonada opened this issue Oct 30, 2021 · 0 comments · Fixed by #99
Assignees
Labels
bug Something isn't working

Comments

@Totktonada
Copy link
Member

Totktonada commented Oct 30, 2021

The test creates 10 bank accounts and set initial balances:

$ grep :read tarantool-3f568c339221ac10948c0dd29e405c0f224597dc/20211029T163440.000Z/jepsen.log | grep -v :invoke | head -n 1
2021-10-29 16:44:43,402{GMT}	INFO	[jepsen worker 0] jepsen.util: 0	:ok	:read	{0 100, 1 0, 2 0, 3 0, 4 0, 5 0, 6 0, 7 0, 8 0, 9 0}

The 0th account has balance 100, others have 0.

Next, the test starts to perform transfers from account to account:

$ grep :transfer tarantool-3f568c339221ac10948c0dd29e405c0f224597dc/20211029T163440.000Z/jepsen.log | grep -v :invoke | head -n 10
2021-10-29 16:44:44,169{GMT}	INFO	[jepsen worker 0] jepsen.util: 0	:fail	:transfer	{:from 4, :to 2, :amount 5}
2021-10-29 16:44:47,393{GMT}	INFO	[jepsen worker 0] jepsen.util: 0	:fail	:transfer	{:from 9, :to 4, :amount 2}
2021-10-29 16:44:48,177{GMT}	INFO	[jepsen worker 0] jepsen.util: 0	:fail	:transfer	{:from 2, :to 0, :amount 2}
2021-10-29 16:44:49,756{GMT}	INFO	[jepsen worker 0] jepsen.util: 0	:fail	:transfer	{:from 7, :to 6, :amount 5}
2021-10-29 16:44:51,306{GMT}	INFO	[jepsen worker 0] jepsen.util: 0	:fail	:transfer	{:from 4, :to 9, :amount 5}
2021-10-29 16:44:53,736{GMT}	INFO	[jepsen worker 0] jepsen.util: 0	:fail	:transfer	{:from 7, :to 0, :amount 4}
2021-10-29 16:44:54,518{GMT}	INFO	[jepsen worker 0] jepsen.util: 0	:fail	:transfer	{:from 1, :to 2, :amount 5}
2021-10-29 16:44:55,311{GMT}	INFO	[jepsen worker 0] jepsen.util: 0	:fail	:transfer	{:from 2, :to 7, :amount 4}
2021-10-29 16:44:56,089{GMT}	INFO	[jepsen worker 0] jepsen.util: 0	:fail	:transfer	{:from 2, :to 1, :amount 5}
2021-10-29 16:44:57,632{GMT}	INFO	[jepsen worker 0] jepsen.util: 0	:fail	:transfer	{:from 9, :to 3, :amount 4}

When 'from' account has not enough balance, the transfer fails (it does not fail the whole test). 'from'/'to' accounts are choosen randomly. When all transfer fail, the test fails (see here).

The test is time bounded: 60 seconds. We have no concurrency in the test and amount of transfers is around 38. There is (9/10)^38 ≈ 1.8% probability that we'll never attempt to transfer from 0th account, so all transfers will fail.

The result of the test is reported this way in given case:

2021-10-29 16:46:06,559{GMT}	INFO	[jepsen test runner] jepsen.core: {:perf
 {:latency-graph {:valid? true},
  :rate-graph {:valid? true},
  :valid? true},
 :clock-skew {:valid? true},
 :crash {:valid? true},
 :timeline {:valid? true},
 :stats
 {:valid? false,
  :count 77,
  :ok-count 39,
  :fail-count 38,
  :info-count 0,
  :by-f
  {:read
   {:valid? true,
    :count 39,
    :ok-count 39,
    :fail-count 0,
    :info-count 0},
   :transfer
   {:valid? false,
    :count 38,
    :ok-count 0,
    :fail-count 38,
    :info-count 0}}},
 :exceptions {:valid? true},
 :workload
 {:SI
  {:valid? true,
   :read-count 39,
   :error-count 0,
   :first-error nil,
   :errors {}},
  :plot {:valid? true},
  :valid? true},
 :valid? false}


Analysis invalid! (ノಥ益ಥ)ノ ┻━┻

(Not quite obvious on the first glance, but look at :ok-count 0.)

There are some ways to resolve it or decrease probability of the fail:

  • Distribute the initial total amount across all accounts.
  • Enable concurrency.
  • Always make the first transfer from 0th account.
  • Allow negative balances.
  • A fix for Don't create a new connection for each request #96 may also decrease probability of the fail due to insreased overall amount of transfer attempts per second.

Uniform initial distribution or permitting negative balances are only true solutions, they also resolve the problem with a lot of failed transfers. See below.

I looked over several other tests implementations:

  • CockroachDB distributed balance uniformly (see here and here).
  • Ignite distributes them too (see here, here and here).
  • YugabyteDB sets the first account to an initial value and others to zero (see here and here).

So, it seems, there is no canonical way to choose initial balances. OTOH, if we'll distribute them uniformly, we'll make more successful transfers, so the test will be more powerful.

Full log: logs.txt.
More detailed info: jepsen-single-instance.zip (see tarantool-3f568c339221ac10948c0dd29e405c0f224597dc/20211029T163440.000Z/).


I would also highlight that even when the test in its current implementation does not fails, it generates much less successful transactions, if we'll compare it with the uniform balances distribution case or negative balances case. So our test lacks some stress testing power just because of the bad initial balances distribution.

@Totktonada Totktonada added the bug Something isn't working label Nov 14, 2021
@Totktonada Totktonada self-assigned this Nov 14, 2021
Totktonada added a commit that referenced this issue Nov 14, 2021
Before this commit the initial distribution was the following: first
account has `total-amount` balances, other 9 ones have zero balance.

From time to time is appears that there are no attempts to transfer from
the first account. It means that all transfers are marked as failed and
the history analysis reports fail.

Even when there are some successful transfers, there are many attempts
to transfer from an account with zero balance: those transfers are
useless in terms of tarantool testing.

This commit changes the initial distribution to the uniform one and
resolves problems described above. See the linked issue for details.

Aside of the problematic bank-lua I changed other banking tests in the
same way to unify the approach. How each test is changed:

* bank-lua: [100 0 0 0 0 0 0 0 0 0] -> [10 10 10 10 10 10 10 10 10 10].
* bank-multitable-lua: the test already distributes initial balances
  uniformly, but hardcodes balance value. Now it is deduced from the
  `total-amount` parameter.
* bank: same as bank-lua.
* bank-multitable: same as bank-lua.

Fixes #94
Totktonada added a commit that referenced this issue Nov 14, 2021
Before this commit the initial distribution was the following: first
account has `total-amount` balances, other 9 ones have zero balance.

From time to time is appears that there are no attempts to transfer from
the first account. It means that all transfers are marked as failed and
the history analysis reports fail.

Even when there are some successful transfers, there are many attempts
to transfer from an account with zero balance: those transfers are
useless in terms of tarantool testing.

This commit changes the initial distribution to the uniform one and
resolves problems described above. See the linked issue for details.

Aside of the problematic bank-lua I changed other banking tests in the
same way to unify the approach. How each test is changed:

* bank-lua: [100 0 0 0 0 0 0 0 0 0] -> [10 10 10 10 10 10 10 10 10 10].
* bank-multitable-lua: the test already distributes initial balances
  uniformly, but hardcodes balance value. Now it is deduced from the
  `total-amount` parameter.
* bank: same as bank-lua.
* bank-multitable: same as bank-lua.

Fixes #94
Totktonada added a commit that referenced this issue Nov 14, 2021
Before this commit the initial distribution was the following: first
account has `total-amount` balances, other 9 ones have zero balance.

From time to time is appears that there are no attempts to transfer from
the first account. It means that all transfers are marked as failed and
the history analysis reports fail.

Even when there are some successful transfers, there are many attempts
to transfer from an account with zero balance: those transfers are
useless in terms of tarantool testing.

This commit changes the initial distribution to the uniform one and
resolves problems described above. See the linked issue for details.

Aside of the problematic bank-lua I changed other banking tests in the
same way to unify the approach. How each test is changed:

* bank-lua: [100 0 0 0 0 0 0 0 0 0] -> [10 10 10 10 10 10 10 10 10 10].
* bank-multitable-lua: the test already distributes initial balances
  uniformly, but hardcodes balance value. Now it is deduced from the
  `total-amount` parameter.
* bank: same as bank-lua.
* bank-multitable: same as bank-lua.

Fixes #94
Totktonada added a commit that referenced this issue Nov 14, 2021
Before this commit the initial distribution was the following: first
account has `total-amount` balances, other 9 ones have zero balance.

From time to time is appears that there are no attempts to transfer from
the first account. It means that all transfers are marked as failed and
the history analysis reports fail.

Even when there are some successful transfers, there are many attempts
to transfer from an account with zero balance: those transfers are
useless in terms of tarantool testing.

This commit changes the initial distribution to the uniform one and
resolves the problems, which are described above. See the linked issue
for details.

Aside of the problematic bank-lua I changed other banking tests in the
same way to unify the approach. How each test is changed:

* bank-lua: [100 0 0 0 0 0 0 0 0 0] -> [10 10 10 10 10 10 10 10 10 10].
* bank-multitable-lua: the test already distributes initial balances
  uniformly, but hardcodes balance value. Now it is deduced from the
  `total-amount` parameter.
* bank: same as bank-lua.
* bank-multitable: same as bank-lua.

'bank' and 'bank-multitable' fixes are blind, because they're disabled.
See #83.

'bank-lua' and 'bank-multitable-lua' are passed in a manual run.
'bank-lua' gives 2-3 failed transfers of ~40 ones. Before it was like
5-15 *successful* ones of 30-40. (I have no representative statistics,
just looked over several runs in CI, but the numbers looks expected.)

Fixes #94
Totktonada added a commit that referenced this issue Nov 20, 2021
Before this commit the initial distribution was the following: first
account has `total-amount` balances, other 9 ones have zero balance.

From time to time is appears that there are no attempts to transfer from
the first account. It means that all transfers are marked as failed and
the history analysis reports fail.

Even when there are some successful transfers, there are many attempts
to transfer from an account with zero balance: those transfers are
useless in terms of tarantool testing.

This commit changes the initial distribution to the uniform one and
resolves the problems, which are described above. See the linked issue
for details.

Aside of the problematic bank-lua I changed other banking tests in the
same way to unify the approach. How each test is changed:

* bank-lua: [100 0 0 0 0 0 0 0 0 0] -> [10 10 10 10 10 10 10 10 10 10].
* bank-multitable-lua: the test already distributes initial balances
  uniformly, but hardcodes balance value. Now it is deduced from the
  `total-amount` parameter.
* bank: same as bank-lua.
* bank-multitable: same as bank-lua.

'bank' and 'bank-multitable' fixes are blind, because they're disabled.
See #83.

'bank-lua' and 'bank-multitable-lua' are passed in a manual run.
'bank-lua' gives 2-3 failed transfers of ~40 ones. Before it was like
5-15 *successful* ones of 30-40. (I have no representative statistics,
just looked over several runs in CI, but the numbers looks expected.)

Fixes #94
Totktonada added a commit that referenced this issue Nov 23, 2021
Before this commit the initial distribution was the following: first
account has `total-amount` balances, other 9 ones have zero balance.

From time to time is appears that there are no attempts to transfer from
the first account. It means that all transfers are marked as failed and
the history analysis reports fail.

Even when there are some successful transfers, there are many attempts
to transfer from an account with zero balance: those transfers are
useless in terms of tarantool testing.

This commit changes the initial distribution to the uniform one and
resolves the problems, which are described above. See the linked issue
for details.

Aside of the problematic bank-lua I changed other banking tests in the
same way to unify the approach. How each test is changed:

* bank-lua: [100 0 0 0 0 0 0 0 0 0] -> [10 10 10 10 10 10 10 10 10 10].
* bank-multitable-lua: the test already distributes initial balances
  uniformly, but hardcodes balance value. Now it is deduced from the
  `total-amount` parameter.
* bank: same as bank-lua.
* bank-multitable: same as bank-lua.

'bank' and 'bank-multitable' fixes are blind, because they're disabled.
See #83.

'bank-lua' and 'bank-multitable-lua' are passed in a manual run.
'bank-lua' gives 2-3 failed transfers of ~40 ones. Before it was like
5-15 *successful* ones of 30-40. (I have no representative statistics,
just looked over several runs in CI, but the numbers looks expected.)

Fixes #94
Totktonada added a commit that referenced this issue Nov 23, 2021
Before this commit the initial distribution was the following: first
account has `total-amount` balances, other 9 ones have zero balance.

From time to time is appears that there are no attempts to transfer from
the first account. It means that all transfers are marked as failed and
the history analysis reports fail.

Even when there are some successful transfers, there are many attempts
to transfer from an account with zero balance: those transfers are
useless in terms of tarantool testing.

This commit changes the initial distribution to the uniform one and
resolves the problems, which are described above. See the linked issue
for details.

Aside of the problematic bank-lua I changed other banking tests in the
same way to unify the approach. How each test is changed:

* bank-lua: [100 0 0 0 0 0 0 0 0 0] -> [10 10 10 10 10 10 10 10 10 10].
* bank-multitable-lua: the test already distributes initial balances
  uniformly, but hardcodes balance value. Now it is deduced from the
  `total-amount` parameter.
* bank: same as bank-lua.
* bank-multitable: same as bank-lua.

'bank' and 'bank-multitable' fixes are blind, because they're disabled.
See #83.

'bank-lua' and 'bank-multitable-lua' are passed in a manual run.
'bank-lua' gives 2-3 failed transfers of ~40 ones. Before it was like
5-15 *successful* ones of 30-40. (I have no representative statistics,
just looked over several runs in CI, but the numbers looks expected.)

Fixes #94
ligurio pushed a commit that referenced this issue Nov 24, 2021
Before this commit the initial distribution was the following: first
account has `total-amount` balances, other 9 ones have zero balance.

From time to time is appears that there are no attempts to transfer from
the first account. It means that all transfers are marked as failed and
the history analysis reports fail.

Even when there are some successful transfers, there are many attempts
to transfer from an account with zero balance: those transfers are
useless in terms of tarantool testing.

This commit changes the initial distribution to the uniform one and
resolves the problems, which are described above. See the linked issue
for details.

Aside of the problematic bank-lua I changed other banking tests in the
same way to unify the approach. How each test is changed:

* bank-lua: [100 0 0 0 0 0 0 0 0 0] -> [10 10 10 10 10 10 10 10 10 10].
* bank-multitable-lua: the test already distributes initial balances
  uniformly, but hardcodes balance value. Now it is deduced from the
  `total-amount` parameter.
* bank: same as bank-lua.
* bank-multitable: same as bank-lua.

'bank' and 'bank-multitable' fixes are blind, because they're disabled.
See #83.

'bank-lua' and 'bank-multitable-lua' are passed in a manual run.
'bank-lua' gives 2-3 failed transfers of ~40 ones. Before it was like
5-15 *successful* ones of 30-40. (I have no representative statistics,
just looked over several runs in CI, but the numbers looks expected.)

Fixes #94
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant