Skip to content

Add threadStatus to IOSim and IOSimPOR #19

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

MaximilianAlgehed
Copy link
Contributor

@MaximilianAlgehed MaximilianAlgehed commented Aug 31, 2022

This PR introduces an implementation of threadStatus for IOSim. It's currently a bit fast and loose on the "blocked reason" for threads (e.g. reading MVars etc.) but should implement the rest of the functionality faithfully.

This fixes #12.

@MaximilianAlgehed MaximilianAlgehed force-pushed the PR-thread-status branch 3 times, most recently from 8a2ef39 to e8b0ed3 Compare August 31, 2022 09:02
@MaximilianAlgehed
Copy link
Contributor Author

@coot do you perhaps have time to review this?

@coot coot self-requested a review September 1, 2022 13:52
Copy link
Collaborator

@coot coot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaximilianAlgehed thanks that looks good. I see one problem with this implementation: it does not work well with our implementation of MVar's which is based on TVar's. When a thread is blocked on an MVar we will thus report ThreadBlocked BlockedOnSTM. But it shouldn't be difficult to distinguished MVars and TVars. We's need to add a boolean flag to TVar: tvarIsMVar :: Bool. The problem to solve is how to update it when we create an MVar in newEmptyMVarDefault. The simplest way would be to move that function to io-sim, where we have access to TVar internals.

@MaximilianAlgehed
Copy link
Contributor Author

@coot do you think we should try to address the BlockReason issues in this PR or do we postpone that for later?

@coot
Copy link
Collaborator

coot commented Sep 3, 2022

BlockReason could be addressed later; Please leave a comment which links to an issue.

@MaximilianAlgehed
Copy link
Contributor Author

Issue mentioning BlockReason #21.

@MaximilianAlgehed
Copy link
Contributor Author

@coot I think this should be about ready now.

Copy link
Contributor

@dcoutts dcoutts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks ok to me.

I have to say I was not aware of the semantics of threadStatus. It looks like you believe that the difference between status died and finished is receiving an async exception.

The library docs merely say

ThreadFinished | the thread has finished
ThreadDied | the thread received an uncaught exception

without saying anything about async vs sync exceptions.

My own experiment is even less helpful, it suggests that ThreadDied isn't used at all!

import GHC.Conc
import Control.Exception

main = do
  --setUncaughtExceptionHandler throwIO

  tid <- forkIO $ do
           print "foo"
           threadDelay 10000000 --10sec
           print "bar"

  print =<< threadStatus tid
  throwTo tid (ErrorCall "oh noes")
  threadDelay 1000000
  print =<< threadStatus tid

and

 ghc-8.10.7 ThreadStatus.hs 
[1 of 1] Compiling Main             ( ThreadStatus.hs, ThreadStatus.o )
Linking ThreadStatus ...
[duncan@dunky io-sim]$ ./ThreadStatus 
ThreadRunning
ThreadStatus: oh noes
ThreadFinished

I thought it might be related to the default top-level exception handler for forkIO threads, but apparently not. If one uncomments the line setUncaughtExceptionHandler throwIO then the final result is instead

$ ./ThreadStatus 
ThreadRunning
ThreadRunning

I cannot get ghc to ever report ThreadDied.

If we don't need to track async vs sync exceptions for the cause here, then it'd simplify the patch quite a bit. If it were just threads terminating with an exception (any, sync or async) vs terminating via return then we'd not have to track so much in this patch.

@MaximilianAlgehed
Copy link
Contributor Author

MaximilianAlgehed commented Sep 16, 2022

@dcoutts did you check the tests? On my machine the tests pass - indicating that our hypothesis about the difference between synch and asynch exceptions is true at least some of the time. See specifically the tests
prop_thread_status_died and prop_thread_status_died_own (which has a bad name I admit). Could you check out the tests and run them on your machine to see if they match?

@MaximilianAlgehed
Copy link
Contributor Author

And to be clear, we were very surprised by the semantics we observed!

@MaximilianAlgehed
Copy link
Contributor Author

MaximilianAlgehed commented Sep 20, 2022

@coot or @dcoutts do you mind running the CI here so we can get things to turn green? (I like green, it's a pretty colour)

Copy link
Collaborator

@coot coot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@coot
Copy link
Collaborator

coot commented Sep 22, 2022

I don't see how to make CI run, there's no box which github docs mentions 😞.

@MaximilianAlgehed
Copy link
Contributor Author

@dcoutts could you perhaps push the right buttons on this so we can get it merged?

@coot
Copy link
Collaborator

coot commented Sep 28, 2022

@dcoutts could you perhaps push the right buttons on this so we can get it merged?

I asked similar question today.

Could you rebase the branch on top of origin/main: even if we run the workflows now they will fail without rebase (I checked that locally).

@coot coot mentioned this pull request Sep 28, 2022
@coot
Copy link
Collaborator

coot commented Sep 28, 2022

@MaximilianAlgehed I pushed PR #27 with your changes (rebased), unfortunately two tests failed:

thread status died_own (IO):             FAIL
 *** Failed! Falsified (after 1 test):
 ThreadFinished /= ThreadBlocked BlockedOnMVar
 Use --quickcheck-replay=392463 to reproduce.
 Use -p '/thread status died_own (IO)/' to rerun this test only.

thread status mask (IO):                 FAIL
 *** Failed! Falsified (after 1 test):
 ThreadFinished /= ThreadBlocked BlockedOnMVar
 Use --quickcheck-replay=499493 to reproduce.
 Use -p '/thread status mask (IO)/' to rerun this test only.

@MaximilianAlgehed
Copy link
Contributor Author

@coot that's a bit worrying - they pass locally. Are the tests run in parallel in CI? That might affect scheduling (these tests are understandably a bit sensitive to scheduling).

@coot
Copy link
Collaborator

coot commented Sep 29, 2022

The problem is only on Windows, where I can reproduce it locally too.

@coot
Copy link
Collaborator

coot commented Sep 29, 2022

Since threadStatus is not well specified, I think it's ok if we just disable these tests on Windows.

@MaximilianAlgehed
Copy link
Contributor Author

@coot That's fine with me. I'll make an issue that mentions this so I have something to come back to after the next spate of deadlines.

@MaximilianAlgehed
Copy link
Contributor Author

@coot I've made an issue here #28 and it's weirder than I thought! I finally got the time to look carefully at this and as I say in the issue the behaviour on Windows is probably just plain wrong! We don't do any operations that should put the thread that we ask threadStatus of in that state at any point in the execution!

@coot
Copy link
Collaborator

coot commented Sep 29, 2022

Thanks, I disabled the tests in the other PR. Once CI will finish I'll merge it.

@coot
Copy link
Collaborator

coot commented Sep 29, 2022

Merged via #27.

@coot coot closed this Sep 29, 2022
@dcoutts
Copy link
Contributor

dcoutts commented Oct 11, 2022

@MaximilianAlgehed I'm still going to claim that the threadStatus behaviour is not well defined, and I'm sceptical that we should go to great efforts to support it. I don't mind supporting it, but it's not at all clear that there is any sync vs async exception behaviour difference, and it's the tracking of that distinction that makes this patch particularly invasive. Without that, all the info would be available already I think.

So yes I did try running your two tests prop_thread_status_died and prop_thread_status_died_own in IO and yes they behave as you say. However, if one make a trivial modification:

prop_two_threads_expect target main prop = do
  tid <- forkIO target
  yield                    --  <-- insert this extra yield here
  main tid
  status <- threadStatus tid
  return $ prop status

we would expect that this should make no difference to the semantics right?

Yet it does:

*Test.IOSim> quickCheck $ ioProperty prop_thread_status_died_own 
<interactive>: divide by zero
+++ OK, passed 1 test.
*Test.IOSim> quickCheck $ ioProperty prop_thread_status_died
<interactive>: divide by zero
*** Failed! Falsified (after 1 test):  
ThreadDied /= ThreadFinished

Now both examples return ThreadFinished whereas before the throwTo case would return ThreadDied.

This can also be seen in an even simpler standalone test:

main = do

    tid <- forkIO (forever yield)
    throwTo tid DivideByZero
    yield
    print =<< threadStatus tid

    tid' <- forkIO (forever yield)
    throwTo tid' DivideByZero
    yield
    print =<< threadStatus tid'

with the result

$ ./ThreadStatus
ThreadStatus: divide by zero
ThreadFinished
ThreadDied

This example makes it somewhat clearer what is going on: ghc seems to behave differently when sending an async exception if the thread has never yet run, vs when it has already started running. The example above of running the same thing twice relies on the vagaries of scheduling, whereas inserting an explicit yield forces it one way or another.

So overall, it's not clear that (at least when the victim thread is already running) that ghc makes any distinction between async and sync exceptions. It's just that the async case lets one distinguish between victim threads that have not yet started running, and those that are already running, which for some reason ghc seems to report differently.

So I'd argue that we should back out the complexities of the tracking of sync vs async exceptions for the purpose of thread status reporting, and just do a simpler impl of threadStatus where we never use threadDied and all terminated threads (irrespective of whether they terminated by exception or return) report ThreadFinished.

@dcoutts dcoutts reopened this Oct 11, 2022
@MaximilianAlgehed
Copy link
Contributor Author

@dcoutts I agree with your assessment. However, I don't know when I can get to this but I will try to make it my top priority to clean this patch up next week. Does that sound reasonable?

@dcoutts
Copy link
Contributor

dcoutts commented Oct 11, 2022

I've filed a GHC issue: https://gitlab.haskell.org/ghc/ghc/-/issues/22279

@coot
Copy link
Collaborator

coot commented Oct 13, 2022

@dcoutts I created an issue to track this, let's close this PR as it's already merged via #27.

@coot coot closed this Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: Implement threadStatus
3 participants