Skip to content

Logger safe async io #245

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Nov 26, 2020
Merged

Conversation

wclodius2
Copy link
Contributor

Changes in four files:

  • doc/specs/index.md - added stdlib_logger.html to the index;
  • doc/specs/stdlib_logger.md - described the changes in stdlib_logger.f90;
  • src/tests/logger/test-stdlib_logger.f90 - added a blank after a comma in the output;
  • src/stdlib_logger.f90 - numerous changes
    • the code handles new_line calls in the message string properly (added by @jvdp1, but undocumented in stdlib_logger.md)
    • a character string buffer was added to logger_type and alen_buffer integer to track the number of active characters in the buffer;
    • the code replaced multiple calls to write statements with single calls with line breaks generated by inserting new_line in the output string, in order to make the output thread and asynchronous IO safe;
    • shortened a number of lines most of them 80+ characters long
    • changed Rrea_only_error to read_only_error

Added a blank to the output afer a comma.

[ticket: X]
Added one line to the index referencing stdlib_logger.html.

[ticket: X]
Documented changes in stdlib_logger.f90. Among them:
1. Additions of buffer and len_buffer to the logger type
2. Changes of self from intent(in) to intent(inout) in several procedures
3. Allowing new_line call in messsage string

[ticket: X]
Numerous changes to src/stdlib_logger.f90:
1. Changed `Rrea_...` to `read_...`
2. Shortened several long lines, mostly those longer than 80 characters
3. Added buffer and len_buffer to logger_type(?)
4. Changed output so that instead of multiple writes of individual strings there
   is a single write of a single string formatted by inserting new_line calls.

[ticket: X]
Changed indexing of the self % buffer from 0 based to the proper 1s based, and
eliminated some unused variables.

[ticket: X]
@milancurcic
Copy link
Member

@14NGiestas if your fix for docs build works, can you submit it as a separate PR? Then we can merge it and rebase this PR to master so that the docs build.

@14NGiestas
Copy link
Member

@14NGiestas if your fix for docs build works, can you submit it as a separate PR? Then we can merge it and rebase this PR to master so that the docs build.

It is already a separate PR #246. I'm just waiting the CI to finish.

milancurcic added a commit that referenced this pull request Nov 16, 2020
Co-authored-by: Ian Giestas Pauli <[email protected]>
Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small issues discovered while testing the changes... There are still a few issues that I will try to solve.

Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: the end of a string with indent=.true. was not printed correctly.

The 3 proposed suggestions should solve this issue properly.

jvdp1 and others added 2 commits November 17, 2020 22:05
Changed log_error and log_io_error similarly
1. Added dummy character variable of len 28 and used it to hold stat/iostat string
representation or lack thereof.
2. Changed suffix to allocatable len character variable and used it to hold the
combined stat/iostat and errmsg/iomsg string representations or lack thereof.
3. Eliminated the trim of suffix as unneeded.
4. Eliminated unit and passed -999 directly to handle_write_failure.
5. For log_io_error changed procedure_name to 'log_io_error'.

[ticket: X]
@wclodius2
Copy link
Contributor Author

I have committed and pushed fixes for the unit = -999 and the potentially too short suffix in both log_error and log_io_error.

Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes sound good to me. Thank you @wclodius2 for these improvements!
IMO, this PR could be merged.

@jvdp1 jvdp1 requested a review from milancurcic November 18, 2020 07:33
Copy link
Member

@milancurcic milancurcic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @wclodius2, I tested the changes locally.

Is it feasible to add a test that confirms that it's indeed async I/O safe? As I understand how this is implemented--write multi-lines into a character string, then write it to output in a single write statement--it should work as advertised. But a test program would help ensure this. What do you think?

Do you expect that this is async I/O safe also for SPMD programs using MPI or coarrays? Specifically, in a pattern where multiple processes are logging to the same file, or stdout. Granted, a safer programming pattern in this case would be for each process to write to its own dedicated logger file. Should we make note of this in the docs?

@jvdp1
Copy link
Member

jvdp1 commented Nov 22, 2020

Do you expect that this is async I/O safe also for SPMD programs using MPI or coarrays? Specifically, in a pattern where multiple processes are logging to the same file, or stdout. Granted, a safer programming pattern in this case would be for each process to write to its own dedicated logger file. Should we make note of this in the docs?

Would an OpenMP do loop with a logger do the trick? I have a similar example in one of my codes, however I put the logger in a critical OMP section with the first implementation.

@wclodius2
Copy link
Contributor Author

While a test would be useful I don't know how to write it. Parallel programming was not in demand for my work. In particular, how do we ensure that asynchronous calls aren't accidentally synchronous? Do we want to special case the cmake compiler flags and Makefile.manual flags for stdlib_logger.f90 and test_stdlib_logger.f90?

A note to the effect of "While every effort has been made to make the code thread and asynchronous I/O safe, it is always best to have each process write to its own dedicated logger file" might be useful.

On a related note, should I add an asynchronous argument to the subroutine add_log_file?

@jvdp1
Copy link
Member

jvdp1 commented Nov 23, 2020

While a test would be useful I don't know how to write it. Parallel programming was not in demand for my work. In particular, how do we ensure that asynchronous calls aren't accidentally synchronous? Do we want to special case the cmake compiler flags and Makefile.manual flags for stdlib_logger.f90 and test_stdlib_logger.f90?

Could such a test be satisfying, assuming that most compilers support basic OpenMP directives?

...
type(logger_type) :: logger
character (len = 30) :: cdumy
...

call logger % add_log_file( 'log.txt')
call logger % configure(indent = .true., time_stamp = .true.)
....
!$omp parallel default(none) shared(logger)
!$omp do
do i = 1, 100
  !$ write(cdummy, '(i0)') omp_get_thread_num()
  !$omp atomic
  call logger % log_message( ' Message from thread '//trim(cdummy) )'
enddo
!$omp end do
!$omp end parallel
....

Such a test would be more about thread then asynchronous safety.
To "mimick" MPI, an array of loggers could be used, and each thread would used its own logger.
However, what/how should be tested to have a proper test?

A note to the effect of "While every effort has been made to make the code thread and asynchronous I/O safe, it is always best to have each process write to its own dedicated logger file" might be useful.

IMO it is a good idea to add it in the specs, especially if we claim it should asynchornous/thread/process-safe.

On a related note, should I add an asynchronous argument to the subroutine add_log_file?

It would be good indeed. Probably it is best to do it in another PR independent from this one.

@milancurcic
Copy link
Member

A note to the effect of "While every effort has been made to make the code thread and asynchronous I/O safe, it is always best to have each process write to its own dedicated logger file" might be useful.

@wclodius2, I agree. Do you mind adding this note in the spec document where you find it most appropriate?

I think this will then be good to merge.

Added the sentence "While every effort has been made to make the code thread and
asynchronous I/O safe, it is always best to have each process write to
its own dedicated logger file."

[ticket: X]
@wclodius2
Copy link
Contributor Author

I have added the sentence to stdlib_logger.md just before the note at the end of the introduction.

@jvdp1 I have no experience with Open MP. My gut feeling is that an additional test wouldn't hurt, but it should probably be written by you as a separate PR. Things to think about if you should decide to do it.

  1. It appears that compiling with gfortran for open MP requires the compiler flag -fopenmp. Would that flag apply to the library as a whole or just the test code?
  2. Do you envision the code as a separate "application" or part of the current test case?
  3. I find going through a hundred lines of output a bit daunting. Would ten lines be a sufficient test?

@14NGiestas
Copy link
Member

Does this PR handles access to the buffer from different threads? Because having a buffer string without guarding the critical region does not guarantee one thread will not overwrite the buffer while the other thread was writing. I need to review the code to better understand that, but I belive It's up to the user put the logger call in a guarded region (omp critical, for e.g.).

@jvdp1
Copy link
Member

jvdp1 commented Nov 24, 2020

@wclodius2 I will propose another test later, when this is merged.

Does this PR handles access to the buffer from different threads? Because having a buffer string without guarding the critical region does not guarantee one thread will not overwrite the buffer while the other thread was writing. I need to review the code to better understand that, but I belive It's up to the user put the logger call in a guarded region (omp critical, for e.g.).

Good point. No, there is no guarding within stdlib_logger that different threads access to the buffer at the same time (that is why I set it in an atomic section in the example above). We could add orphaned OpenMP directives in stdlib_logger. However, as @14NGiestas , I would think it is up to the user to put the logger call in a guarded region. In that case, a sentence in the specs should be added for multi-thread programs.

Note that this issue should not apppear in MPI/coarray-type programs.

Added Jeremies revision of the discussion in stdlib_logger.md.

[ticket: X]
@wclodius2
Copy link
Contributor Author

@14NGiestas FWIW the string buffer is in a logger object, and not a module variable. With a little extra work it could be made a procedure variable.

@wclodius2
Copy link
Contributor Author

FWIW with my latest push five of six CI/Builds passed. The one that failed "CI / Build (macos-latest, 9) (pull_request)" failed in running test_stdlib_bitset_large with a file read error. It passes that test on my system with gfortran 10.2.

@jvdp1
Copy link
Member

jvdp1 commented Nov 24, 2020

FWIW with my latest push five of six CI/Builds passed. The one that failed "CI / Build (macos-latest, 9) (pull_request)" failed in running test_stdlib_bitset_large with a file read error. It passes that test on my system with gfortran 10.2.

I tested it on my computer (Fedora) with gfortran 9.3.1, and there were no issues.
I rerun the tests, but they still failed.

@wclodius2
Copy link
Contributor Author

FWIW the Mac OS run that fails uses gfortran 9.3.0. It fails with an unexpected End-of-File, which I have trouble understanding as I would expect that error to occur only if there were a coding error that affects all runs. A glance at the code in stdlib_bitsets_large.f90 and test_stdlib_bitset_large.f90 doesn't show anything wrong with the pertinent sections of code. Further, CI / Build (ubuntu-latest, 9) also uses gfortran 9.3.0 and passes the tests. Maybe it is a bad disk sector involving the source code. I hate boggles!

@jvdp1
Copy link
Member

jvdp1 commented Nov 25, 2020

FWIW the Mac OS run that fails uses gfortran 9.3.0. It fails with an unexpected End-of-File, which I have trouble understanding as I would expect that error to occur only if there were a coding error that affects all runs. A glance at the code in stdlib_bitsets_large.f90 and test_stdlib_bitset_large.f90 doesn't show anything wrong with the pertinent sections of code. Further, CI / Build (ubuntu-latest, 9) also uses gfortran 9.3.0 and passes the tests. Maybe it is a bad disk sector involving the source code. I hate boggles!

I tested it again on my computer with different compilers and debug options. It always passed the tests! So no idea...

Comment on lines +37 to +42
While every effort has been made to make the code process and
asynchronous I/O safe, it is always best to have each process write to
its own dedicated logger file.
For thread parallelism (e.g., with OpenMP), it is advised to put the
logger call in a guarding region (e.g., in an OpenMP critical region).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@milancurcic
Are you satisfied with this note that covers multi-thread and multi-process programs? If yes, I am ok to merge this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm happy with it so I'll go ahead and merge. Thank you all.

@milancurcic milancurcic merged commit 479f582 into fortran-lang:master Nov 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants