Add detach replica in shutdown function when error occurs during apply segment of journal #8688

XaBbl4 · 2025-08-05T11:45:27Z

If an error occurs during segment apply that results in a return other than PROCESS_CONTINUE, the connection to the replica will not be detach. With asynchronous replication the error will like be repeated, result to many connections that take up RAM and hold a connection to the database.

dyemanov · 2025-08-06T04:59:35Z

src/remote/server/ReplServer.cpp

 			m_replicator = nullptr;
+			if (m_attachment.hasData())
+				m_attachment->detach(&localStatus);


Both m_replicator and m_attachment are RefPtr's and nullifying them causes a release() call which does the proper cleanup (similar to close/detach) under the hood. So I don't see what is actually fixed. What do I miss?

It turns out that for m_replicator close is called during release and it is not necessary to do this (I added it for the company with m_attachment, which is the problem).
There detach is not called during release, only destroy, in which requests, events, transactions, etc. are released and the handle is also removed, but the connection to the DB itself remain.

Hmmm. YAttachment::destroy() calls destroy2() which nullifies next which in turn calls release() for the underlying provider. So AFAIU the connection should be closed.

That's right.
The problem occurs when using an external engine, even without a replication error.
It uses the context of the current replication connection and increases the reference counter, and when the main connection is nullified. In the JAttachment::release function the counter is not equal to 0 and freeEngineData is not called, in which the following functions should be called in the chain:

purge_attachment release_attachment dbb->dbb_extManager->closeAttachment delete attInfo ~ExternalContextImpl externalAttachment->release // In which the counter should be reduced and the external engine connection should be deleted.

This behavior occurs in RedDatabase, but I am sure that it will be the same in Firebird, since the code in this part is not different, but I cannot check, since there are problems with configuring jaybird for Firebird.

Calling the detach function on the main connection directly calls internalDetach and freeEngineData, which closes the connection and releases all resources.

Well, this smells like a problem to me -- the fact that the external engine confuses the reference counter of the object created by user, thus making the whole ref-count-based management unreliable.

Nevertheless, I don't mind fixing this particular issue the way you suggest. However, it seems to me that nullifying the reference (and thus possibly calling release()) is unnecessary after close()/detach(). I have a feeling that the code should look something like this:

if (m_replicator.hasData()) { m_replicator->close(&localStatus); m_replicator.clear(); } if (m_attachment.hasData()) { m_attachment->detach(&localStatus); m_attachment.clear(); }

Could you please double-check this?

Is it possible to reproduce the issue using sample UDR ?

Yes, I got a reproducible case with UDR on Firebird. I didn't think about it, thanks.

It is enough to setup asynchronous replication and add a procedure on the master:

create or alter procedure gen_rows ( start_n integer not null, end_n integer not null ) returns ( n integer not null ) external name 'udrcpp_example!gen_rows' engine udr;

In this case, the segment will be applied on the replica, but the connection will remain active, if after that add the segment again using the external engine, a second connection will appear, and so on.

And maybe these two pointers should not be RefPtr's at all, given they're used by a single object only and destroyed explicitly...

Remade without using RefPtr.

Is it possible to reproduce the issue using sample UDR ?

Yes, I got a reproducible case with UDR on Firebird. I didn't think about it, thanks.

Could you provide it here ? With all necessary steps, please.

Could you provide it here ? With all necessary steps, please.

Setup replication.conf:

database = d:\repl\master.fdb { journal_directory = d:\repl\journal journal_archive_directory = d:\repl\archive journal_archive_timeout = 10 } database = d:\repl\replica.fdb { journal_source_directory = d:\repl\replica }

isql> create database 'd:\repl\master.fdb';

Copy DB to d:\repl\replica.fdb and execute:
gfix -repl read_only d:\repl\replica.fdb

On master DB:
isql> alter database enable publication;
isql> alter database include all to publication;

Connect to master DB and create procedure:

create or alter procedure gen_rows ( start_n integer not null, end_n integer not null ) returns ( n integer not null ) external name 'udrcpp_example!gen_rows' engine udr; commit;

Disconnect or wait for archive segment 1.
Create new same procedure on master for create archive 2.

Copy segment 1 to d:\repl\replica\, wait applied the segment, connect to replica and execute:
isql> select count(*) from mon$attachments;
isql> commit;

got 4 attachments:

current attachment,

garbage collector,

cache writer,

dead applier attachment.

Copy segment 2 and wait applied, after execute same query and got 5 attachments: current, garbage collector, cache writer, dead applier attachment (x2).

Thanks.

Create new same procedure on master for create archive 2.

Do you mean recreate or create with a new name ?

Do you mean recreate or create with a new name ?

I create it with a new name, but I think it can be recreated.

When apply segment using an external engine (Jaybird/UDR), the replica is not detached by nullifying the variable, because releaseAttachment is not called

dyemanov reviewed Aug 6, 2025

View reviewed changes

Add detach replica to shutdown function

26f2495

When apply segment using an external engine (Jaybird/UDR), the replica is not detached by nullifying the variable, because releaseAttachment is not called

XaBbl4 force-pushed the add_detach_repl_when_error_occurs branch from 13e1e9b to 26f2495 Compare August 21, 2025 15:09

dyemanov approved these changes Aug 21, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Add detach replica in shutdown function when error occurs during apply segment of journal #8688

Add detach replica in shutdown function when error occurs during apply segment of journal #8688

Uh oh!

XaBbl4 commented Aug 5, 2025

Uh oh!

dyemanov Aug 6, 2025

Uh oh!

XaBbl4 Aug 6, 2025

Uh oh!

dyemanov Aug 7, 2025

Uh oh!

XaBbl4 Aug 14, 2025

Uh oh!

dyemanov Aug 21, 2025 •

edited

Loading

Uh oh!

XaBbl4 Aug 21, 2025

Uh oh!

hvlad Aug 21, 2025

Uh oh!

XaBbl4 Aug 21, 2025

Uh oh!

hvlad Aug 21, 2025

Uh oh!

XaBbl4 Aug 21, 2025

Uh oh!

Uh oh!

Uh oh!

Add detach replica in shutdown function when error occurs during apply segment of journal #8688

Are you sure you want to change the base?

Add detach replica in shutdown function when error occurs during apply segment of journal #8688

Uh oh!

Conversation

XaBbl4 commented Aug 5, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dyemanov Aug 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dyemanov Aug 21, 2025 •

edited

Loading