Improve attachments deletion #35103

Open. lunny wants to merge 35 commits into main.

Conversation

Member

@lunny lunny commented Jul 16, 2025

Refactor attachment deletion logic

The deletion of comment attachments has been moved from the AfterDelete hook to the service layer. This avoids scenarios where files are deleted but the database transaction is later rolled back. The new implementation introduces a two-stage deletion process that uses a status column to mark attachments as deleted. Marked attachments are excluded from all UI and API queries. A background cleanup queue permanently deletes the files and then removes the corresponding database records. If file deletion fails repeatedly, a system notice is issued to alert administrators.

This will also improve performance when deleting a release with many files.
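
The two-stage process described above can be sketched in plain Go. This is only an illustration with an in-memory map standing in for the attachment table; `Store`, `MarkDeleted`, `Cleanup`, and `errUnavailable` are hypothetical names, not the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// FileStatus imitates the status column; the names are illustrative only.
type FileStatus int

const (
	StatusNormal FileStatus = iota
	StatusToBeDeleted
)

// errUnavailable is a hypothetical error used to imitate a storage failure.
var errUnavailable = errors.New("storage unavailable")

type Attachment struct {
	Path   string
	Status FileStatus
}

// Store stands in for the attachment table.
type Store struct {
	rows map[int64]*Attachment
}

func NewStore() *Store { return &Store{rows: map[int64]*Attachment{}} }

func (s *Store) Add(id int64, path string) {
	s.rows[id] = &Attachment{Path: path, Status: StatusNormal}
}

// MarkDeleted is stage one: only the row changes, so a rolled-back
// transaction cannot leave a row whose file is already gone.
func (s *Store) MarkDeleted(id int64) {
	if a, ok := s.rows[id]; ok {
		a.Status = StatusToBeDeleted
	}
}

// VisibleCount counts the rows that UI and API queries would still see.
func (s *Store) VisibleCount() int {
	n := 0
	for _, a := range s.rows {
		if a.Status == StatusNormal {
			n++
		}
	}
	return n
}

// Cleanup is stage two: remove the file first, then the row.
// On failure the row is kept so the cleanup can be retried later.
func (s *Store) Cleanup(id int64, removeFile func(path string) error) error {
	a, ok := s.rows[id]
	if !ok || a.Status != StatusToBeDeleted {
		return nil
	}
	if err := removeFile(a.Path); err != nil {
		return fmt.Errorf("remove %s: %w", a.Path, err)
	}
	delete(s.rows, id)
	return nil
}

func main() {
	s := NewStore()
	s.Add(1, "attachments/a.png")
	s.MarkDeleted(1)
	fmt.Println("visible:", s.VisibleCount())

	failing := func(string) error { return errUnavailable }
	fmt.Println("first attempt:", s.Cleanup(1, failing))
	fmt.Println("second attempt:", s.Cleanup(1, func(string) error { return nil }))
	fmt.Println("rows left:", len(s.rows))
}
```

The point of the ordering is that the file is only touched after the transaction that marked the row has committed, and the row outlives a failed file removal.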

Removed unused functions

DeleteAttachmentsByIssue and DeleteAttachmentsByComment were removed because only tests used them.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jul 16, 2025
@lunny lunny added the type/bug label Jul 16, 2025
@github-actions github-actions bot added modifies/api This PR adds API routes or modifies them modifies/go Pull requests that update Go code modifies/frontend labels Jul 16, 2025
@lunny lunny added the backport/v1.24 This PR should be backported to Gitea 1.24 label Jul 16, 2025
@github-actions github-actions bot added the modifies/cli PR changes something on the CLI, i.e. gitea doctor or gitea admin label Jul 17, 2025
@lunny lunny removed the backport/v1.24 This PR should be backported to Gitea 1.24 label Jul 17, 2025
@lunny lunny marked this pull request as draft July 17, 2025 07:04
@lunny lunny force-pushed the lunny/fix_bug_delete_code_comment branch from 026b762 to f3d0e51 Compare July 18, 2025 00:43
@github-actions github-actions bot removed the modifies/cli PR changes something on the CLI, i.e. gitea doctor or gitea admin label Jul 18, 2025
@lunny lunny force-pushed the lunny/fix_bug_delete_code_comment branch from 4ae54c8 to 97556a8 Compare July 19, 2025 00:44
@lunny lunny marked this pull request as ready for review July 19, 2025 21:33
@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Jul 20, 2025
Contributor

@wxiaoguang wxiaoguang left a comment

Overall doesn't look good .....

// Copyright 2025 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT

package db
Contributor

Why in db package? No proper package for it?

Member Author

Because the enum can be reused for other storages in the future.

type FileStatus int

const (
	FileStatusNormal FileStatus = iota // FileStatusNormal indicates the file is normal and exists on disk.
Contributor

@wxiaoguang wxiaoguang Jul 20, 2025

Zero values cause various problems in XORM
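
One common way this kind of problem shows up: query builders that derive conditions from a struct typically skip zero-valued fields, so a status whose meaningful value is 0 silently drops out of the filter. The sketch below imitates that behavior in plain Go without using XORM itself; `FileStatusUnset`, `Filter`, and `conditions` are hypothetical names, and starting the enum at 1 is one possible remedy, not necessarily the one the reviewer had in mind.

```go
package main

import "fmt"

type FileStatus int

const (
	FileStatusUnset       FileStatus = iota // zero value means "not specified"
	FileStatusNormal                        // 1
	FileStatusToBeDeleted                   // 2
)

// Filter is a hypothetical struct-based query condition.
type Filter struct {
	IssueID int64
	Status  FileStatus
}

// conditions imitates a builder that ignores zero-valued fields.
func conditions(f Filter) []string {
	var conds []string
	if f.IssueID != 0 {
		conds = append(conds, fmt.Sprintf("issue_id = %d", f.IssueID))
	}
	if f.Status != 0 {
		conds = append(conds, fmt.Sprintf("status = %d", f.Status))
	}
	return conds
}

func main() {
	// Because FileStatusNormal is 1, the status condition survives.
	fmt.Println(conditions(Filter{IssueID: 7, Status: FileStatusNormal}))
	// A zero status is indistinguishable from "no status given" and is dropped.
	fmt.Println(conditions(Filter{IssueID: 7}))
}
```

Had `FileStatusNormal` been 0, the first call would have produced only the `issue_id` condition and matched rows of every status.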

Member Author

if res > 0 {
	var reviewComment Comment
	has, err := db.GetEngine(ctx).Where("review_id = ?", comment.ReviewID).
		And("type = ?", CommentTypeReview).Get(&reviewComment)
Contributor

@wxiaoguang wxiaoguang Jul 20, 2025

type, quote or not?

Image

Member Author

// delete review & review comment if the code comment is the last comment of the review
if comment.Type == CommentTypeCode && comment.ReviewID > 0 {
	res, err := db.GetEngine(ctx).ID(comment.ReviewID).
		Where("NOT EXISTS (SELECT 1 FROM comment WHERE review_id = ? AND `type` = ?)", comment.ReviewID, CommentTypeCode).
Contributor

It needs to be tested with all databases since raw SQL is used.

Member Author

Moved to #35133

@@ -305,6 +305,9 @@ func UpdateIssueAttachments(ctx context.Context, issueID int64, uuids []string)
		return fmt.Errorf("getAttachmentsByUUIDs [uuids: %v]: %w", uuids, err)
	}
	for i := range attachments {
		if attachments[i].IssueID != 0 {
Contributor

Will it cause problems if an attachment link is copied from issue-1 to issue-2?

Member Author

When a new issue is being created, the attachment is uploaded with no issue_id assigned. This method is then invoked to set the issue ID when the issue is submitted. The change prevents an attachment that already has an issue ID from being reassigned to another issue.

If an attachment link is copied to another place, inside or outside the Gitea site, it will return 404 once the attachment is deleted. I don't understand what your concern is.
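
The guard described in this reply can be illustrated with a small sketch. The hunk only shows the `IssueID != 0` check, not what happens inside it, so skipping the attachment here is an assumption; `assignToIssue` is a hypothetical helper, not the PR's function.

```go
package main

import "fmt"

type Attachment struct {
	UUID    string
	IssueID int64
}

// assignToIssue links freshly uploaded attachments to issueID and returns
// the UUIDs it linked. Attachments that already belong to an issue are left
// untouched (assumption: the real code may return an error instead).
func assignToIssue(attachments []*Attachment, issueID int64) []string {
	var linked []string
	for _, a := range attachments {
		if a.IssueID != 0 {
			continue // already owned by another issue; do not reassign
		}
		a.IssueID = issueID
		linked = append(linked, a.UUID)
	}
	return linked
}

func main() {
	atts := []*Attachment{
		{UUID: "new-upload"},            // uploaded while drafting the issue
		{UUID: "copied", IssueID: 1},    // UUID copied from issue 1
	}
	fmt.Println(assignToIssue(atts, 2))
	fmt.Println(atts[1].IssueID) // still owned by issue 1
}
```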

	LastDeleteFailedTime timeutil.TimeStamp // Last time the deletion failed, used to prevent infinite loop
	Size int64 `xorm:"DEFAULT 0"`
	CreatedUnix timeutil.TimeStamp `xorm:"created"`
	CustomDownloadURL string `xorm:"-"`
Contributor

Why a full copy-paste of the Attachment struct?

Member Author

I removed this line. I kept the other columns so that there is a snapshot of what the table should be at that time.

		return err
	}

	if _, err := x.Exec("UPDATE `attachment` SET status = ? WHERE status IS NULL", db.FileStatusNormal); err != nil {
Contributor

Why "null-able"?

Member Author

var cleanQueue *queue.WorkerPoolQueue[int64]

func Init() error {
	cleanQueue = queue.CreateSimpleQueue(graceful.GetManager().ShutdownContext(), "attachments-clean", handler)
Contributor

Why does it need to use a queue?

Member Author

@lunny lunny Jul 20, 2025

A cleanup queue avoids deleting attachment files from disk immediately, which reduces blocking time for end users, especially when deleting large releases or repositories with many attachments. Because the queue exists, the associated cleanup cron task does not need to run frequently; it currently executes once every 24 hours. The cron task primarily handles rare edge cases, such as when a database transaction is committed successfully but the system restarts before the corresponding IDs are pushed to the queue. In typical scenarios, the queue is processed shortly after the database transaction completes.
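
The division of labor between the queue and the cron task can be sketched with a channel and a periodic sweep. This is a simplified model, not Gitea's queue package: `Cleaner`, `Mark`, and `Sweep` are hypothetical names, and the `enqueue` flag exists only to imitate a restart between the commit and the push.

```go
package main

import (
	"fmt"
	"sync"
)

// Cleaner is a hypothetical stand-in for the cleanup queue plus the table it drains.
type Cleaner struct {
	mu      sync.Mutex
	pending map[int64]bool // IDs whose rows are marked for deletion
	queue   chan int64
	wg      sync.WaitGroup
}

func NewCleaner(buffer int) *Cleaner {
	return &Cleaner{pending: map[int64]bool{}, queue: make(chan int64, buffer)}
}

// Run starts one background worker that drains the queue.
func (c *Cleaner) Run() {
	go func() {
		for id := range c.queue {
			c.mu.Lock()
			delete(c.pending, id) // stands in for "delete file, then delete row"
			c.mu.Unlock()
			c.wg.Done()
		}
	}()
}

// Mark records the deletion (the committed transaction) and, when enqueue is
// true, pushes the ID to the queue. enqueue=false imitates a restart that
// happens after the commit but before the push.
func (c *Cleaner) Mark(id int64, enqueue bool) {
	c.mu.Lock()
	c.pending[id] = true
	c.mu.Unlock()
	if enqueue {
		c.wg.Add(1)
		c.queue <- id
	}
}

// Wait blocks until every queued ID has been handled.
func (c *Cleaner) Wait() { c.wg.Wait() }

// Pending reports how many marked rows still await cleanup.
func (c *Cleaner) Pending() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.pending)
}

// Sweep is the cron-style fallback: it cleans whatever the queue missed.
func (c *Cleaner) Sweep() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	n := len(c.pending)
	for id := range c.pending {
		delete(c.pending, id)
	}
	return n
}

func main() {
	c := NewCleaner(16)
	c.Run()
	c.Mark(1, true)
	c.Mark(2, true)
	c.Mark(3, false) // never reaches the queue
	c.Wait()
	fmt.Println("pending after queue:", c.Pending())
	fmt.Println("cleaned by sweep:", c.Sweep())
}
```

The request path only pays for the mark and the push; the slow file removal happens in the worker, and the sweep catches rows that were committed but never queued.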


if (json.deletedReviewCommentHashTag) {
  document.querySelector(`#${json.deletedReviewCommentHashTag}`)?.remove();
}
Contributor

Shouldn't #${json.deletedReviewCommentHashTag} be the parent of the "delete button"? Either use "closest", or use a "data-xxxx" attribute so that the delete button can find its target, rather than relying on the fragile backend-generated ID as a selector.

Member Author

moved to #35133

@GiteaBot GiteaBot added lgtm/blocked A maintainer has reservations with the PR and thus it cannot be merged and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Jul 20, 2025
lastID := int64(0)
for {
	if err := db.GetEngine(ctx).
		Where("id > ? AND status = ?", lastID, db.FileStatusToBeDeleted).
Contributor

It is very slow. You need an index like this: (status, id)

Member Author

if err := db.GetEngine(ctx).
	Where("id > ? AND status = ?", lastID, db.FileStatusToBeDeleted).
	Limit(100).
	Find(&attachments); err != nil {
Contributor

It generates wrong results because the rows are not returned in a stable order.
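
The concern is that a cursor of the form `id > lastID` combined with a limit only works when each batch is sorted by `id`; without an explicit ORDER BY a database may return rows in any order, so advancing the cursor can skip rows. The sketch below models the corrected loop over an in-memory slice; `nextBatch` and `collectAll` are hypothetical helpers that imitate "WHERE id > ? AND status = ? ORDER BY id ASC LIMIT ?".

```go
package main

import (
	"fmt"
	"sort"
)

type Row struct {
	ID          int64
	ToBeDeleted bool
}

// nextBatch returns up to limit matching rows with ID > lastID, sorted by ID.
// The sort is the in-memory counterpart of ORDER BY id ASC.
func nextBatch(rows []Row, lastID int64, limit int) []Row {
	var matched []Row
	for _, r := range rows {
		if r.ID > lastID && r.ToBeDeleted {
			matched = append(matched, r)
		}
	}
	sort.Slice(matched, func(i, j int) bool { return matched[i].ID < matched[j].ID })
	if len(matched) > limit {
		matched = matched[:limit]
	}
	return matched
}

// collectAll walks the table in batches, moving the cursor to the largest
// ID of each batch, which is its last element because batches are sorted.
func collectAll(rows []Row, limit int) []int64 {
	var ids []int64
	lastID := int64(0)
	for {
		batch := nextBatch(rows, lastID, limit)
		if len(batch) == 0 {
			return ids
		}
		for _, r := range batch {
			ids = append(ids, r.ID)
		}
		lastID = batch[len(batch)-1].ID
	}
}

func main() {
	// Rows are deliberately stored out of order, as a table scan might return them.
	rows := []Row{{5, true}, {1, true}, {3, false}, {4, true}, {2, true}}
	fmt.Println(collectAll(rows, 2))
}
```

If the sort were removed, the first batch of this data would be IDs 5 and 1, the cursor would move to 1 or 5 depending on which element is used, and some rows would be repeated or never visited.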

Member Author


var deletedReviewComment *Comment

// delete review & review comment if the code comment is the last comment of the review
Contributor

Why can it delete the review? What if the review is a "change request" or "approval"?

Member Author

moved to #35133

}

attachment.AddAttachmentsToCleanQueue(ctx, comment.Attachments)
Contributor

You deleted 2 comments above, but only handled one comment's "attachments"?

Member Author

The review comment will be deleted if it contains no content or attachments, and its associated code comment is the last remaining one. If it has any attachments, it should not be deleted automatically.

Member Author

The reviewComment deletion has been moved to #35133, so only one comment is now deleted in this pull request.


var deletedReviewComment *Comment

// delete review & review comment if the code comment is the last comment of the review
Contributor

Is there no test for this changed behavior?

Member Author

moved to #35133

@wxiaoguang
Contributor

> Overall doesn't look good .....

Change my opinion: it looks wrong.

@lunny
Member Author

lunny commented Jul 21, 2025

> > Overall doesn't look good .....
>
> Change my opinion: it looks wrong.

I’ve pushed some new commits. Could you please review them again and let me know if there are any concerns?

@wxiaoguang
Contributor

> > > Overall doesn't look good .....
> >
> > Change my opinion: it looks wrong.
>
> I’ve pushed some new commits. Could you please review them again and let me know if there are any concerns?

The design is wrong and not worth reviewing.

If your plan is "the enum can be reused for other storages in the future", you need a separate table like "storage_file_deletion (storage_type, storage_path)", rather than continuing to patch other tables.

@lunny
Member Author

lunny commented Jul 21, 2025

@lafriks @wxiaoguang I reimplemented it using a standalone table, storage_path_deletion, to handle file deletions. Please review again.

Labels
lgtm/blocked A maintainer has reservations with the PR and thus it cannot be merged modifies/api This PR adds API routes or modifies them modifies/go Pull requests that update Go code modifies/migrations modifies/translation type/bug
4 participants