Skip to content

A suggestion of SST compaction for better space reclaim. #1053

@wangleiin

Description

@wangleiin

What version of Go are you using (go version)?

$ go version
go version go1.11.4 linux/amd64

What version of Badger are you using?

v1.6

Does this issue reproduce with the latest master?

Yes

Recently I saw a increasing memory usage as well as data accumulation of WAL in dgraph raft progress. Then I export KVs and find most of which are deleted records. As dgraph raft record all entities and delete older ones during snapshot, most of these should be removed during SST table compaction.
After a view of Badger compaction strategy, I realize keys can only be removed when compact to a non-overlay next level.
Review func compactBuildTables in file github.com/dgraph-io/badger/levels.go

func (s *levelsController) compactBuildTables(
	lev int, cd compactDef) ([]*table.Table, func() error, error) {
	topTables := cd.top
	botTables := cd.bot

	var hasOverlap bool
	{
		kr := getKeyRange(cd.top...)
		for i, lh := range s.levels {
			if i <= lev { // Skip upper levels.
				continue
			}
			lh.RLock()
			left, right := lh.overlappingTables(levelHandlerRLocked{}, kr)
			lh.RUnlock()
			if right-left > 0 {
				hasOverlap = true
				break
			}
		}
	}
...

There's hasOverlap check that works.
As preserved keys like !badger!head write to every single level0 table and inherited by all lower levels, there tends to be hasOverlap, even for neighborhoods that are being compacted with empty lower levels.
For example:
L0: sst5 sst6 sst7
L1:sst4
L2:empty
...
We want to compact sst5, sst6 and sst7 in level0 to level1, as sst4 in L1 overlap with tables in L0, the result will be:
L0: empty
L1:sst8, sst9
L2:empty
...
Assuming that there are some deleting records in sst5/sst6/sst7, those records will write back to sst8/sst9 and it's hard to remove these records.

I think it could be improved by check like codes below.

func (s *levelsController) checkOverlap(lev int, tables []*table.Table) bool {
	var hasOverlap bool
	kr := getKeyRange(tables)
	for i, lh := range s.levels {
		if i <= lev { // Skip upper levels.
			continue
		}
		lh.RLock()
		left, right := lh.overlappingTables(levelHandlerRLocked{}, kr)

		lh.RUnlock()
		if right-left > 0 {
			hasOverlap = true
			break
		}
	}
	return hasOverlap
}

func (s *levelsController) compactBuildTables(
	lev int, cd compactDef) ([]*table.Table, func() error, error) {
	topTables := cd.top
	botTables := cd.bot

	var hasOverlap bool
	hasOverlap = s.checkOverlap(lev+1, cd.top) || s.checkOverlap(lev+1, cd.bot)
...

Here we check if L0 and L1 overlap with L2 etc. while omitting overlap between L0 and L1, as they will compact later.
Please let me know if it works or any unknown faults.

Thank you!

Metadata

Metadata

Assignees

Labels

area/performancePerformance related issues.kind/enhancementSomething could be better.priority/P1Serious issue that requires eventual attention (can wait a bit)status/acceptedWe accept to investigate or work on it.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions