Commit 9c4e6b1

shakeelb authored and torvalds committed
mm, mlock, vmscan: no more skipping pagevecs
When a thread mlocks an address space backed either by file pages which are currently not present in memory or by swapped-out anon pages (not in swapcache), a new page is allocated and added to the local pagevec (lru_add_pvec), I/O is triggered, and the thread then sleeps on the page. On I/O completion the thread can wake on a different CPU; the mlock syscall will then set the PageMlocked() bit of the page, but it will not be able to put that page on the unevictable LRU because the page sits on the pagevec of a different CPU. Even on drain, that page will go to an evictable LRU, because the PageMlocked() bit is not checked on pagevec drain.

The page will eventually reach the right LRU on reclaim, but the LRU stats will remain skewed for a long time.

This patch puts all pages, even unevictable ones, onto the pagevecs; on drain, the pages are added to the correct LRU by checking their evictability. This resolves the issue of mlocked pages sitting on another CPU's pagevec, because when those pagevecs are drained the mlocked file pages go to the unevictable LRU. It also makes the race with munlock easier to resolve, because pagevec drains happen under the LRU lock.

However, there is still one place which makes a page evictable and does the PageLRU check on that page without the LRU lock, and it needs special attention: TestClearPageMlocked() and isolate_lru_page() in clear_page_mlock().

    #0: __pagevec_lru_add_fn            #1: clear_page_mlock

    SetPageLRU()                        if (!TestClearPageMlocked())
                                                return
    smp_mb() // <--required
                                        // inside does PageLRU
    if (!PageMlocked())                 if (isolate_lru_page())
          move to evictable LRU                 putback_lru_page()
    else
          move to unevictable LRU

In '#1', TestClearPageMlocked() provides full memory barrier semantics, so the PageLRU check (inside isolate_lru_page) cannot be reordered before it.

In '#0', without an explicit memory barrier, the PageMlocked() check can be reordered before SetPageLRU(). If that happens, '#0' can put a page on the unevictable LRU while '#1' has just cleared the Mlocked bit of that page but fails to isolate it, because '#0' has not yet set the PageLRU bit. That page will be stranded on the unevictable LRU.

There is one (good) side effect, though. Without this patch, the pages allocated for a System V shared memory segment are added to evictable LRUs even after shmctl(SHM_LOCK) on that segment. This patch correctly puts such pages on the unevictable LRU.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Shakeel Butt <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
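The shmctl(SHM_LOCK) side effect described above can be observed from user space. The program below is only a hypothetical sketch, not part of the patch: it assumes CAP_IPC_LOCK (or enough RLIMIT_MEMLOCK headroom) for SHM_LOCK and reads the "Unevictable:" field of /proc/meminfo; with this patch the counter should grow by roughly the segment size once the pages have been faulted in and the per-CPU pagevecs have drained, rather than only after reclaim eventually moves them.

/*
 * Hypothetical sketch (not part of the patch) showing the SHM_LOCK effect:
 * create a SysV shm segment, SHM_LOCK it, fault its pages in, and print the
 * "Unevictable:" counter from /proc/meminfo before and after.  Needs
 * CAP_IPC_LOCK or sufficient RLIMIT_MEMLOCK.  Build with: cc shm_lock_demo.c
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static long unevictable_kb(void)
{
        char line[256];
        long kb = -1;
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f)
                return -1;
        while (fgets(line, sizeof(line), f))
                if (sscanf(line, "Unevictable: %ld kB", &kb) == 1)
                        break;
        fclose(f);
        return kb;
}

int main(void)
{
        const size_t size = 64UL << 20;         /* 64 MB segment */
        int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
        char *p;

        if (id < 0) {
                perror("shmget");
                return 1;
        }
        if (shmctl(id, SHM_LOCK, NULL)) {
                perror("shmctl(SHM_LOCK)");
                return 1;
        }

        printf("Unevictable before: %ld kB\n", unevictable_kb());

        p = shmat(id, NULL, 0);
        if (p == (void *)-1) {
                perror("shmat");
                return 1;
        }
        memset(p, 0x5a, size);                  /* fault every page in */
        sleep(2);                               /* most pages hit the LRU as pagevecs fill */

        printf("Unevictable after:  %ld kB\n", unevictable_kb());

        shmdt(p);
        shmctl(id, IPC_RMID, NULL);
        return 0;
}

Running it on kernels with and without this patch makes the difference in initial LRU placement visible without waiting for reclaim.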
1 parent c3cc391 commit 9c4e6b1

4 files changed: +54 additions, -95 deletions


include/linux/swap.h

Lines changed: 0 additions & 2 deletions
@@ -337,8 +337,6 @@ extern void deactivate_file_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
 extern void swap_setup(void);
 
-extern void add_page_to_unevictable_list(struct page *page);
-
 extern void lru_cache_add_active_or_unevictable(struct page *page,
                                                 struct vm_area_struct *vma);

mm/mlock.c

Lines changed: 6 additions & 0 deletions
@@ -64,6 +64,12 @@ void clear_page_mlock(struct page *page)
         mod_zone_page_state(page_zone(page), NR_MLOCK,
                             -hpage_nr_pages(page));
         count_vm_event(UNEVICTABLE_PGCLEARED);
+        /*
+         * The previous TestClearPageMlocked() corresponds to the smp_mb()
+         * in __pagevec_lru_add_fn().
+         *
+         * See __pagevec_lru_add_fn for more explanation.
+         */
         if (!isolate_lru_page(page)) {
                 putback_lru_page(page);
         } else {

mm/swap.c

Lines changed: 47 additions & 35 deletions
@@ -445,30 +445,6 @@ void lru_cache_add(struct page *page)
         __lru_cache_add(page);
 }
 
-/**
- * add_page_to_unevictable_list - add a page to the unevictable list
- * @page:  the page to be added to the unevictable list
- *
- * Add page directly to its zone's unevictable list.  To avoid races with
- * tasks that might be making the page evictable, through eg. munlock,
- * munmap or exit, while it's not on the lru, we want to add the page
- * while it's locked or otherwise "invisible" to other tasks.  This is
- * difficult to do when using the pagevec cache, so bypass that.
- */
-void add_page_to_unevictable_list(struct page *page)
-{
-        struct pglist_data *pgdat = page_pgdat(page);
-        struct lruvec *lruvec;
-
-        spin_lock_irq(&pgdat->lru_lock);
-        lruvec = mem_cgroup_page_lruvec(page, pgdat);
-        ClearPageActive(page);
-        SetPageUnevictable(page);
-        SetPageLRU(page);
-        add_page_to_lru_list(page, lruvec, LRU_UNEVICTABLE);
-        spin_unlock_irq(&pgdat->lru_lock);
-}
-
 /**
  * lru_cache_add_active_or_unevictable
  * @page:  the page to be added to LRU
@@ -484,13 +460,9 @@ void lru_cache_add_active_or_unevictable(struct page *page,
 {
         VM_BUG_ON_PAGE(PageLRU(page), page);
 
-        if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) {
+        if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED))
                 SetPageActive(page);
-                lru_cache_add(page);
-                return;
-        }
-
-        if (!TestSetPageMlocked(page)) {
+        else if (!TestSetPageMlocked(page)) {
                 /*
                  * We use the irq-unsafe __mod_zone_page_stat because this
                  * counter is not modified from interrupt context, and the pte
@@ -500,7 +472,7 @@ void lru_cache_add_active_or_unevictable(struct page *page,
                                     hpage_nr_pages(page));
                 count_vm_event(UNEVICTABLE_PGMLOCKED);
         }
-        add_page_to_unevictable_list(page);
+        lru_cache_add(page);
 }
 
 /*
@@ -886,15 +858,55 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
 static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
                                  void *arg)
 {
-        int file = page_is_file_cache(page);
-        int active = PageActive(page);
-        enum lru_list lru = page_lru(page);
+        enum lru_list lru;
+        int was_unevictable = TestClearPageUnevictable(page);
 
         VM_BUG_ON_PAGE(PageLRU(page), page);
 
         SetPageLRU(page);
+        /*
+         * Page becomes evictable in two ways:
+         * 1) Within LRU lock [munlock_vma_pages() and __munlock_pagevec()].
+         * 2) Before acquiring LRU lock to put the page to correct LRU and then
+         *    a) do PageLRU check with lock [check_move_unevictable_pages]
+         *    b) do PageLRU check before lock [clear_page_mlock]
+         *
+         * (1) & (2a) are ok as LRU lock will serialize them. For (2b), we need
+         * following strict ordering:
+         *
+         * #0: __pagevec_lru_add_fn             #1: clear_page_mlock
+         *
+         * SetPageLRU()                         TestClearPageMlocked()
+         * smp_mb() // explicit ordering        // above provides strict
+         *                                      // ordering
+         * PageMlocked()                        PageLRU()
+         *
+         *
+         * if '#1' does not observe setting of PG_lru by '#0' and fails
+         * isolation, the explicit barrier will make sure that page_evictable
+         * check will put the page in correct LRU. Without smp_mb(), SetPageLRU
+         * can be reordered after PageMlocked check and can make '#1' to fail
+         * the isolation of the page whose Mlocked bit is cleared (#0 is also
+         * looking at the same page) and the evictable page will be stranded
+         * in an unevictable LRU.
+         */
+        smp_mb();
+
+        if (page_evictable(page)) {
+                lru = page_lru(page);
+                update_page_reclaim_stat(lruvec, page_is_file_cache(page),
+                                         PageActive(page));
+                if (was_unevictable)
+                        count_vm_event(UNEVICTABLE_PGRESCUED);
+        } else {
+                lru = LRU_UNEVICTABLE;
+                ClearPageActive(page);
+                SetPageUnevictable(page);
+                if (!was_unevictable)
+                        count_vm_event(UNEVICTABLE_PGCULLED);
+        }
+
         add_page_to_lru_list(page, lruvec, lru);
-        update_page_reclaim_stat(lruvec, file, active);
         trace_mm_lru_insertion(page, lru);
 }
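The ordering documented in this comment is the store-buffering pattern: each side must make its own store (SetPageLRU() here, the clearing of PG_mlocked in clear_page_mlock()) visible before loading the flag published by the other side. The sketch below is a user-space analogue only, with hypothetical variable and thread names; it uses C11 seq_cst fences as a stand-in for smp_mb() and for the full barrier implied by TestClearPageMlocked(), and is not kernel code.

/*
 * Hypothetical user-space analogue (NOT kernel code) of the ordering that
 * __pagevec_lru_add_fn() and clear_page_mlock() rely on.
 *
 * thread_a models __pagevec_lru_add_fn(): store "lru", then load "cleared".
 * thread_b models clear_page_mlock():     store "cleared", then load "lru".
 * If both loads miss the other thread's store, the page would be stranded
 * on the unevictable LRU.  Build with: cc -std=c11 -pthread sb_demo.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define USE_BARRIER 1   /* set to 0 to allow the "stranded" outcome */

static atomic_int lru, cleared;
static int a_saw_cleared, b_saw_lru;

static void *thread_a(void *arg)        /* models __pagevec_lru_add_fn() */
{
        (void)arg;
        atomic_store_explicit(&lru, 1, memory_order_relaxed);  /* SetPageLRU() */
#if USE_BARRIER
        atomic_thread_fence(memory_order_seq_cst);              /* smp_mb() */
#endif
        /* PageMlocked() check */
        a_saw_cleared = atomic_load_explicit(&cleared, memory_order_relaxed);
        return NULL;
}

static void *thread_b(void *arg)        /* models clear_page_mlock() */
{
        (void)arg;
        /* TestClearPageMlocked(): an RMW with full-barrier semantics */
        atomic_store_explicit(&cleared, 1, memory_order_relaxed);
#if USE_BARRIER
        atomic_thread_fence(memory_order_seq_cst);
#endif
        /* PageLRU() check inside isolate_lru_page() */
        b_saw_lru = atomic_load_explicit(&lru, memory_order_relaxed);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, thread_a, NULL);
        pthread_create(&b, NULL, thread_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        if (!a_saw_cleared && !b_saw_lru)
                puts("both loads missed the other store: page would be stranded");
        else
                puts("at least one side observed the other: page lands on the right LRU");
        return 0;
}

With USE_BARRIER set to 0, the "stranded" outcome is permitted and may show up after many runs even on x86 (store-load reordering); with the fences in place it cannot occur, which mirrors why the explicit smp_mb() is required in '#0'.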

mm/vmscan.c

Lines changed: 1 addition & 58 deletions
@@ -769,64 +769,7 @@ int remove_mapping(struct address_space *mapping, struct page *page)
  */
 void putback_lru_page(struct page *page)
 {
-        bool is_unevictable;
-        int was_unevictable = PageUnevictable(page);
-
-        VM_BUG_ON_PAGE(PageLRU(page), page);
-
-redo:
-        ClearPageUnevictable(page);
-
-        if (page_evictable(page)) {
-                /*
-                 * For evictable pages, we can use the cache.
-                 * In event of a race, worst case is we end up with an
-                 * unevictable page on [in]active list.
-                 * We know how to handle that.
-                 */
-                is_unevictable = false;
-                lru_cache_add(page);
-        } else {
-                /*
-                 * Put unevictable pages directly on zone's unevictable
-                 * list.
-                 */
-                is_unevictable = true;
-                add_page_to_unevictable_list(page);
-                /*
-                 * When racing with an mlock or AS_UNEVICTABLE clearing
-                 * (page is unlocked) make sure that if the other thread
-                 * does not observe our setting of PG_lru and fails
-                 * isolation/check_move_unevictable_pages,
-                 * we see PG_mlocked/AS_UNEVICTABLE cleared below and move
-                 * the page back to the evictable list.
-                 *
-                 * The other side is TestClearPageMlocked() or shmem_lock().
-                 */
-                smp_mb();
-        }
-
-        /*
-         * page's status can change while we move it among lru. If an evictable
-         * page is on unevictable list, it never be freed. To avoid that,
-         * check after we added it to the list, again.
-         */
-        if (is_unevictable && page_evictable(page)) {
-                if (!isolate_lru_page(page)) {
-                        put_page(page);
-                        goto redo;
-                }
-                /* This means someone else dropped this page from LRU
-                 * So, it will be freed or putback to LRU again. There is
-                 * nothing to do here.
-                 */
-        }
-
-        if (was_unevictable && !is_unevictable)
-                count_vm_event(UNEVICTABLE_PGRESCUED);
-        else if (!was_unevictable && is_unevictable)
-                count_vm_event(UNEVICTABLE_PGCULLED);
-
+        lru_cache_add(page);
         put_page(page);         /* drop ref from isolate */
 }
