Commit e39bb6b
yhuang-intel authored and torvalds committed
NUMA Balancing: add page promotion counter
Patch series "NUMA balancing: optimize memory placement for memory tiering system", v13

With the advent of various new memory types, some machines will have multiple types of memory, e.g. DRAM and PMEM (persistent memory). The memory subsystem of these machines can be called a memory tiering system, because the different types of memory perform differently.

After commit c221c0b ("device-dax: "Hotplug" persistent memory for use like normal RAM"), PMEM can be used as cost-effective volatile memory in separate NUMA nodes. In a typical memory tiering system, there are CPUs, DRAM and PMEM in each physical NUMA node. The CPUs and the DRAM are put in one logical node, while the PMEM is put in another (fake) logical node.

To optimize overall system performance, hot pages should be placed in the DRAM node. To do that, we need to identify the hot pages in the PMEM node and migrate them to the DRAM node via NUMA migration.

The original NUMA balancing already has a set of mechanisms to identify the pages recently accessed by the CPUs in a node and to migrate those pages to that node. So we can reuse these mechanisms to optimize page placement in the memory tiering system. This is implemented in this patchset.

On the other hand, cold pages should be placed in the PMEM node. So we also need to identify the cold pages in the DRAM node and migrate them to the PMEM node.

Commit 26aa2d1 ("mm/migrate: demote pages during reclaim") implemented a mechanism to demote cold DRAM pages to the PMEM node under memory pressure. Based on that, cold DRAM pages can be demoted to the PMEM node proactively, freeing memory on the DRAM node to accommodate the promoted hot PMEM pages. This is implemented in this patchset too.

We have tested the solution with the pmbench memory accessing benchmark, using an 80:20 read/write ratio and a Gauss access address distribution, on a 2-socket Intel server with Optane DC Persistent Memory Model. The test results show that the pmbench score can improve by up to 95.9%.

This patch (of 3):

In a system with multiple memory types, e.g. DRAM and PMEM, the CPU and DRAM in one socket will be put in one NUMA node as before, while the PMEM will be put in another NUMA node, as described in commit c221c0b ("device-dax: "Hotplug" persistent memory for use like normal RAM"). So the NUMA balancing mechanism will identify all PMEM accesses as remote accesses and try to promote the PMEM pages to DRAM.

To distinguish the number of inter-type promoted pages from the number of inter-socket migrated pages, a new vmstat counter is added. The counter is per-node (counted in the target node), so it can be used to identify promotion imbalance among the NUMA nodes.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Huang, Ying" <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Tested-by: Baolin Wang <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Wei Xu <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: zhongjiang-ali <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
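
Since the counter is exported through vmstat, it can be read back from user space. The following is a minimal sketch, not part of the patch: it assumes the counter shows up as a `pgpromote_success <value>` line in `/proc/vmstat` and, because it is a node stat item, in the per-node file `/sys/devices/system/node/node<N>/vmstat`. The helper names `vmstat_lookup` and `vmstat_read_counter` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up a "name value" line in vmstat-style text and return the value.
 * Returns -1 if the counter is absent (for example, when the kernel was
 * built without CONFIG_NUMA_BALANCING).
 */
long long vmstat_lookup(const char *text, const char *name)
{
	size_t name_len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Match the whole name followed by a single space. */
		if (strncmp(line, name, name_len) == 0 && line[name_len] == ' ') {
			long long value;

			if (sscanf(line + name_len + 1, "%lld", &value) == 1)
				return value;
			return -1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline */
	}
	return -1;
}

/* Hypothetical helper: read a whole vmstat file and look up one counter. */
long long vmstat_read_counter(const char *path, const char *name)
{
	static char buf[65536];	/* assumed large enough for a vmstat file */
	FILE *fp = fopen(path, "r");
	size_t n;

	if (!fp)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, fp);
	fclose(fp);
	buf[n] = '\0';
	return vmstat_lookup(buf, name);
}
```

Calling `vmstat_read_counter()` on each node's vmstat file and comparing the results would be one way to spot the promotion imbalance the commit message mentions.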
1 parent ee97347 commit e39bb6b

4 files changed: +21 -3 lines changed

include/linux/mmzone.h

Lines changed: 3 additions & 0 deletions
@@ -221,6 +221,9 @@ enum node_stat_item {
 	NR_PAGETABLE,		/* used for pagetables */
 #ifdef CONFIG_SWAP
 	NR_SWAPCACHE,
+#endif
+#ifdef CONFIG_NUMA_BALANCING
+	PGPROMOTE_SUCCESS,	/* promote successfully */
 #endif
 	NR_VM_NODE_STAT_ITEMS
 };
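
The new enum entry is paired with a new string in `vmstat_text[]` (see mm/vmstat.c in this commit), placed at the matching position and under the same `#ifdef`. The sketch below illustrates why the two must stay in step. It uses a shortened, hypothetical enum and name table (`demo_*`); the designated initializers and the `_Static_assert` are choices made for this illustration, not something taken from the patch.

```c
#include <stddef.h>

/* Hypothetical, shortened stand-in for enum node_stat_item. */
enum demo_stat_item {
	DEMO_NR_SWAPCACHE,
	DEMO_PGPROMOTE_SUCCESS,
	DEMO_NR_STAT_ITEMS
};

/* Names indexed by the enum, so each counter prints under its own name. */
static const char * const demo_stat_text[] = {
	[DEMO_NR_SWAPCACHE]      = "nr_swapcached",
	[DEMO_PGPROMOTE_SUCCESS] = "pgpromote_success",
};

/* Fail the build if the table size and the enum count ever diverge. */
_Static_assert(sizeof(demo_stat_text) / sizeof(demo_stat_text[0]) ==
	       DEMO_NR_STAT_ITEMS, "name table out of sync with enum");

/* Return the printable name for an item, or NULL if out of range. */
const char *demo_stat_name(enum demo_stat_item item)
{
	return item < DEMO_NR_STAT_ITEMS ? demo_stat_text[item] : NULL;
}
```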

include/linux/node.h

Lines changed: 5 additions & 0 deletions
@@ -181,4 +181,9 @@ static inline void register_hugetlbfs_with_node(node_registration_func_t reg,

 #define to_node(device) container_of(device, struct node, dev)

+static inline bool node_is_toptier(int node)
+{
+	return node_state(node, N_CPU);
+}
+
 #endif /* _LINUX_NODE_H_ */

mm/migrate.c

Lines changed: 10 additions & 3 deletions
@@ -2069,6 +2069,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
 	pg_data_t *pgdat = NODE_DATA(node);
 	int isolated;
 	int nr_remaining;
+	unsigned int nr_succeeded;
 	LIST_HEAD(migratepages);
 	new_page_t *new;
 	bool compound;
@@ -2107,7 +2108,8 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,

 	list_add(&page->lru, &migratepages);
 	nr_remaining = migrate_pages(&migratepages, *new, NULL, node,
-				     MIGRATE_ASYNC, MR_NUMA_MISPLACED, NULL);
+				     MIGRATE_ASYNC, MR_NUMA_MISPLACED,
+				     &nr_succeeded);
 	if (nr_remaining) {
 		if (!list_empty(&migratepages)) {
 			list_del(&page->lru);
@@ -2116,8 +2118,13 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
 			putback_lru_page(page);
 		}
 		isolated = 0;
-	} else
-		count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_pages);
+	}
+	if (nr_succeeded) {
+		count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
+		if (!node_is_toptier(page_to_nid(page)) && node_is_toptier(node))
+			mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
+					    nr_succeeded);
+	}
 	BUG_ON(!list_empty(&migratepages));
 	return isolated;
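
The accounting rule in the last hunk can be modelled in user space to make the distinction between a plain NUMA migration and a promotion concrete. This is a simplified sketch, not kernel code: `struct numa_model`, `MAX_NODES` and the `model_*` functions are hypothetical, and the `has_cpu[]` array stands in for `node_state(node, N_CPU)`.

```c
#include <stdbool.h>

#define MAX_NODES 4	/* hypothetical size for this model */

struct numa_model {
	bool has_cpu[MAX_NODES];			/* stand-in for N_CPU state */
	unsigned long numa_page_migrate;		/* global event count */
	unsigned long pgpromote_success[MAX_NODES];	/* per target node */
};

/* Mirrors node_is_toptier(): a node with CPUs is treated as top tier. */
static bool model_node_is_toptier(const struct numa_model *m, int node)
{
	return m->has_cpu[node];
}

/* Account nr_succeeded pages that moved from src_node to dst_node. */
void model_account_migration(struct numa_model *m, int src_node,
			     int dst_node, unsigned int nr_succeeded)
{
	if (!nr_succeeded)
		return;

	/* Every successful NUMA migration is counted, whatever the tiers. */
	m->numa_page_migrate += nr_succeeded;

	/* Only lower-tier to top-tier moves count as promotions. */
	if (!model_node_is_toptier(m, src_node) &&
	    model_node_is_toptier(m, dst_node))
		m->pgpromote_success[dst_node] += nr_succeeded;
}
```

In this model a move between two CPU-bearing nodes raises only the migration count, while a move from a CPU-less node to a CPU-bearing node also raises the promotion count of the target node.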

mm/vmstat.c

Lines changed: 3 additions & 0 deletions
@@ -1242,6 +1242,9 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_SWAP
 	"nr_swapcached",
 #endif
+#ifdef CONFIG_NUMA_BALANCING
+	"pgpromote_success",
+#endif

 	/* enum writeback_stat_item counters */
 	"nr_dirty_threshold",
