
Commit 98b4cf4

Duplicate function elimination
This change adds support for duplicate function elimination (DFE) to the JavaScript optimizer. A new JS file, eliminate-duplicate-functions.js, post-processes the output generated by Emscripten. We add a new file rather than augmenting the existing JS optimizer file for several reasons, including pass independence and reduced coupling between the Python scripts and the JS optimizer.

We introduce a multipass algorithm in which each pass consists of the following four phases:

* Phase 1: identify duplicate functions using a hash of the function body
* Phase 2: identify variable names that would conflict after renaming function calls
* Phase 3: generate a mapping from equivalent functions to their replacement function, using the information from Phase 2 to ensure that the replacement function's name is not also a variable name
* Phase 4: use the mapping generated in Phase 3 to perform the reduction

NOTE: In some rare cases, a pass cannot move on from Phase 3 because no mapping can be generated without conflicting with variable names.

One pass can reveal new sets of identical functions, which in turn can be reduced by further passes. Empirically, four or five passes are sufficient to eliminate all duplicate functions, so the elimination performs 5 passes by default. This can be overridden by setting ELIMINATE_DUPLICATE_FUNCTIONS_PASSES to another value (for example, 1 for a single pass) in settings.js or on the Emscripten command line.

The generated asm.js is broken into several batches (at function boundaries) so that the elimination can be parallelized. This reduces memory usage and makes use of more CPU cores to cut build time.

A number of tests have been added for this functionality. The change also adjusts the amount of diagnostic information that the JavaScript optimizer dumps out: verbose logging is now enabled only in debug mode (via the EMCC_LOG_DEBUG environment variable).
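The four-phase, multipass scheme described above can be illustrated with a small Python sketch. It is not the code in eliminate-duplicate-functions.js (which is JavaScript and works on the asm.js output); it only mirrors the phases on a toy model. Assumptions, all hypothetical: each function is a name mapped to its body text, Phase 1 hashes that text with SHA-256, Phase 2 finds variable names with a regex over `var` declarations (parameters and multi-name `var` statements are ignored), and Phase 4 renames by identifier token instead of walking an AST. The function names are illustrative only.

```python
import hashlib
import re

IDENT = re.compile(r"[A-Za-z_$][\w$]*")
VAR_DECL = re.compile(r"\bvar\s+([A-Za-z_$][\w$]*)")


def find_duplicate_groups(functions):
    """Phase 1: group function names whose bodies hash identically."""
    by_hash = {}
    for name, body in functions.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        by_hash.setdefault(digest, []).append(name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


def collect_variable_names(functions):
    """Phase 2: names declared with `var` in any body (simplified)."""
    return {name for body in functions.values() for name in VAR_DECL.findall(body)}


def build_mapping(groups, variable_names):
    """Phase 3: map duplicates to the shortest name that is not a variable name."""
    mapping = {}
    for names in groups:
        candidates = [n for n in names if n not in variable_names]
        if not candidates:
            continue  # no safe replacement for this group in this pass
        keep = min(candidates, key=lambda n: (len(n), n))
        mapping.update({n: keep for n in names if n != keep})
    return mapping


def apply_mapping(functions, mapping):
    """Phase 4: drop replaced functions and rename references to them."""
    result = {}
    for name, body in functions.items():
        if name in mapping:
            continue
        local = set(VAR_DECL.findall(body))

        def rename(m, local=local):
            ident = m.group(0)
            # A local variable shadows any function of the same name.
            return ident if ident in local else mapping.get(ident, ident)

        result[name] = IDENT.sub(rename, body)
    return result


def eliminate_duplicates(functions, passes=5):
    """Run up to `passes` passes, stopping early once nothing can be merged."""
    for _ in range(passes):
        groups = find_duplicate_groups(functions)
        mapping = build_mapping(groups, collect_variable_names(functions))
        if not mapping:
            break
        functions = apply_mapping(functions, mapping)
    return functions
```

For `{"a": "return 0;", "b": "return 0;", "c": "a(); return;", "d": "b(); return;"}`, the first pass maps b to a, which makes c and d identical, so the second pass maps d to c; these are the same `{"b":"a"}` and `{"d":"c"}` mappings that appear in the test files of this commit. If some function declares `var a`, Phase 3 rejects a as the replacement name and keeps b instead.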
We also dump backtraces on encountering unhandled exceptions, which is useful when Emscripten runs as part of a large build process.

To view detailed information about which functions were merged, set ELIMINATE_DUPLICATE_FUNCTIONS_DUMP_EQUIVALENT_FUNCTIONS to 1 in settings.js or on the Emscripten command line. This writes a log file, in the same directory as the generated JavaScript, that lists the sets of merged functions. The log can be decoded using the symbol map generated by Emscripten, so developers are advised to enable symbol map generation when modifying or debugging this feature.

Since DFE increases build time significantly, it is disabled by default. It can be enabled by setting ELIMINATE_DUPLICATE_FUNCTIONS to 1, either in settings.js or by adding "-s ELIMINATE_DUPLICATE_FUNCTIONS=1" to the Emscripten command line. Note that emcc.py only runs the elimination when optimizing at level 2 or higher (opt_level >= 2). The poppler test has been updated to also run with ELIMINATE_DUPLICATE_FUNCTIONS set to 1.

Improvements/future work

On average, we have observed a code size reduction of 25% when transpiling large C++ code bases; C++ code that makes heavy use of templates typically sees the greatest reduction. There are several directions that future work might take:

* Deduplication of code across templates: e.g. reducing std::vector<long> and std::vector<int> to a single instantiation of template code when appropriate
* Histogram-based selection of candidates for replacement: code size could be reduced further by assigning the shortest identifiers to the most frequently referenced functions (in the style of Huffman coding)
* Convergence: the five-pass default chosen in this implementation is based on empirical observations on a 150,000-LOC C++ code base
* Candidate selection: this will most likely influence both the convergence time (i.e. the number of passes) and the code size reduction; currently, when selecting candidates, we choose the shortest identifier from the list that is not also a variable name
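The histogram-based direction in the list above is future work, not something this commit implements. A minimal Python sketch of the idea: count references to each function, then hand out identifiers shortest-first in descending order of frequency. This is a greedy frequency-ordered assignment, not an actual Huffman code; the reference counts, the lowercase-only alphabet, the `reserved` set (standing in for the current "not also a variable name" rule) and the function names are all assumptions for illustration.

```python
import itertools
import string


def short_names(reserved):
    """Yield a, b, ..., z, aa, ab, ... in order of length, skipping reserved names."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            name = "".join(letters)
            if name not in reserved:
                yield name


def assign_by_frequency(reference_counts, reserved=frozenset()):
    """Give the shortest free identifiers to the most frequently referenced functions."""
    # Most-referenced first; ties broken by name so the result is deterministic.
    ordered = sorted(reference_counts, key=lambda f: (-reference_counts[f], f))
    return dict(zip(ordered, short_names(reserved)))
```

With counts `{"_memcpy": 120, "_free": 40, "_main": 1}`, `_memcpy` receives `a`; if `a` is reserved, every assignment shifts to the next free name. Because each reference costs one copy of the identifier, putting single-letter names on the hottest functions minimizes the total characters spent on call sites.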
1 parent e9d06af, commit 98b4cf4

32 files changed (+1603 / -30 lines)

AUTHORS

Lines changed: 3 additions & 1 deletion
```diff
@@ -232,4 +232,6 @@ a license to everyone to use it as detailed in LICENSE.)
 * Nick Shin <[email protected]>
 * Gregg Tavares <[email protected]>
 * Tanner Rogalsky <[email protected]>
-
+* Richard Cook <[email protected]> (copyright owned by Tableau Software, Inc.)
+* Arnab Choudhury <[email protected]> (copyright owned by Tableau Software, Inc.)
+* Charles Vaughn <[email protected]> (copyright owned by Tableau Software, Inc.)
```

emcc.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -1634,6 +1634,11 @@ def do_minify(): # minifies the code. this is also when we do certain optimizati
 else:
   JSOptimizer.queue += ['registerize']
 
+# NOTE: Important that this comes after registerize/registerizeHarder
+if shared.Settings.ELIMINATE_DUPLICATE_FUNCTIONS and opt_level >= 2:
+  JSOptimizer.flush()
+  shared.Building.eliminate_duplicate_funcs(final)
+
 if not shared.Settings.EMTERPRETIFY:
   do_minify()
 
```
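The commit message says the generated asm.js is split into batches at function boundaries so the elimination can run in parallel; the emcc.py change above only shows the entry point, `shared.Building.eliminate_duplicate_funcs(final)`. The Python sketch below illustrates one way to batch Phase 1 without missing duplicates that land in different batches: hash each batch in parallel, then merge all hashes before grouping. It assumes the functions are already available as `(name, body)` pairs and uses a hypothetical per-batch character budget `max_chars`; it is not how emcc actually batches. Threads keep the example simple; `ProcessPoolExecutor` is a drop-in replacement for CPU-bound work because `hash_batch` is a top-level function.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(functions, max_chars):
    """Group (name, body) pairs into batches, never splitting a function."""
    batches, current, size = [], [], 0
    for name, body in functions:
        if current and size + len(body) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append((name, body))
        size += len(body)
    if current:
        batches.append(current)
    return batches


def hash_batch(batch):
    """Hash every function body in one batch (Phase 1 work for that batch)."""
    return [(hashlib.sha256(body.encode("utf-8")).hexdigest(), name) for name, body in batch]


def duplicate_groups(functions, max_chars=4096, workers=4):
    """Hash batches in parallel, then merge so cross-batch duplicates are found."""
    by_hash = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for hashed in pool.map(hash_batch, split_into_batches(functions, max_chars)):
            for digest, name in hashed:
                by_hash.setdefault(digest, []).append(name)
    return sorted(sorted(names) for names in by_hash.values() if len(names) > 1)
```

A function larger than `max_chars` still forms a batch of its own, so batch boundaries always coincide with function boundaries.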

src/settings.js

Lines changed: 5 additions & 0 deletions
```diff
@@ -681,4 +681,9 @@ var PTHREADS_PROFILING = 0; // True when building with --threadprofiler
 
 var MAX_GLOBAL_ALIGN = -1; // received from the backend
 
+// Duplicate function elimination
+var ELIMINATE_DUPLICATE_FUNCTIONS = 0; // disabled by default
+var ELIMINATE_DUPLICATE_FUNCTIONS_PASSES = 5;
+var ELIMINATE_DUPLICATE_FUNCTIONS_DUMP_EQUIVALENT_FUNCTIONS = 0;
+
 // Reserved: variables containing POINTER_MASKING.
```
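When ELIMINATE_DUPLICATE_FUNCTIONS_DUMP_EQUIVALENT_FUNCTIONS is set to 1, the commit message says a log of merged function sets is written and can be decoded with Emscripten's symbol map. The Python sketch below shows the decoding step only. It assumes the symbol map has one `minified:original` pair per line and that the merged sets have already been read from the log into lists of names; the log's actual format is not shown in this commit, and `parse_symbol_map` and `decode_merged_sets` are hypothetical helpers.

```python
def parse_symbol_map(text):
    """Parse `minified:original` lines into a dict (line format is an assumption)."""
    symbols = {}
    for line in text.splitlines():
        minified, sep, original = line.partition(":")
        if sep:  # skip blank or malformed lines
            symbols[minified.strip()] = original.strip()
    return symbols


def decode_merged_sets(merged_sets, symbols):
    """Translate each set of merged names back to original names when known."""
    return [[symbols.get(name, name) for name in group] for group in merged_sets]
```

Names missing from the map are passed through unchanged, so a partial or stale symbol map still yields readable output.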
Lines changed: 14 additions & 0 deletions
```diff
@@ -0,0 +1,14 @@
+// EMSCRIPTEN_START_ASM
+var asm = (function(global, env, buffer) {
+"use asm";
+var e = 0;
+
+// EMSCRIPTEN_START_FUNCS
+function a() {
+var c = 0.0;
+return 0;
+}
+// EMSCRIPTEN_END_FUNCS
+var f = 0;
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
+// EMSCRIPTEN_END_ASM
```
Lines changed: 22 additions & 0 deletions
```diff
@@ -0,0 +1,22 @@
+// EMSCRIPTEN_START_ASM
+var asm = (function(global, env, buffer) {
+"use asm";
+var e = 0;
+
+// EMSCRIPTEN_START_FUNCS
+function a() {
+var c = +0;
+return 0;
+}
+function b() {
+var c = +0;
+return 0;
+}
+// EMSCRIPTEN_END_FUNCS
+var f = 0;
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
+// EMSCRIPTEN_END_ASM
+// EMSCRIPTEN_GENERATED_FUNCTIONS
+
+
+
```
Lines changed: 25 additions & 0 deletions
```diff
@@ -0,0 +1,25 @@
+// EMSCRIPTEN_START_ASM
+var asm = (function(global, env, buffer) {
+"use asm";
+
+// EMSCRIPTEN_START_FUNCS
+function d() {
+a();
+e();
+return;
+}
+
+function c() {
+a();
+return;
+}
+
+function a() {
+return 0;
+}
+
+// EMSCRIPTEN_END_FUNCS
+
+var f = [ a ];
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
+// EMSCRIPTEN_END_ASM
```
Lines changed: 32 additions & 0 deletions
```diff
@@ -0,0 +1,32 @@
+var asm = (function(global, env, buffer) {
+"use asm";
+function a()
+{
+return 0;
+}
+
+function b()
+{
+return 0;
+}
+
+function c()
+{
+a();
+return;
+}
+
+function d()
+{
+b();
+
+// We expect that b gets replaced by a below
+var f = [b];
+e();
+
+return;
+}
+
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
+
+// {"b":"a"}
```
Lines changed: 23 additions & 0 deletions
```diff
@@ -0,0 +1,23 @@
+// EMSCRIPTEN_START_ASM
+var asm = (function(global, env, buffer) {
+"use asm";
+// EMSCRIPTEN_START_FUNCS
+function a() {
+return 0;
+}
+function b() {
+return 0;
+}
+function c() {
+a();
+return;
+}
+function d() {
+b();
+e();
+return;
+}
+// EMSCRIPTEN_END_FUNCS
+var f = [ b ];
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
+// EMSCRIPTEN_END_ASM
```
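The test input above delimits the asm.js function section with `// EMSCRIPTEN_START_FUNCS` and `// EMSCRIPTEN_END_FUNCS`. Cutting that section into per-function chunks gives the natural unit for the function-boundary batching described in the commit message. The Python sketch below does this by counting braces line by line; it is an assumption-laden illustration, not the parser used by the JS tool: braces inside strings or comments would confuse it, and `split_functions` is a hypothetical name. It handles both brace styles used in these tests (`function a() {` and `function a()` followed by `{` on the next line).

```python
START = "// EMSCRIPTEN_START_FUNCS"
END = "// EMSCRIPTEN_END_FUNCS"


def split_functions(source):
    """Return (head, function_chunks, tail), splitting between the marker comments."""
    head, rest = source.split(START, 1)
    body, tail = rest.split(END, 1)
    chunks, current, depth, opened = [], [], 0, False
    for line in body.splitlines(keepends=True):
        if not current and not line.strip():
            continue  # skip blank lines between functions
        current.append(line)
        depth += line.count("{") - line.count("}")
        opened = opened or "{" in line
        if opened and depth == 0:  # closing brace of a top-level function
            chunks.append("".join(current))
            current, opened = [], False
    if current:
        raise ValueError("unterminated function before END marker")
    return head + START, chunks, END + tail
```

Text outside the markers (the module prologue and the `var f = [ b ];` table) stays in `head` and `tail`, so only function definitions are ever distributed across batches.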
Lines changed: 22 additions & 0 deletions
```diff
@@ -0,0 +1,22 @@
+var asm = (function(global, env, buffer) {
+"use asm";
+function a()
+{
+return 0;
+}
+
+function c()
+{
+a();
+return;
+}
+
+function d()
+{
+a();
+return;
+}
+
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
+
+// {"d":"c"}
```
Lines changed: 14 additions & 0 deletions
```diff
@@ -0,0 +1,14 @@
+var asm = (function(global, env, buffer) {
+"use asm";
+function a() {
+return 0;
+}
+function c() {
+a();
+return;
+}
+function d() {
+a();
+return;
+}
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
```
Lines changed: 10 additions & 0 deletions
```diff
@@ -0,0 +1,10 @@
+var asm = (function(global, env, buffer) {
+"use asm";
+function a() {
+return 0;
+}
+function c() {
+a();
+return;
+}
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
```
Lines changed: 27 additions & 0 deletions
```diff
@@ -0,0 +1,27 @@
+var asm = (function(global, env, buffer) {
+"use asm";
+function a()
+{
+return 0;
+}
+
+function b()
+{
+return 0;
+}
+
+function c()
+{
+a();
+return;
+}
+
+function d()
+{
+b();
+return;
+}
+
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
+
+// {"b":"a"}
```
Lines changed: 20 additions & 0 deletions
```diff
@@ -0,0 +1,20 @@
+var asm = (function(global, env, buffer) {
+"use asm";
+function a() {
+return 0;
+}
+function b() {
+return 0;
+}
+function c() {
+a();
+return;
+}
+function d() {
+b();
+return;
+}
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
+
+
+
```
Lines changed: 18 additions & 0 deletions
```diff
@@ -0,0 +1,18 @@
+var asm = (function(global, env, buffer) {
+"use asm";
+function a() {
+return 0;
+}
+function c() {
+a();
+return;
+}
+function d() {
+a();
+var f = {
+g: a
+};
+e();
+return;
+}
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
```
Lines changed: 34 additions & 0 deletions
```diff
@@ -0,0 +1,34 @@
+var asm = (function(global, env, buffer) {
+"use asm";
+function a()
+{
+return 0;
+}
+
+function b()
+{
+return 0;
+}
+
+function c()
+{
+a();
+return;
+}
+
+function d()
+{
+b();
+
+// We expect that b gets replaced by a below
+var f = {
+g: b
+};
+e();
+
+return;
+}
+
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
+
+// {"b":"a"}
```
Lines changed: 24 additions & 0 deletions
```diff
@@ -0,0 +1,24 @@
+var asm = (function(global, env, buffer) {
+"use asm";
+function a() {
+return 0;
+}
+function b() {
+return 0;
+}
+function c() {
+a();
+return;
+}
+function d() {
+b();
+var f = {
+g: b
+};
+e();
+return;
+}
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
+
+
+
```
Lines changed: 16 additions & 0 deletions
```diff
@@ -0,0 +1,16 @@
+var asm = (function(global, env, buffer) {
+"use asm";
+function a() {
+return 0;
+}
+function c() {
+a();
+return;
+}
+function d() {
+a();
+var e = a;
+e();
+return;
+}
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
```
Lines changed: 32 additions & 0 deletions
```diff
@@ -0,0 +1,32 @@
+var asm = (function(global, env, buffer) {
+"use asm";
+function a()
+{
+return 0;
+}
+
+function b()
+{
+return 0;
+}
+
+function c()
+{
+a();
+return;
+}
+
+function d()
+{
+b();
+
+// We expect that b gets replaced by a below
+var e = b;
+e();
+
+return;
+}
+
+})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
+
+// {"b" : "a"}
```
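Several of the test inputs above end with a trailing comment such as `// {"b" : "a"}`, and their in-body comment ("We expect that b gets replaced by a below") suggests that this JSON object is the Phase 4 mapping to apply. The Python sketch below applies such a mapping to source text: it deletes each replaced function by brace matching, then renames references by identifier token, leaving `//` comments alone. It is a simplification under stated assumptions, not the commit's implementation: there is no AST, so braces in strings, shadowing locals, and property names that collide with a mapped name are not handled; `read_mapping`, `remove_function`, `rename_references` and `apply_replacements` are hypothetical names.

```python
import json
import re

IDENT = re.compile(r"[A-Za-z_$][\w$]*")


def read_mapping(source):
    """Parse the last `// {...}` comment line as the replacement mapping."""
    for line in reversed(source.splitlines()):
        text = line.strip()
        if text.startswith("//") and text[2:].lstrip().startswith("{"):
            return json.loads(text[2:])
    return {}


def remove_function(source, name):
    """Delete `function name(...) {...}` by matching braces."""
    match = re.search(r"\bfunction\s+" + re.escape(name) + r"\s*\(", source)
    if match is None:
        return source
    depth = 0
    for i in range(source.index("{", match.end()), len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return source[:match.start()] + source[i + 1:]
    raise ValueError(f"unbalanced braces in function {name}")


def rename_references(source, mapping):
    """Rename identifiers per `mapping`, leaving `//` comments untouched."""
    out = []
    for line in source.splitlines(keepends=True):
        code, sep, comment = line.partition("//")
        code = IDENT.sub(lambda m: mapping.get(m.group(0), m.group(0)), code)
        out.append(code + sep + comment)
    return "".join(out)


def apply_replacements(source):
    """Remove replaced definitions first, then redirect every reference."""
    mapping = read_mapping(source)
    for old in mapping:
        source = remove_function(source, old)
    return rename_references(source, mapping)
```

Removing definitions before renaming matters: renaming first would turn `function b()` into a second `function a()`. Applied to the last input above, the result keeps `a`, `c` and `d`, with `b();` and `var e = b;` rewritten to `a();` and `var e = a;`, which matches the corresponding expected output.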
