Skip to content

Transformer with NodeFactory.update* removes parent/symbol information from AST branch #43407

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Perryvw opened this issue Mar 28, 2021 · 5 comments
Labels
Question An issue which isn't directly actionable in code

Comments

@Perryvw
Copy link

Perryvw commented Mar 28, 2021

Bug Report

I am not sure if this is intended behavior or a bug, but it is surprising: after updating a node in the AST using the standard TS transformer API, the node loses its parent/symbol information. Also all parents of this node seem to lose their parent and symbol information, even if they themselves were not transformed.

🔎 Search Terms

NodeFactory visitNode visitEachChild transformer update parent symbol sourceFile undefined compiler API

🕗 Version & Regression Information

  • This seems changed between versions 3.* and 4.*

⏯ Playground Link

Playground link with relevant code

💻 Code

example.ts

function exampleFunction() {
    return true; // To be rewritten to 'return false' by transformer
}

transformation.ts

import * as ts from 'typescript';

// Read example.ts
const program = ts.createProgram(["./example.ts"], { target: ts.ScriptTarget.ESNext });
const sourceFile = program.getSourceFiles().find(f => f.fileName.endsWith("example.ts"));
const typeChecker = program.getTypeChecker(); // Get type checker to do binding and initialize symbols

const functionDecl = sourceFile.statements[0] as ts.FunctionDeclaration;
const functionBody = functionDecl.body!;
const returnStatement = functionBody.statements[0];

console.log(returnStatement.parent === functionBody); // True
console.log(functionBody.parent === functionDecl); // True
console.log(functionDecl.parent === sourceFile); // True

console.log(functionDecl.getSourceFile() !== undefined); // True
console.log(typeChecker.getSymbolAtLocation(functionDecl) !== undefined); // True

// Now we update the source file by rewriting 'return true' to 'return false'
const transformerFactory: ts.TransformerFactory<ts.SourceFile> = context => {
    function visit(node: ts.Node): ts.Node {
        // ONLY update return statement
        if (ts.isReturnStatement(node)) {
            return ts.factory.updateReturnStatement(node, ts.factory.createFalse());
        }

        return ts.visitEachChild(node, visit, context)
    }
    return (sourceFile: ts.SourceFile) => ts.visitNode(sourceFile, visit);
}

const result = ts.transform(sourceFile, [transformerFactory]);

const updatedSourceFile = result.transformed[0];
const updatedFunctionDecl = updatedSourceFile.statements[0] as ts.FunctionDeclaration;

// Parent/sourceFile/symbol are all undefined
console.log(updatedFunctionDecl.parent !== undefined); // False
console.log(updatedFunctionDecl.getSourceFile() !== undefined); // False: Caused by parent being undefined
console.log(typeChecker.getSymbolAtLocation(updatedFunctionDecl) !== undefined); // False

🙁 Actual behavior

After updating the return statement, all nodes in that branch (traversing up to root) of the AST have their parent/symbol information removed.

🙂 Expected behavior

I would expect the AST nodes to still have their parent nodes set, as they are still in an AST. Initially I also expected only the updated node to be changed and not to affect parent/symbol of its parent nodes - though I am no longer sure about that one.

@RyanCavanaugh RyanCavanaugh added the Needs Investigation This issue needs a team member to investigate its status. label Mar 29, 2021
@RyanCavanaugh
Copy link
Member

I don't believe parent preservation is an invariant of tree transforms. @rbuckton can clarify

@rbuckton
Copy link
Contributor

The TypeScript AST is essentially immutable. What happens when you call updateReturnStatement is that we create a new ReturnStatement with the provided expression (assuming the expression is a different reference from the original):

const expr1 = ts.factory.createNumericLiteral(1);
const returnStmt1 = ts.factory.createReturnStatement(expr1);
const returnStmt2 = ts.factory.updateReturnStatement(returnStmt1, expr1); // same expression
console.log(returnStmt1 === returnStmt2); // true

const expr2 = ts.factory.createNumericLiteral(2);
const returnStmt3 = ts.factory.updateReturnStatement(returnStmt1, expr2); // different expression
console.log(returnStmt1 === returnStmt3); // false

Whenever we create an "updated" node, we set a property on that node called original that points to the previous node the node was updated from. The "updated" node will not have its parent set, and will not have symbol information.

We only maintain parents and symbols on "parse tree" nodes (i.e., nodes created by the parser). If you pass an "updated" node into a checker API, we will automatically follow all original pointers back to the source node. If passing an "updated" (or other synthetic node) to the checker API does not give you a result, there are two possible reasons:

  1. The node you provided is not a "parse tree" node, or does not have an original pointer that when recursively followed eventually points to a "parse tree" node.
  2. A bug in the checker surface API (infrequent, but possible)

If you create a purely synthetic node and want it to participate in the checker API, you can explicitly set the original node using the ts.setOriginalNode function.

If you want to find the "parse tree" node for a given synthetic node, you can call ts.getParseTreeNode. This function will follow the original pointers of the provided node until it finds a "parse tree" node to return, or will return undefined if no such node was found.

Finally, we don't change or set parent on synthetic/transform nodes because our transformers must maintain several invariants:

  • We should never change the parent of a "parse tree" node, as the checker defers some operations that might utilize that during the emit phase, and "parse tree" nodes can be reused in incremental parse scenarios.
  • For performance reasons, a transformer may reuse an entire subtree, which could include "parse tree" nodes.

As a result, we caution anyone from relying on parent for any reason in a transformer. The only time it is safe to use is when working with "parse tree" nodes produced by the parser (and even then, only if the source file was bound by the binder, or setParentNodes was set to true when calling ts.createSourceFile).

@rbuckton
Copy link
Contributor

I find this part of your example surprising however:

console.log(typeChecker.getSymbolAtLocation(updatedFunctionDecl) !== undefined); // False

That should have worked just fine, because the entrypoint for getSymbolAtLocation looks like this:

            getSymbolAtLocation: nodeIn => {
                const node = getParseTreeNode(nodeIn);
                // set ignoreErrors: true because any lookups invoked by the API shouldn't cause any new errors
                return node ? getSymbolAtLocation(node, /*ignoreErrors*/ true) : undefined;
            },

So, getParseTreeNode(updatedFunctionDecl) should point to functionDecl, and should produce the same result when you pass it to getSymbolAtLocation. If it isn't, then there's a bug somewhere.

One final note: You shouldn't rely on ts.factory in a transformer. Every transformer is provided a factory property on its context object that should be used instead. This is because I eventually intend to add some tracing to transformers to indicate which transformer created a node to help with diagnosing issues with transformations. This only works if you use the factory provided to the transformer, since that will eventually be a unique instance:

const transformerFactory: ts.TransformerFactory<ts.SourceFile> = context => {
    const { factory } = context;
    function visit(node: ts.Node): ts.Node {
        // ONLY update return statement
        if (ts.isReturnStatement(node)) {
            return factory.updateReturnStatement(node, factory.createFalse());
        }

        return ts.visitEachChild(node, visit, context)
    }
    return (sourceFile: ts.SourceFile) => ts.visitNode(sourceFile, visit);
}

@RyanCavanaugh RyanCavanaugh added Question An issue which isn't directly actionable in code and removed Needs Investigation This issue needs a team member to investigate its status. labels Mar 29, 2021
@Perryvw
Copy link
Author

Perryvw commented Mar 30, 2021

@rbuckton Thanks for the clear answer!

I checked out ts.getParseTreeNode(updatedFunctionDecl) === functionDecl which is true as you indicated. I guess this means we will be changing our node.getSourceFile() to ts.getParseTreeNode(node).getSourceFile()!

Also had a quick look at why the symbol is undefined, but it also is undefined for the original functionDecl, so that seems to be just a problem in my example setup, getSymbolAtLocation at least seems consistent.

@Perryvw Perryvw closed this as completed Mar 30, 2021
@ericbiewener
Copy link

@rbuckton Can you clarify why you caution against using parent in a transformer? I have written a transformer that removes empty object type literals from intersections or unions:

if (
  isTypeLiteral(node) &&
  !node.members.length &&
  isIntersectionOrUnion(node.parent || ts.getParseTreeNode(node)?.parent)
) {
  return undefined;
}

Should I instead visit the intersection/union parent and then remove children that are empty type literals? I don't actually know how to do that:

if (isIntersectionOrUnion(node)) {
  node.types.forEach(() => ???)
  // or
  ts.visitEachChild(...) // Doesn't appear that the nodes contained in the intersection/union are actually children -- they don't get visited.
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Question An issue which isn't directly actionable in code
Projects
None yet
Development

No branches or pull requests

4 participants