[clang-doc] Improve clang-doc performance through memoization #96809

Merged
merged 15 commits into from
Jul 25, 2024
44 changes: 35 additions & 9 deletions clang-tools-extra/clang-doc/Mapper.cpp
@@ -12,16 +12,28 @@
 #include "clang/AST/Comment.h"
 #include "clang/Index/USRGeneration.h"
 #include "llvm/ADT/StringExtras.h"
-#include "llvm/Support/Error.h"
+#include "llvm/ADT/StringSet.h"
+#include "llvm/Support/Mutex.h"

 namespace clang {
 namespace doc {

+static llvm::StringSet<> USRVisited;
+static llvm::sys::Mutex USRVisitedGuard;
+
+template <typename T> bool isTypedefAnonRecord(const T *D) {
+  if (const auto *C = dyn_cast<CXXRecordDecl>(D)) {
+    return C->getTypedefNameForAnonDecl();
+  }
+  return false;
+}
+
 void MapASTVisitor::HandleTranslationUnit(ASTContext &Context) {
   TraverseDecl(Context.getTranslationUnitDecl());
 }

-template <typename T> bool MapASTVisitor::mapDecl(const T *D) {
+template <typename T>
+bool MapASTVisitor::mapDecl(const T *D, bool IsDefinition) {
   // If we're looking a decl not in user files, skip this decl.
   if (D->getASTContext().getSourceManager().isInSystemHeader(D->getLocation()))
     return true;
@@ -34,6 +46,16 @@ template <typename T> bool MapASTVisitor::mapDecl(const T *D) {
   // If there is an error generating a USR for the decl, skip this decl.
   if (index::generateUSRForDecl(D, USR))
     return true;
+  // Prevent Visiting USR twice
+  {
+    std::lock_guard<llvm::sys::Mutex> Guard(USRVisitedGuard);
+    StringRef Visited = USR.str();
+    if (USRVisited.count(Visited) && !isTypedefAnonRecord<T>(D))
+      return true;
+    // We considered a USR to be visited only when its defined
+    if (IsDefinition)
+      USRVisited.insert(Visited);
Comment on lines +55 to +57
Contributor

I don't think this logic makes sense. Why is it reasonable to expect a USR to be valid only when it's a definition? Is there some logic that prevents things that aren't definitions from being able to hold documentation? What I'm worried about is the case where we have several USRs that contain different documentation bits that would have been merged in the past, but now won't.

If there is a concrete reason, please document that here in the comments, and in the commit message.

Contributor Author

@PeterChou1 PeterChou1 Jun 27, 2024

My logic was that the definition for a USR is parsed last, so when the ASTVisitor visits the definition it would have already parsed every other USR that points to the same declaration. So we can safely short-circuit, since every other fragment of the USR would've been parsed already.
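
The short-circuit described above can be sketched outside of clang-doc roughly as follows. This is a minimal model that assumes the behaviour shown in the diff: the class name `VisitedUSRs` and the method `shouldSkip` are hypothetical, standard-library containers stand in for `llvm::StringSet` and `llvm::sys::Mutex`, and the `isTypedefAnonRecord` exception is left out.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the patch's USRVisited / USRVisitedGuard pair.
class VisitedUSRs {
public:
  // Returns true when the caller should skip this declaration because a
  // definition with the same USR was already recorded.
  bool shouldSkip(const std::string &USR, bool IsDefinition) {
    assert(!USR.empty() && "sketch assumes USR generation succeeded");
    std::lock_guard<std::mutex> Guard(Lock);
    if (Seen.count(USR))
      return true;
    // A USR is marked as visited only once its definition has been seen,
    // so forward declarations never block later fragments.
    if (IsDefinition)
      Seen.insert(USR);
    return false;
  }

private:
  std::mutex Lock;
  std::unordered_set<std::string> Seen;
};
```

In this model, any number of non-defining declarations are mapped, the first definition is mapped and recorded, and everything with the same USR after that is skipped. Whether that is safe depends on the definition really being the last fragment to arrive.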

Contributor

> My logic was that the definition for a USR is parsed last, so when the ASTVisitor visits the definition it would have already parsed every other USR that points to the same declaration. So we can safely short-circuit, since every other fragment of the USR would've been parsed already.

Is it really parsed last in all cases? Isn't it possible to have multiple of these definitions, depending on the scope of the AST construct and the target options? For instance, what if some code is compiled with a particular #define enabled, which provides different documentation than was found previously, but is compiled elsewhere without that define? I can easily imagine that affecting header code, where documentation is likely to be.
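
The concern can be illustrated with a small model that is independent of clang-doc. Everything here is hypothetical (the `DocFragment` struct, `collectDocs`, the USR string, and the two documentation strings): it assumes two translation units both produce a definition for the same USR but with different documentation text, as could happen when a header is built with and without a given define.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model of one mapped declaration: the same USR can arrive from
// several translation units, each carrying its own documentation text.
struct DocFragment {
  std::string USR;
  std::string Doc;
  bool IsDefinition;
};

// Collects the documentation strings that survive mapping. When Memoize is
// true, fragments are skipped once a definition for their USR has been seen.
std::vector<std::string> collectDocs(const std::vector<DocFragment> &Stream,
                                     bool Memoize) {
  std::unordered_set<std::string> Visited;
  std::vector<std::string> Docs;
  for (const DocFragment &F : Stream) {
    assert(!F.USR.empty() && "model assumes every fragment has a USR");
    if (Memoize && Visited.count(F.USR))
      continue; // short-circuit: this fragment never reaches the merge step
    if (F.IsDefinition)
      Visited.insert(F.USR);
    Docs.push_back(F.Doc);
  }
  return Docs;
}

// Two translation units that both define the same inline function, one of
// them built with an (assumed) define that adds extra documentation.
inline std::vector<DocFragment> twoConfigurations() {
  return {{"c:@F@area#", "Computes the area.", true},
          {"c:@F@area#", "Computes the area. Extra notes.", true}};
}
```

Without memoization both fragments reach the merge step; with it, only the first definition does, so which documentation survives depends on the order in which the translation units are processed.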

Contributor Author

Good point, I hadn't considered that. This was my clumsy attempt at removing the redundant work done by the ASTVisitor, since we are undeniably doing some redundant work. If you take a look at the Shape class from the e2e test, you'll see that we visit the declaration 3 times: once when parsing the initial file and twice more, once for each subclass. Is there any other mechanism that prevents this type of behaviour? This is essentially what this patch is trying to accomplish.

Contributor

I think this is why we need more tests that exercise a diverse set of behavior. Some of the different targets should control the documentation bits w/ defines, and we should also make sure that we're covering all the ways a USR could be brought in. For instance, if there was only one base class that brought in Shape, I'm not sure how much better you could do, since I'd assume it would only be included once, and then parsed in full the one time.

For more complicated arrangements, we'd need to be sure the definitions, and the documentation would be resolved the same way in all cases. I'm not aware of a mechanism that does that off the top of my head, but I would expect the clangd indexing mechanic probably has a way to handle this on some level. I'd also expect that the logic in scan-deps must have some mechanism for not processing files multiple times.

I'm not confident enough to say what you're proposing is outright wrong, but I'm also not confident that it's correct, either. What I'm saying is that we need to be sure that when we see a definition in clang-doc, it won't change out from beneath us. That said, given the merge logic in the reduction phase, perhaps if the USR is complete (i.e. has no missing fields), the memoization is sufficient.
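
One way to read the last suggestion is to cache a USR only once the collected info has no missing fields, rather than as soon as a definition is seen. The sketch below is an assumption-heavy illustration, not clang-doc code: `InfoSummary`, its three fields, and `CompleteUSRCache` are all hypothetical placeholders, and what counts as "complete" for the real `doc::Info` types is left open in the discussion. It is also not thread-safe as written.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical summary of what has been collected for one USR so far. The
// fields are placeholders chosen only to make "complete" concrete.
struct InfoSummary {
  bool HasDefinition = false;
  bool HasDescription = false;
  bool HasLocation = false;

  bool isComplete() const {
    return HasDefinition && HasDescription && HasLocation;
  }
};

class CompleteUSRCache {
public:
  // Returns true when the USR was already recorded as complete and can be
  // skipped. Incomplete infos are never cached, so they are always re-mapped.
  bool shouldSkip(const std::string &USR, const InfoSummary &Info) {
    assert(!USR.empty() && "sketch assumes USR generation succeeded");
    if (Complete.count(USR))
      return true;
    if (Info.isComplete())
      Complete.insert(USR);
    return false;
  }

private:
  std::unordered_set<std::string> Complete;
};
```

Under this rule a definition without documentation keeps being mapped until some fragment fills in the missing fields, which trades some of the performance gain for not dropping documentation.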

+  }
   bool IsFileInRootDir;
   llvm::SmallString<128> File =
       getFile(D, D->getASTContext(), CDCtx.SourceRoot, IsFileInRootDir);
@@ -53,30 +75,34 @@ template <typename T> bool MapASTVisitor::mapDecl(const T *D) {
 }

 bool MapASTVisitor::VisitNamespaceDecl(const NamespaceDecl *D) {
-  return mapDecl(D);
+  return mapDecl(D, /*isDefinition=*/true);
 }

-bool MapASTVisitor::VisitRecordDecl(const RecordDecl *D) { return mapDecl(D); }
+bool MapASTVisitor::VisitRecordDecl(const RecordDecl *D) {
+  return mapDecl(D, D->isThisDeclarationADefinition());
+}

-bool MapASTVisitor::VisitEnumDecl(const EnumDecl *D) { return mapDecl(D); }
+bool MapASTVisitor::VisitEnumDecl(const EnumDecl *D) {
+  return mapDecl(D, D->isThisDeclarationADefinition());
+}

 bool MapASTVisitor::VisitCXXMethodDecl(const CXXMethodDecl *D) {
-  return mapDecl(D);
+  return mapDecl(D, D->isThisDeclarationADefinition());
 }

 bool MapASTVisitor::VisitFunctionDecl(const FunctionDecl *D) {
   // Don't visit CXXMethodDecls twice
   if (isa<CXXMethodDecl>(D))
     return true;
-  return mapDecl(D);
+  return mapDecl(D, D->isThisDeclarationADefinition());
 }

 bool MapASTVisitor::VisitTypedefDecl(const TypedefDecl *D) {
-  return mapDecl(D);
+  return mapDecl(D, /*isDefinition=*/true);
 }

 bool MapASTVisitor::VisitTypeAliasDecl(const TypeAliasDecl *D) {
-  return mapDecl(D);
+  return mapDecl(D, /*isDefinition=*/true);
 }

 comments::FullComment *
2 changes: 1 addition & 1 deletion clang-tools-extra/clang-doc/Mapper.h
@@ -43,7 +43,7 @@ class MapASTVisitor : public clang::RecursiveASTVisitor<MapASTVisitor>,
   bool VisitTypeAliasDecl(const TypeAliasDecl *D);

 private:
-  template <typename T> bool mapDecl(const T *D);
+  template <typename T> bool mapDecl(const T *D, bool IsDefinition);

   int getLine(const NamedDecl *D, const ASTContext &Context) const;
   llvm::SmallString<128> getFile(const NamedDecl *D, const ASTContext &Context,
1 change: 0 additions & 1 deletion clang-tools-extra/clang-doc/tool/ClangDocMain.cpp
@@ -303,7 +303,6 @@ Example usage for a project using a compile commands database:
   for (auto &Group : USRToBitcode) {
     Pool.async([&]() {
       std::vector<std::unique_ptr<doc::Info>> Infos;
-
       for (auto &Bitcode : Group.getValue()) {
         llvm::BitstreamCursor Stream(Bitcode);
         doc::ClangDocBitcodeReader Reader(Stream);