Skip to content

[CIR] Build out AST consumer patterns to reach the entry point into CIRGen #91007

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

lanza
Copy link
Member

@lanza lanza commented May 3, 2024

Build out the necessary infrastructure for the main entry point into
ClangIR generation -- CIRGenModule. A set of boilerplate classes exist
to facilitate this -- CIRGenerator, CIRGenAction, EmitCIRAction and
CIRGenConsumer. These all mirror the corresponding types from LLVM
generation by Clang's CodeGen.

The main entry point to CIR generation is
CIRGenModule::buildTopLevelDecl. It is currently just an empty
function. We've added a test to ensure that the pipeline reaches this
point and doesn't fail, but does nothing else. This will be removed in
one of the subsequent patches that'll add basic cir.func emission.

This patch also re-adds -emit-cir to the driver. lib/Driver/Driver.cpp
requires that a driver flag exists to facilirate the selection of the
right actions for the driver to create. Without a driver flag you get
the standard behaviors of -S, -c, etc. If we want to emit CIR IR
and, eventually, bytecode we'll need a driver flag to force this. This
is why -emit-llvm is a driver flag. Notably, -emit-llvm-bc as a cc1
flag doesn't ever do the right thing. Without a driver flag it is
incorrectly ignored and an executable is emitted. With -S a file named
something.s is emitted which actually contains bitcode.

Created using spr 1.3.5
@lanza lanza requested a review from bcardosolopes as a code owner May 3, 2024 20:19
@@ -2900,7 +2900,7 @@ defm clangir : BoolFOption<"clangir",
PosFlag<SetTrue, [], [ClangOption, CC1Option], "Use the ClangIR pipeline to compile">,
NegFlag<SetFalse, [], [ClangOption, CC1Option], "Use the AST -> LLVM pipeline to compile">,
BothFlags<[], [ClangOption, CC1Option], "">>;
def emit_cir : Flag<["-"], "emit-cir">, Visibility<[CC1Option]>,
def emit_cir : Flag<["-"], "emit-cir">, Visibility<[ClangOption, CC1Option]>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erichkeane @AaronBallman @MaskRay

I had to re-add the ClangOption here. lib/Driver/Driver.cpp builds the Action framework out based on input to the driver. So to properly orchestrate the outputs you need a driver flag to tell it that, e.g., we are emitting a .cir textual output. e.g. -fsyntax-only exists to chop off the frontend at the right point and -emit-llvm seems intentionally to have been made a driver flag to orchestrate .ll or .bc output.

Notably, -emit-llvm-bc does the wrong thing in all cases. clang -Xclang -emit-llvm-bc hello.c does the wrong thing and emits an executable. clang -Xclang -emit-llvm -S hello.c also does the wrong thing and emits a file named hello.s that is actually llvm bitcode.

Copy link
Member

@MaskRay MaskRay May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notably, -emit-llvm-bc does the wrong thing in all cases. clang -Xclang -emit-llvm-bc hello.c does the wrong thing and emits an executable. clang -Xclang -emit-llvm -S hello.c also does the wrong thing and emits a file named hello.s that is actually llvm bitcode.

The description of #91140 might be useful.

You need -S or -c to disable linking. Then -c -Xclang -emit-cir can be used to override the assumed action passed by clangDriver to cc1.

-flto modifies -c to emit a bitcode file and -S to emit a textual IR file. -fcir can be modeled like -flto.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then -c -Xclang -emit-cir can be used to override the assumed action passed by clangDriver to cc1.

It would be incorrect to only use -Xclang -emit-cir -c as it would output a .o file for a textual .cir file. I've plugged in ClangIR to the action building system. -c requests actions through the assembler. We're stopping with ClangIR via -emit-cir after the Compile phase.

-flto modifies -c to emit a bitcode file and -S to emit a textual IR file. -fcir can be modeled like -flto.

That's an inaccurate comparison. -fcir is supposed to only change from using codegen to cirgen&lowering. The output shouldn't change. -emit-cir is supposed to emit a file with the suffix .cir. The driver has to be told the preferred output type to tell the action system to emit a .cir suffixed file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaskRay are you okay with this approach? Also, if this is exposed as a driver flag, should we have a test in clang/test/Driver that it emits the expected -cc1 option?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fangrui is out for a few weeks: #104899 (review)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To restate the argument here in favor of using a -emit-cir frontend flag:

  • This line shows the chunk of driver code responsible for mapping the requested emission to the output file. Without -emit-cir as a frontend flag we instead get .o, .s or some other incorrect output based on the -c, -S flags you pass to the frontend.
  • @MaskRay previously argued that we should use -fclangir to change the output. But this flag is orthogonal to the output. It merely enabled the ClangIR pipeline and, by design, does not change the output. We need some flag to tell the driver what the output kind is supposed to be.

/// A "module" matches a c/cpp source file: containing a list of functions.
mlir::ModuleOp theModule;

[[maybe_unused]] clang::DiagnosticsEngine &Diags;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, would be nice to only have to bring these in when necessary.

@@ -42,6 +47,14 @@ CreateFrontendBaseAction(CompilerInstance &CI) {
StringRef Action("unknown");
(void)Action;

auto UseCIR = CI.getFrontendOpts().UseClangIRPipeline;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding standard doesn't allow use of 'auto' here, only when the actual type of the variable is spelled on the RHS side (see next 2 lines too).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be the case for actual lib/CIR/CodeGen too? MLIR uses auto extensively (perhaps because value semantics is common land there) and we have been doing that as well for CIRGen and passes - it's going to be a massive change for us, so we it'd be good to know if that should apply everywhere or only into the areas of Clang isolated from MLIR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a question I think for @AaronBallman . This DOES live in the 'clang' directory, so I would think this should all follow the Clang coding standards, but I can see the justification for using the MLIR standard, same as the capitalization (@lanza ).

There WAS some discussion elsewhere, but I cant find the thread, so I dont know what was agreed upon.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the community conventions in this case because this is materially different from naming. With naming, it's about style and while it's quite distracting to have a unique style, it's doesn't really hurt understanding the code. With use of auto, it's about being able to understand the code for people not familiar with it; using auto in this way makes the code harder for the community to maintain in ways that naming conventions don't really have.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's a different discussion than naming. @joker-eph: what's the auto policy for MLIR, is it different from LLVM/Clang? I'm also having trouble to find the previous discussions, let me search a bit more.

Copy link
Collaborator

@joker-eph joker-eph May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MLIR isn't meant to differ from LLVM/Clang on auto.
(See https://mlir.llvm.org/getting_started/DeveloperGuide/#style-guide for the few things we document as diverging).

Coding standard doesn't allow use of 'auto' here, only when the actual type of the variable is spelled on the RHS side (see next 2 lines too).

Nit: technically the coding standard does not say that, I believe you're mentioning a sufficient condition, not a necessary one, see https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

Use auto if and only if it makes the code more readable or easier to maintain.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: technically the coding standard does not say that, I believe you're mentioning a sufficient condition, not a necessary one, see https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

Use auto if and only if it makes the code more readable or easier to maintain.

Yup. FWIW, the rule of thumb we use in Clang is that "readable" means "type is spelled out in the initialization somewhere or is otherwise painful to spell but is contextually quite obvious (e.g., use of iterators)" + "any time the code reviewer asks to switch away from auto"

@bcardosolopes bcardosolopes added the ClangIR Anything related to the ClangIR project label May 3, 2024
@llvmbot
Copy link
Member

llvmbot commented May 4, 2024

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clangir

Author: Nathan Lanza (lanza)

Changes

Build out the necessary infrastructure for the main entry point into
ClangIR generation -- CIRGenModule. A set of boilerplate classes exist
to facilitate this -- CIRGenerator, CIRGenAction, EmitCIRAction and
CIRGenConsumer. These all mirror the corresponding types from LLVM
generation by Clang's CodeGen.

The main entry point to CIR generation is
CIRGenModule::buildTopLevelDecl. It is currently just an empty
function. We've added a test to ensure that the pipeline reaches this
point and doesn't fail, but does nothing else. This will be removed in
one of the subsequent patches that'll add basic cir.func emission.

This patch also re-adds -emit-cir to the driver. lib/Driver/Driver.cpp
requires that a driver flag exists to facilirate the selection of the
right actions for the driver to create. Without a driver flag you get
the standard behaviors of -S, -c, etc. If we want to emit CIR IR
and, eventually, bytecode we'll need a driver flag to force this. This
is why -emit-llvm is a driver flag. Notably, -emit-llvm-bc as a cc1
flag doesn't ever do the right thing. Without a driver flag it is
incorrectly ignored and an executable is emitted. With -S a file named
something.s is emitted which actually contains bitcode.


Full diff: https://github.com/llvm/llvm-project/pull/91007.diff

16 Files Affected:

  • (added) clang/include/clang/CIR/CIRGenerator.h (+59)
  • (added) clang/include/clang/CIRFrontendAction/CIRGenAction.h (+61)
  • (modified) clang/include/clang/Driver/Options.td (+1-1)
  • (modified) clang/lib/CIR/CMakeLists.txt (+2)
  • (added) clang/lib/CIR/CodeGen/CIRGenModule.cpp (+35)
  • (added) clang/lib/CIR/CodeGen/CIRGenModule.h (+61)
  • (added) clang/lib/CIR/CodeGen/CIRGenTypeCache.h (+27)
  • (added) clang/lib/CIR/CodeGen/CIRGenerator.cpp (+46)
  • (added) clang/lib/CIR/CodeGen/CMakeLists.txt (+23)
  • (added) clang/lib/CIR/FrontendAction/CIRGenAction.cpp (+88)
  • (added) clang/lib/CIR/FrontendAction/CMakeLists.txt (+17)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+3)
  • (modified) clang/lib/FrontendTool/CMakeLists.txt (+15)
  • (modified) clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp (+18)
  • (added) clang/test/CIR/hello.c (+4)
  • (added) clang/test/CIR/lit.local.cfg (+2)
diff --git a/clang/include/clang/CIR/CIRGenerator.h b/clang/include/clang/CIR/CIRGenerator.h
new file mode 100644
index 00000000000000..c9505d2473b437
--- /dev/null
+++ b/clang/include/clang/CIR/CIRGenerator.h
@@ -0,0 +1,59 @@
+//===- CIRGenerator.h - CIR Generation from Clang AST ---------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file declares a simple interface to perform CIR generation from Clang
+// AST
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef CLANG_CIRGENERATOR_H_
+#define CLANG_CIRGENERATOR_H_
+
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/DeclGroup.h"
+#include "clang/Basic/CodeGenOptions.h"
+#include "clang/Basic/Diagnostic.h"
+
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/Support/VirtualFileSystem.h"
+
+#include <memory>
+
+namespace mlir {
+class MLIRContext;
+} // namespace mlir
+namespace cir {
+class CIRGenModule;
+
+class CIRGenerator : public clang::ASTConsumer {
+  virtual void anchor();
+  clang::DiagnosticsEngine &Diags;
+  clang::ASTContext *astCtx;
+  llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>
+      fs; // Only used for debug info.
+
+  const clang::CodeGenOptions codeGenOpts; // Intentionally copied in.
+
+  [[maybe_unused]] unsigned HandlingTopLevelDecls;
+
+protected:
+  std::unique_ptr<mlir::MLIRContext> mlirCtx;
+  std::unique_ptr<CIRGenModule> CGM;
+
+public:
+  CIRGenerator(clang::DiagnosticsEngine &diags,
+               llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> FS,
+               const clang::CodeGenOptions &CGO);
+  ~CIRGenerator();
+  void Initialize(clang::ASTContext &Context) override;
+  bool HandleTopLevelDecl(clang::DeclGroupRef D) override;
+};
+
+} // namespace cir
+
+#endif // CLANG_CIRGENERATOR_H_
diff --git a/clang/include/clang/CIRFrontendAction/CIRGenAction.h b/clang/include/clang/CIRFrontendAction/CIRGenAction.h
new file mode 100644
index 00000000000000..aaf73ec2bffc47
--- /dev/null
+++ b/clang/include/clang/CIRFrontendAction/CIRGenAction.h
@@ -0,0 +1,61 @@
+//===---- CIRGenAction.h - CIR Code Generation Frontend Action -*- C++ -*--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_CIR_CIRGENACTION_H
+#define LLVM_CLANG_CIR_CIRGENACTION_H
+
+#include "clang/Frontend/FrontendAction.h"
+
+#include "mlir/IR/BuiltinOps.h"
+#include "mlir/IR/OwningOpRef.h"
+
+namespace mlir {
+class MLIRContext;
+class ModuleOp;
+} // namespace mlir
+
+namespace cir {
+class CIRGenConsumer;
+
+class CIRGenAction : public clang::ASTFrontendAction {
+public:
+  enum class OutputType {
+    EmitCIR,
+  };
+
+private:
+  friend class CIRGenConsumer;
+
+  mlir::OwningOpRef<mlir::ModuleOp> mlirModule;
+
+  mlir::MLIRContext *mlirContext;
+
+protected:
+  CIRGenAction(OutputType action, mlir::MLIRContext *mlirContext = nullptr);
+
+  std::unique_ptr<clang::ASTConsumer>
+  CreateASTConsumer(clang::CompilerInstance &CI,
+                    llvm::StringRef InFile) override;
+
+public:
+  ~CIRGenAction() override;
+
+  CIRGenConsumer *cgConsumer;
+  OutputType action;
+};
+
+class EmitCIRAction : public CIRGenAction {
+  virtual void anchor();
+
+public:
+  EmitCIRAction(mlir::MLIRContext *mlirCtx = nullptr);
+};
+
+} // namespace cir
+
+#endif
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 938d5358eeda6b..bd1200f77079ad 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2900,7 +2900,7 @@ defm clangir : BoolFOption<"clangir",
   PosFlag<SetTrue, [], [ClangOption, CC1Option], "Use the ClangIR pipeline to compile">,
   NegFlag<SetFalse, [], [ClangOption, CC1Option], "Use the AST -> LLVM pipeline to compile">,
   BothFlags<[], [ClangOption, CC1Option], "">>;
-def emit_cir : Flag<["-"], "emit-cir">, Visibility<[CC1Option]>,
+def emit_cir : Flag<["-"], "emit-cir">, Visibility<[ClangOption, CC1Option]>,
   Group<Action_Group>, HelpText<"Build ASTs and then lower to ClangIR">;
 /// ClangIR-specific options - END
 
diff --git a/clang/lib/CIR/CMakeLists.txt b/clang/lib/CIR/CMakeLists.txt
index d2ff200e0da5f5..11cca734808dfa 100644
--- a/clang/lib/CIR/CMakeLists.txt
+++ b/clang/lib/CIR/CMakeLists.txt
@@ -2,3 +2,5 @@ include_directories(${LLVM_MAIN_SRC_DIR}/../mlir/include)
 include_directories(${CMAKE_BINARY_DIR}/tools/mlir/include)
 
 add_subdirectory(Dialect)
+add_subdirectory(CodeGen)
+add_subdirectory(FrontendAction)
diff --git a/clang/lib/CIR/CodeGen/CIRGenModule.cpp b/clang/lib/CIR/CodeGen/CIRGenModule.cpp
new file mode 100644
index 00000000000000..244ab4117c859a
--- /dev/null
+++ b/clang/lib/CIR/CodeGen/CIRGenModule.cpp
@@ -0,0 +1,35 @@
+//===- CIRGenModule.cpp - Per-Module state for CIR generation -------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This is the internal per-translation-unit state used for CIR translation.
+//
+//===----------------------------------------------------------------------===//
+
+#include "CIRGenModule.h"
+
+#include "clang/AST/DeclBase.h"
+
+#include "llvm/Support/Debug.h"
+
+#include "mlir/IR/BuiltinOps.h"
+#include "mlir/IR/Location.h"
+#include "mlir/IR/MLIRContext.h"
+
+using namespace cir;
+CIRGenModule::CIRGenModule(mlir::MLIRContext &context,
+                           clang::ASTContext &astctx,
+                           const clang::CodeGenOptions &CGO,
+                           DiagnosticsEngine &Diags)
+    : astCtx(astctx), langOpts(astctx.getLangOpts()), codeGenOpts(CGO),
+      theModule{mlir::ModuleOp::create(mlir::UnknownLoc())}, Diags(Diags),
+      target(astCtx.getTargetInfo()) {}
+
+CIRGenModule::~CIRGenModule() {}
+
+// Emit code for a single top level declaration.
+void CIRGenModule::buildTopLevelDecl(Decl *decl) {}
diff --git a/clang/lib/CIR/CodeGen/CIRGenModule.h b/clang/lib/CIR/CodeGen/CIRGenModule.h
new file mode 100644
index 00000000000000..e7917be14cfd34
--- /dev/null
+++ b/clang/lib/CIR/CodeGen/CIRGenModule.h
@@ -0,0 +1,61 @@
+//===--- CIRGenModule.h - Per-Module state for CIR gen ----------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This is the internal per-translation-unit state used for CIR translation.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_LIB_CIR_CODEGEN_CIRGENMODULE_H
+#define LLVM_CLANG_LIB_CIR_CODEGEN_CIRGENMODULE_H
+
+#include "CIRGenTypeCache.h"
+
+#include "clang/AST/ASTContext.h"
+#include "clang/Basic/CodeGenOptions.h"
+#include "clang/Basic/Diagnostic.h"
+
+#include "mlir/IR/BuiltinOps.h"
+#include "mlir/IR/MLIRContext.h"
+
+using namespace clang;
+namespace cir {
+
+/// This class organizes the cross-function state that is used while generating
+/// CIR code.
+class CIRGenModule : public CIRGenTypeCache {
+  CIRGenModule(CIRGenModule &) = delete;
+  CIRGenModule &operator=(CIRGenModule &) = delete;
+
+public:
+  CIRGenModule(mlir::MLIRContext &context, clang::ASTContext &astctx,
+               const clang::CodeGenOptions &CGO,
+               clang::DiagnosticsEngine &Diags);
+
+  ~CIRGenModule();
+
+private:
+  /// Hold Clang AST information.
+  clang::ASTContext &astCtx;
+
+  const clang::LangOptions &langOpts;
+
+  [[maybe_unused]] const clang::CodeGenOptions &codeGenOpts;
+
+  /// A "module" matches a c/cpp source file: containing a list of functions.
+  mlir::ModuleOp theModule;
+
+  [[maybe_unused]] clang::DiagnosticsEngine &Diags;
+
+  const clang::TargetInfo &target;
+
+public:
+  void buildTopLevelDecl(clang::Decl *decl);
+};
+} // namespace cir
+
+#endif // LLVM_CLANG_LIB_CIR_CODEGEN_CIRGENMODULE_H
diff --git a/clang/lib/CIR/CodeGen/CIRGenTypeCache.h b/clang/lib/CIR/CodeGen/CIRGenTypeCache.h
new file mode 100644
index 00000000000000..9b865aa20b4fde
--- /dev/null
+++ b/clang/lib/CIR/CodeGen/CIRGenTypeCache.h
@@ -0,0 +1,27 @@
+//===--- CIRGenTypeCache.h - Commonly used LLVM types and info -*- C++ --*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This structure provides a set of common types useful during CIR emission.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_LIB_CIR_CIRGENTYPECACHE_H
+#define LLVM_CLANG_LIB_CIR_CIRGENTYPECACHE_H
+
+namespace cir {
+
+/// This structure provides a set of types that are commonly used
+/// during IR emission. It's initialized once in CodeGenModule's
+/// constructor and then copied around into new CIRGenFunction's.
+struct CIRGenTypeCache {
+  CIRGenTypeCache() {}
+};
+
+} // namespace cir
+
+#endif // LLVM_CLANG_LIB_CIR_CODEGEN_CIRGENTYPECACHE_H
diff --git a/clang/lib/CIR/CodeGen/CIRGenerator.cpp b/clang/lib/CIR/CodeGen/CIRGenerator.cpp
new file mode 100644
index 00000000000000..00f9e97ee13273
--- /dev/null
+++ b/clang/lib/CIR/CodeGen/CIRGenerator.cpp
@@ -0,0 +1,46 @@
+//===--- CIRGenerator.cpp - Emit CIR from ASTs ----------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This builds an AST and converts it to CIR.
+//
+//===----------------------------------------------------------------------===//
+
+#include "CIRGenModule.h"
+
+#include "clang/AST/DeclGroup.h"
+#include "clang/CIR/CIRGenerator.h"
+
+using namespace cir;
+using namespace clang;
+
+void CIRGenerator::anchor() {}
+
+CIRGenerator::CIRGenerator(clang::DiagnosticsEngine &diags,
+                           llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> vfs,
+                           const CodeGenOptions &CGO)
+    : Diags(diags), fs(std::move(vfs)), codeGenOpts{CGO},
+      HandlingTopLevelDecls(0) {}
+CIRGenerator::~CIRGenerator() {}
+
+void CIRGenerator::Initialize(ASTContext &astCtx) {
+  using namespace llvm;
+
+  this->astCtx = &astCtx;
+
+  CGM = std::make_unique<CIRGenModule>(*mlirCtx.get(), astCtx, codeGenOpts,
+                                       Diags);
+}
+
+bool CIRGenerator::HandleTopLevelDecl(DeclGroupRef D) {
+
+  for (DeclGroupRef::iterator I = D.begin(), E = D.end(); I != E; ++I) {
+    CGM->buildTopLevelDecl(*I);
+  }
+
+  return true;
+}
diff --git a/clang/lib/CIR/CodeGen/CMakeLists.txt b/clang/lib/CIR/CodeGen/CMakeLists.txt
new file mode 100644
index 00000000000000..17a3aabfbd7f0e
--- /dev/null
+++ b/clang/lib/CIR/CodeGen/CMakeLists.txt
@@ -0,0 +1,23 @@
+set(
+  LLVM_LINK_COMPONENTS
+  Core
+  Support
+)
+
+get_property(dialect_libs GLOBAL PROPERTY MLIR_DIALECT_LIBS)
+
+add_clang_library(clangCIR
+  CIRGenerator.cpp
+  CIRGenModule.cpp
+
+  DEPENDS
+  MLIRCIR
+  ${dialect_libs}
+
+  LINK_LIBS
+  clangAST
+  clangBasic
+  clangLex
+  ${dialect_libs}
+  MLIRCIR
+)
diff --git a/clang/lib/CIR/FrontendAction/CIRGenAction.cpp b/clang/lib/CIR/FrontendAction/CIRGenAction.cpp
new file mode 100644
index 00000000000000..efdd109963a19b
--- /dev/null
+++ b/clang/lib/CIR/FrontendAction/CIRGenAction.cpp
@@ -0,0 +1,88 @@
+//===--- CIRGenAction.cpp - LLVM Code generation Frontend Action ---------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/CIRFrontendAction/CIRGenAction.h"
+#include "clang/CIR/CIRGenerator.h"
+#include "clang/Frontend/CompilerInstance.h"
+
+#include "mlir/IR/MLIRContext.h"
+#include "mlir/IR/OwningOpRef.h"
+
+using namespace cir;
+using namespace clang;
+
+namespace cir {
+
+class CIRGenConsumer : public clang::ASTConsumer {
+
+  virtual void anchor();
+
+  [[maybe_unused]] CIRGenAction::OutputType action;
+
+  [[maybe_unused]] DiagnosticsEngine &diagnosticsEngine;
+  [[maybe_unused]] const HeaderSearchOptions &headerSearchOptions;
+  [[maybe_unused]] const CodeGenOptions &codeGenOptions;
+  [[maybe_unused]] const TargetOptions &targetOptions;
+  [[maybe_unused]] const LangOptions &langOptions;
+  [[maybe_unused]] const FrontendOptions &feOptions;
+
+  std::unique_ptr<raw_pwrite_stream> outputStream;
+
+  [[maybe_unused]] ASTContext *astContext{nullptr};
+  IntrusiveRefCntPtr<llvm::vfs::FileSystem> FS;
+  std::unique_ptr<CIRGenerator> gen;
+
+public:
+  CIRGenConsumer(CIRGenAction::OutputType action,
+                 DiagnosticsEngine &diagnosticsEngine,
+                 IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
+                 const HeaderSearchOptions &headerSearchOptions,
+                 const CodeGenOptions &codeGenOptions,
+                 const TargetOptions &targetOptions,
+                 const LangOptions &langOptions,
+                 const FrontendOptions &feOptions,
+                 std::unique_ptr<raw_pwrite_stream> os)
+      : action(action), diagnosticsEngine(diagnosticsEngine),
+        headerSearchOptions(headerSearchOptions),
+        codeGenOptions(codeGenOptions), targetOptions(targetOptions),
+        langOptions(langOptions), feOptions(feOptions),
+        outputStream(std::move(os)), FS(VFS),
+        gen(std::make_unique<CIRGenerator>(diagnosticsEngine, std::move(VFS),
+                                           codeGenOptions)) {}
+
+  bool HandleTopLevelDecl(DeclGroupRef D) override {
+    gen->HandleTopLevelDecl(D);
+    return true;
+  }
+};
+} // namespace cir
+
+void CIRGenConsumer::anchor() {}
+
+CIRGenAction::CIRGenAction(OutputType act, mlir::MLIRContext *mlirContext)
+    : mlirContext(mlirContext ? mlirContext : new mlir::MLIRContext),
+      action(act) {}
+
+CIRGenAction::~CIRGenAction() { mlirModule.release(); }
+
+std::unique_ptr<ASTConsumer>
+CIRGenAction::CreateASTConsumer(CompilerInstance &ci, StringRef inputFile) {
+  auto out = ci.takeOutputStream();
+
+  auto Result = std::make_unique<cir::CIRGenConsumer>(
+      action, ci.getDiagnostics(), &ci.getVirtualFileSystem(),
+      ci.getHeaderSearchOpts(), ci.getCodeGenOpts(), ci.getTargetOpts(),
+      ci.getLangOpts(), ci.getFrontendOpts(), std::move(out));
+  cgConsumer = Result.get();
+
+  return std::move(Result);
+}
+
+void EmitCIRAction::anchor() {}
+EmitCIRAction::EmitCIRAction(mlir::MLIRContext *_MLIRContext)
+    : CIRGenAction(OutputType::EmitCIR, _MLIRContext) {}
diff --git a/clang/lib/CIR/FrontendAction/CMakeLists.txt b/clang/lib/CIR/FrontendAction/CMakeLists.txt
new file mode 100644
index 00000000000000..b0616ab5d64b09
--- /dev/null
+++ b/clang/lib/CIR/FrontendAction/CMakeLists.txt
@@ -0,0 +1,17 @@
+set(LLVM_LINK_COMPONENTS
+  Core
+  Support
+  )
+
+get_property(dialect_libs GLOBAL PROPERTY MLIR_DIALECT_LIBS)
+
+add_clang_library(clangCIRFrontendAction
+  CIRGenAction.cpp
+
+  LINK_LIBS
+  clangAST
+  clangFrontend
+  clangCIR
+  MLIRCIR
+  MLIRIR
+  )
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index f9ca76a5ac8007..b04752f23ed637 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -4944,6 +4944,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
     }
   }
 
+  if (Args.hasArg(options::OPT_fclangir))
+    CmdArgs.push_back("-fclangir");
+
   if (IsOpenMPDevice) {
     // We have to pass the triple of the host if compiling for an OpenMP device.
     std::string NormalizedTriple =
diff --git a/clang/lib/FrontendTool/CMakeLists.txt b/clang/lib/FrontendTool/CMakeLists.txt
index 51c379ade2704c..bfc7652b4c118f 100644
--- a/clang/lib/FrontendTool/CMakeLists.txt
+++ b/clang/lib/FrontendTool/CMakeLists.txt
@@ -12,6 +12,15 @@ set(link_libs
   clangRewriteFrontend
   )
 
+set(deps)
+
+if(CLANG_ENABLE_CIR)
+  list(APPEND link_libs
+    clangCIRFrontendAction
+    MLIRIR
+    )
+endif()
+
 if(CLANG_ENABLE_ARCMT)
   list(APPEND link_libs
     clangARCMigrate
@@ -29,7 +38,13 @@ add_clang_library(clangFrontendTool
 
   DEPENDS
   ClangDriverOptions
+  ${deps}
 
   LINK_LIBS
   ${link_libs}
   )
+
+if(CLANG_ENABLE_CIR)
+  target_include_directories(clangFrontendTool PRIVATE ${LLVM_MAIN_SRC_DIR}/../mlir/include)
+  target_include_directories(clangFrontendTool PRIVATE ${CMAKE_BINARY_DIR}/tools/mlir/include)
+endif()
diff --git a/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp b/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp
index 7476b1076d1038..563e8473f4fce6 100644
--- a/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp
+++ b/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp
@@ -31,6 +31,11 @@
 #include "llvm/Support/BuryPointer.h"
 #include "llvm/Support/DynamicLibrary.h"
 #include "llvm/Support/ErrorHandling.h"
+
+#if CLANG_ENABLE_CIR
+#include "clang/CIRFrontendAction/CIRGenAction.h"
+#endif
+
 using namespace clang;
 using namespace llvm::opt;
 
@@ -42,6 +47,14 @@ CreateFrontendBaseAction(CompilerInstance &CI) {
   StringRef Action("unknown");
   (void)Action;
 
+  auto UseCIR = CI.getFrontendOpts().UseClangIRPipeline;
+  auto Act = CI.getFrontendOpts().ProgramAction;
+  auto EmitsCIR = Act == EmitCIR;
+
+  if (!UseCIR && EmitsCIR)
+    llvm::report_fatal_error(
+        "-emit-cir and -emit-cir-only only valid when using -fclangir");
+
   switch (CI.getFrontendOpts().ProgramAction) {
   case ASTDeclList:            return std::make_unique<ASTDeclListAction>();
   case ASTDump:                return std::make_unique<ASTDumpAction>();
@@ -53,8 +66,13 @@ CreateFrontendBaseAction(CompilerInstance &CI) {
   case DumpTokens:             return std::make_unique<DumpTokensAction>();
   case EmitAssembly:           return std::make_unique<EmitAssemblyAction>();
   case EmitBC:                 return std::make_unique<EmitBCAction>();
+#if CLANG_ENABLE_CIR
+  case EmitCIR:
+    return std::make_unique<::cir::EmitCIRAction>();
+#else
   case EmitCIR:
     llvm_unreachable("CIR suppport not built into clang");
+#endif
   case EmitHTML:               return std::make_unique<HTMLPrintAction>();
   case EmitLLVM:               return std::make_unique<EmitLLVMAction>();
   case EmitLLVMOnly:           return std::make_unique<EmitLLVMOnlyAction>();
diff --git a/clang/test/CIR/hello.c b/clang/test/CIR/hello.c
new file mode 100644
index 00000000000000..37c6ac9e65b3b5
--- /dev/null
+++ b/clang/test/CIR/hello.c
@@ -0,0 +1,4 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fclangir -emit-cir %s
+
+// just confirm that we don't crash
+void foo() {}
diff --git a/clang/test/CIR/lit.local.cfg b/clang/test/CIR/lit.local.cfg
new file mode 100644
index 00000000000000..6e9e8b42c00885
--- /dev/null
+++ b/clang/test/CIR/lit.local.cfg
@@ -0,0 +1,2 @@
+if not config.root.clang_enable_cir:
+    clang.unsupported = True

Comment on lines 35 to 37
clang::DiagnosticsEngine &Diags;
clang::ASTContext *astCtx;
llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mixed naming conventions in the header should be fixed (preference is to follow LLVM style if we're changing code around, but if the local preference is for MLIR, that's fine so long as it's consistent).


bool CIRGenerator::HandleTopLevelDecl(DeclGroupRef D) {

for (DeclGroupRef::iterator I = D.begin(), E = D.end(); I != E; ++I) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm::for_each? range-based for loop?


virtual void anchor();

[[maybe_unused]] CIRGenAction::OutputType action;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is so much of the interface marked as maybe unused?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To silence warnings for things we are about to introduce in follow up commits. Sometimes we prefetch the skeleton in NFC fashion, in order to make functionality changes have a smaller / more relevant diff. But we could surely remove them too.

@@ -42,6 +47,14 @@ CreateFrontendBaseAction(CompilerInstance &CI) {
StringRef Action("unknown");
(void)Action;

auto UseCIR = CI.getFrontendOpts().UseClangIRPipeline;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the community conventions in this case because this is materially different from naming. With naming, it's about style and while it's quite distracting to have a unique style, it's doesn't really hurt understanding the code. With use of auto, it's about being able to understand the code for people not familiar with it; using auto in this way makes the code harder for the community to maintain in ways that naming conventions don't really have.

@bcardosolopes
Copy link
Member

bcardosolopes commented May 13, 2024

Thanks for everyone's input so far. Let me try to summarize discussions in this PR so we can set on an approach and give advice to our CIR community (and encode on our webpage) on how to move forward in upcoming patches. Useful resources:

CC'ing more people who might care: @seven-mile, @Lancern.

Use of auto

As @joker-eph mentioned, MLIR isn't meant to differ from LLVM/Clang, though they encode the differences as part of their guidelines. The auto guidance is still in favor of us keeping it for all isa, dyn_cast and cast, which is used in CIR, so it probably doesn't affect most of what we currently care about. Here's my suggestion for the entire lib/CIR:

  1. Use it for function templates such as isa, dyn_cast, cast, create and rewriteOp.* variants.
  2. Use it for things considered obvious/common in MLIR space, examples:
  • auto loc = op->getLoc().
  • Getting operands and results from operations (they obey Value Semantics), e.g.: DepthwiseConv2DNhwcHwcmOp op; ...; auto stride = op.getStrides();
  • Other examples we are happy to provide upon reviewer feedback if it makes sense.

Using the logic above, all autos in this current PR have to be changed (since none apply).

Var naming: CamelCase vs camelBack

From this discussion, seems like @AaronBallman and @erichkeane are fine with us using camelBack and all the other differences from MLIR Style Guide in CIRGen and the rest of CIR. Is that right? This is based on the comment:

The mixed naming conventions in the header should be fixed (preference is to follow LLVM style if we're changing code around, but if the local preference is for MLIR, that's fine so long as it's consistent).

However, @AaronBallman also wrote:

Also, the fact that the naming keeps being inconsistent is a pretty strong reason to change to use the LLVM naming convention, at least for external interfaces.

Should we ignore this in light of your first comment? If not, can you clarify what do you mean by external interfaces? Just want to make sure we get it right looking fwd.

Does this makes sense? Thoughts?

@Lancern
Copy link
Member

Lancern commented May 14, 2024

FWIW, the current practice uses CamelCase for CIRGen and camelBack for all other CIR stuff. Most code in CIRGen is directly ported from clang CodeGen and the code style is kept as-is, while other part of CIR is invented from scratch and we follow MLIR style guides. I'm not sure whether this is an acceptable scheme, but the naming style problems pointed out specifically in this PR (e.g. langOpts) should be resolved if this scheme is to be followed as they appear in CIRGen.

@AaronBallman
Copy link
Collaborator

Thanks for everyone's input so far. Let me try to summarize discussions in this PR so we can set on an approach and give advice to our CIR community (and encode on our webpage) on how to move forward in upcoming patches. Useful resources:

* [LLVM Coding Standard](https://llvm.org/docs/CodingStandards.html)

* [MLIR Style Guide](https://mlir.llvm.org/getting_started/DeveloperGuide/#style-guide)

CC'ing more people who might care: @seven-mile, @Lancern.

Use of auto

As @joker-eph mentioned, MLIR isn't meant to differ from LLVM/Clang, though they encode the differences as part of their guidelines. The auto guidance is still in favor of us keeping it for all isa, dyn_cast and cast, which is used in CIR, so it probably doesn't affect most of what we currently care about. Here's my suggestion for the entire lib/CIR:

1. Use it for function templates such as `isa`, `dyn_cast`, `cast`, `create` and `rewriteOp.*` variants.

Agreed -- anywhere the type is spelled out somewhere in the initializer is fine (so also static_cast, getAs, new, etc).

2. Use it for things considered obvious/common in MLIR space, examples:


* `auto loc = op->getLoc()`.

What is obvious in the MLIR space is not necessarily what's obvious in Clang; I have no idea whether that returns a SourceLocation, a PresumedLoc, a FullSourceLoc, etc, so I don't think that is a use of auto I would want to have to reason about as a reviewer. That said, if the only use of loc is: Diag(loc, diag::err_something); then the type really doesn't matter and use of auto is probably fine. It's a judgment call.

* Getting operands and results from operations (they obey Value Semantics), e.g.: `DepthwiseConv2DNhwcHwcmOp op; ...; auto stride = op.getStrides();`

Again, I have no idea what the type of that is and I have no way to verify that it actually uses value semantics because the pointer and qualifiers can be inferred and references can be dropped. That makes it harder to review the code; I would spell out the type in this case as well.

* Other examples we are happy to provide upon reviewer feedback if it makes sense.

The big things for reviewers are: don't use auto if the type isn't incredibly obvious (spelled in the initialization or extremely obvious based on immediately surrounding code) and don't make us infer important parts the declarator (if it's a const Foo * and auto makes sense for some reason, spell it const auto * rather than auto). And if a reviewer asks for something to be converted from auto to an explicit type name, assume that means the code isn't as readable as it could be and switch to a concrete type (we should not be arguing "I think this is clear therefore you should also think it's clear" during review, that just makes the review process painful for everyone involved).

Using the logic above, all autos in this current PR have to be changed (since none apply).

Var naming: CamelCase vs camelBack

From this discussion, seems like @AaronBallman and @erichkeane are fine with us using camelBack and all the other differences from MLIR Style Guide in CIRGen and the rest of CIR. Is that right? This is based on the comment:

The mixed naming conventions in the header should be fixed (preference is to follow LLVM style if we're changing code around, but if the local preference is for MLIR, that's fine so long as it's consistent).

However, @AaronBallman also wrote:

Also, the fact that the naming keeps being inconsistent is a pretty strong reason to change to use the LLVM naming convention, at least for external interfaces.

Should we ignore this in light of your first comment? If not, can you clarify what do you mean by external interfaces? Just want to make sure we get it right looking fwd.

My thinking is: 1) (most important) the convention should be consistent with other nearby code, whatever that convention happens to be. 2) if there's not already a convention to follow and if it's code that lives in clang/include/clang, it should generally follow the LLVM style (it's an "external" interface) because those are the APIs that folks outside of CIR will be calling into. If it's code that lives in clang/lib/, following the MLIR style is less of a concern.

So in the case of this review, there's mixed styles being used in the PR and so something has to change. If we're changing something anyway, changing to the LLVM style is generally preferred over changing to the MLIR style.

One caveat to this is: avoid busywork but use your best judgement on how we eventually get to a consistent use of the LLVM style. If changing to LLVM style means touching 1000s of lines of code while changing to MLIR style means touching 10s of lines of code, switch to MLIR style. But if changing to LLVM style means touching 100 lines of code and changing to MLIR style means touching 150 lines of code, it's probably best to just bite the bullet and switch to LLVM style even though MLIR would be less effort.

Does this all seem reasonable?

@bcardosolopes
Copy link
Member

What is obvious in the MLIR space is not necessarily what's obvious in Clang;

Sure.

I have no idea whether that returns a SourceLocation, a PresumedLoc, a FullSourceLoc, etc, so I don't think that is a use of auto I would want to have to reason about as a reviewer. That said, if the only use of loc is: Diag(loc, diag::err_something); then the type really doesn't matter and use of auto is probably fine. It's a judgment call.

For location, getLoc always return mlir::Location, it's not related to Clang's source location. CIRGen use of loc is mostly to pass it down to MLIR APIs, because MLIR source locations are required whenever one is building or rewriting an operation. We usually convert Clang's location into MLIR and pass those around as expected. Has that changed your opinion for this specific case?

Again, I have no idea what the type of that is and I have no way to verify that it actually uses value semantics because the pointer and qualifiers can be inferred and references can be dropped. That makes it harder to review the code; I would spell out the type in this case as well.

Makes sense. For some background: MLIR "instructions", usually called "operations", usually have pretty default return types when querying operands, mlir::Value for SSA values or when an mlir::Attribute is returned, the getter/setter is suffixed with Attr. Would it be reasonable to suggest a guideline where in CIRGen we get more strict on not using auto but for other passes and transformations it should be fine to use auto on those? My rationale here is that post CIRGen it's very likely the reviwer audience is going to be more among MLIR experts than Clang experts and the former are more familiar with the idiom.

The big things for reviewers are: don't use auto if the type isn't incredibly obvious (spelled in the initialization or extremely obvious based on immediately surrounding code) and don't make us infer important parts the declarator (if it's a const Foo * and auto makes sense for some reason, spell it const auto * rather than auto).

Agreed. Btw, I'm not advocating for plain auto here/everywhere. This is more about auto usage big picture.

And if a reviewer asks for something to be converted from auto to an explicit type name, assume that means the code isn't as readable as it could be and switch to a concrete type (we should not be arguing "I think this is clear therefore you should also think it's clear" during review, that just makes the review process painful for everyone involved).

Agreed. If it makes the reviewer job easier we should just abide.

My thinking is: 1) (most important) the convention should be consistent with other nearby code, whatever that convention happens to be.

Ok, for camelBack, it seems like clang/lib/CodeGen is already adopting it in multiple places (unsure if it's intentional or not?), and looks incomplete because probably no one took the effort to make it uniform? The signal this gives me is that we should just do it (use camelBack) for CIRGen.

  1. if there's not already a convention to follow and if it's code that lives in clang/include/clang, it should generally follow the LLVM style (it's an "external" interface) because those are the APIs that folks outside of CIR will be calling into. If it's code that lives in clang/lib/, following the MLIR style is less of a concern.

Great & agreed, thanks for the clarification.

... changing to the LLVM style is generally preferred over changing to the MLIR style. One caveat to this is: avoid busywork but use your best judgement on how we eventually get to a consistent use of the LLVM style. ... it's probably best to just bite the bullet and switch to LLVM style even though MLIR would be less effort. Does this all seem reasonable?

Yep, thank you @AaronBallman for putting the time here to help us set the right approach looking fwd. To be more specific, I'd suggest:

  • clang/include -> LLVM
  • lib/Driver/* -> LLVM
  • lib/FrontendTool -> LLVM
  • lib/CIR/FrontendAction -> LLVM
  • lib/CIR/CodeGen -> MLIR
  • lib/CIR/* -> MLIR

What do you think?

@joker-eph
Copy link
Collaborator

joker-eph commented May 20, 2024

What part of the "camelBack" is visible to the user of the API? (we're not talking about method names or class names here I believe)

(and in general the method prototype in the header should likely use the same parameter names as the implementation file)

@bcardosolopes
Copy link
Member

I encoded the discussion in our clangir.org page, with some speculation on Aaron's pending replies, happy to change and fix anything. In the future we'll upstream that too, but for the time being it's here: https://llvm.github.io/clangir/GettingStarted/coding-guideline.html

Copy link
Collaborator

What part of the "camelBack" is visible to the user of the API? (we're not talking about method names or class names here I believe)

Parameter names, which show up in tooltips in IDEs, etc.

But again, the important thing is to at least be self-consistent. e.g.,

  CIRGenModule(mlir::MLIRContext &context, clang::ASTContext &astctx,
               const clang::CodeGenOptions &CGO,
               clang::DiagnosticsEngine &Diags);

should be:

  CIRGenModule(mlir::MLIRContext &Context, clang::ASTContext &AstCtx,
               const clang::CodeGenOptions &CGO,
               clang::DiagnosticsEngine &Diags);

or

  CIRGenModule(mlir::MLIRContext &context, clang::ASTContext &astctx,
               const clang::CodeGenOptions &cgo
               clang::DiagnosticsEngine &diags);

rather than in a mixed form as it currently is.

Copy link
Collaborator

I have no idea whether that returns a SourceLocation, a PresumedLoc, a FullSourceLoc, etc, so I don't think that is a use of auto I would want to have to reason about as a reviewer. That said, if the only use of loc is: Diag(loc, diag::err_something); then the type really doesn't matter and use of auto is probably fine. It's a judgment call.

For location, getLoc always return mlir::Location, it's not related to Clang's source location. CIRGen use of loc is mostly to pass it down to MLIR APIs, because MLIR source locations are required whenever one is building or rewriting an operation. We usually convert Clang's location into MLIR and pass those around as expected. Has that changed your opinion for this specific case?

My preference is still to spell it out; I would have had no idea that there was an mlir::Location type and presumed this was using Clang's source location types.

I can see the argument either way, but I prefer we optimize for readbility of the code by people not ensconced in MLIR. Otherwise this becomes harder for anyone other than MLIR folks to get involved with. That said, if it's inside of the lib directory and isn't part of a public interface, it's probably not the end of the world to stick with auto. I just don't think it benefits anyone to do so other than to make it easier to land the initial patches, which doesn't seem like a compelling reason to stick with auto.

Again, I have no idea what the type of that is and I have no way to verify that it actually uses value semantics because the pointer and qualifiers can be inferred and references can be dropped. That makes it harder to review the code; I would spell out the type in this case as well.

Makes sense. For some background: MLIR "instructions", usually called "operations", usually have pretty default return types when querying operands, mlir::Value for SSA values or when an mlir::Attribute is returned, the getter/setter is suffixed with Attr. Would it be reasonable to suggest a guideline where in CIRGen we get more strict on not using auto but for other passes and transformations it should be fine to use auto on those? My rationale here is that post CIRGen it's very likely the reviwer audience is going to be more among MLIR experts than Clang experts and the former are more familiar with the idiom.

I think that's something we can live with, but like above, this makes it harder for the community in general while making it easier for folks already involved with MLIR.

My thinking is: 1) (most important) the convention should be consistent with other nearby code, whatever that convention happens to be.

Ok, for camelBack, it seems like clang/lib/CodeGen is already adopting it in multiple places (unsure if it's intentional or not?), and looks incomplete because probably no one took the effort to make it uniform? The signal this gives me is that we should just do it (use camelBack) for CIRGen.

That seems reasonable to me.

Yep, thank you @AaronBallman for putting the time here to help us set the right approach looking fwd. To be more specific, I'd suggest:

  • clang/include -> LLVM
  • lib/Driver/* -> LLVM
  • lib/FrontendTool -> LLVM
  • lib/CIR/FrontendAction -> LLVM
  • lib/CIR/CodeGen -> MLIR
  • lib/CIR/* -> MLIR

What do you think?

I think that's a reasonable compromise. @erichkeane WDYT?

@bcardosolopes
Copy link
Member

Thanks @AaronBallman

But again, the important thing is to at least be self-consistent. e.g.,

It's a good point to reinforce, updated.

I just don't think it benefits anyone to do so other than to make it easier to land the initial patches, which doesn't seem like a compelling reason to stick with auto.

Fair. I moved the usage of auto for mlir::Location for things outside "CIRGen/FrontendAction" but allowed as part of lib/CIR/{Transforms,*}, where the context makes it way more easier to understand (e.g. it can only be one thing, no Clang's source locations expected at that point)

I think that's something we can live with, but like above, this makes it harder for the community in general while making it easier for folks already involved with MLIR.

Ok, I'll take the "we can live with" for this one.

Doc updated accordingly: https://llvm.github.io/clangir/GettingStarted/coding-guideline.html

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, it will be helpful to create subscribing team to help people to get informed in time. (I just saw the patch accidently.)

Created using spr 1.3.5
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Jul 10, 2024
lanza added 4 commits July 10, 2024 23:15
Created using spr 1.3.5
Created using spr 1.3.5
Created using spr 1.3.5
Created using spr 1.3.5
Created using spr 1.3.5
Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like all issues have been addressed? Anything remaining @AaronBallman ?

Thanks everyone for all the reviews!

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concerns have been addressed, but there's still a small open question on the driver flag behavior though, so please give @MaskRay a chance to respond before landing.

Otherwise, LGTM!

@@ -2900,7 +2900,7 @@ defm clangir : BoolFOption<"clangir",
PosFlag<SetTrue, [], [ClangOption, CC1Option], "Use the ClangIR pipeline to compile">,
NegFlag<SetFalse, [], [ClangOption, CC1Option], "Use the AST -> LLVM pipeline to compile">,
BothFlags<[], [ClangOption, CC1Option], "">>;
def emit_cir : Flag<["-"], "emit-cir">, Visibility<[CC1Option]>,
def emit_cir : Flag<["-"], "emit-cir">, Visibility<[ClangOption, CC1Option]>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaskRay are you okay with this approach? Also, if this is exposed as a driver flag, should we have a test in clang/test/Driver that it emits the expected -cc1 option?

@AaronBallman
Copy link
Collaborator

My concerns have been addressed, but there's still a small open question on the driver flag behavior though, so please give @MaskRay a chance to respond before landing.

Otherwise, LGTM!

CC @MaskRay now that you may be back from vacation

Created using spr 1.3.5
@lanza
Copy link
Member Author

lanza commented Oct 9, 2024

@AaronBallman any gripes with moving forward and landing this after a month?

@AaronBallman
Copy link
Collaborator

@AaronBallman any gripes with moving forward and landing this after a month?

Nope, I think it's fine to move ahead with this. If @MaskRay has concerns, they can be addressed post commit.

Created using spr 1.3.5
@lanza lanza merged commit 1bb52e9 into main Oct 9, 2024
6 of 7 checks passed
@lanza lanza deleted the users/lanza/sprcir-build-out-ast-consumer-patterns-to-reach-the-entry-point-into-cirgen branch October 9, 2024 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category ClangIR Anything related to the ClangIR project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants