
Commit 98f098c

[Very WIP] Rewrite the core of the binding generator.
TL;DR: The binding generator is a mess as of right now. At first it was fun (in a "this is challenging" sense) to improve on it, but this is not sustainable.

The truth is that the current architecture of the binding generator is a huge pile of hacks, so for the past few days I've been working on rewriting it with a few goals:

1) Keep the hacks as contained and well identified as possible. They're sometimes needed because of how clang exposes the AST, but ideally those hacks are well identified and don't interact randomly with each other. As an example, in the current bindgen, scanning the parameters of a function that references a struct clones all of the struct's information, so if the struct's name later changes (because we mangle it), everything breaks.

2) Support extending the bindgen output without having to deal with clang. The way I'm aiming to do this is by completely separating the parsing stage from the code generation stage, and by providing a single id for each item the binding generator handles.

3) No more random mutation of the internal representation from anywhere. That means no more Rc<RefCell<T>>, no more random circular references, no more borrow_state... nothing.

4) No more deduplication of declarations before code generation. Current bindgen has a stage, called `tag_dup_decl`[1], that takes care of deduplicating declarations. It's completely buggy, and for C++ it's a complete mess, since we YOLO modify the world. I've managed to get rid of it by using the clang canonical declaration, and the definition, to avoid scanning any type/item twice.

5) Code generation should not modify any internal data structure. It can look things up and traverse whatever it needs, but it must not modify anything.

6) Each item should have a canonical name and a single source of mangling logic, and that name should be computed from the immutable state at code generation time.
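To make goals 2 through 6 concrete, here is a minimal sketch of an id-based IR, assuming a flat arena of items indexed by an `ItemId`, a map from a canonical-declaration key to an id for deduplication, and a code generation function that only receives `&Context`. All names here (`ItemId`, `Item`, `ItemKind`, `Context`, `add`, `canonical_name`, `codegen`) are hypothetical illustrations rather than the code in this commit, and the string key merely stands in for clang's canonical declaration cursor.

```rust
use std::collections::HashMap;

/// Stable, copyable handle to an item in the IR (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ItemId(usize);

#[derive(Debug)]
enum ItemKind {
    Builtin,
    /// Fields refer to their types by id, never by a cloned copy.
    Struct { fields: Vec<(String, ItemId)> },
}

#[derive(Debug)]
struct Item {
    name: String,
    kind: ItemKind,
}

/// Filled in during parsing; the only phase allowed to mutate the IR.
#[derive(Default)]
struct Context {
    items: Vec<Item>,
    /// Stand-in for clang's canonical declaration: one id per declaration.
    by_canonical: HashMap<String, ItemId>,
}

impl Context {
    /// Registers an item unless its canonical declaration was already seen,
    /// in which case the existing id is returned and nothing is duplicated.
    fn add(&mut self, canonical_key: &str, item: Item) -> ItemId {
        if let Some(&id) = self.by_canonical.get(canonical_key) {
            return id;
        }
        let id = ItemId(self.items.len());
        self.items.push(item);
        self.by_canonical.insert(canonical_key.to_string(), id);
        id
    }

    fn resolve(&self, id: ItemId) -> &Item {
        &self.items[id.0]
    }

    /// Single source of naming/mangling, derived from immutable state.
    fn canonical_name(&self, id: ItemId) -> String {
        let item = self.resolve(id);
        match item.kind {
            ItemKind::Builtin => item.name.clone(),
            ItemKind::Struct { .. } => item.name.replace("::", "_"),
        }
    }
}

/// Code generation only gets `&Context`: it can look up and traverse, not mutate.
fn codegen(ctx: &Context, id: ItemId) -> String {
    match &ctx.resolve(id).kind {
        ItemKind::Builtin => String::new(),
        ItemKind::Struct { fields } => {
            let mut out = format!("pub struct {} {{\n", ctx.canonical_name(id));
            for (field, ty) in fields {
                out.push_str(&format!("    pub {}: {},\n", field, ctx.canonical_name(*ty)));
            }
            out.push_str("}\n");
            out
        }
    }
}
```

Because a field or parameter that references a struct stores that struct's `ItemId` instead of a clone of it, renaming or mangling the struct happens in exactly one place, which avoids the breakage described in goal 1.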
I've put some canonical_name logic in the code generation phase, but it's still incomplete, and it's expected to change if I implement namespaces.

Improvements pending before this can land:

1) Add support for missing core stuff, mainly generating functions (note that we already parse function signatures correctly as types, though), bitfields, and C++ methods.
2) Add support for the features that were added to work around some C++ pitfalls, like opaque types, etc.
3) Add support for the sugar that Manish added recently.
4) Optionally (and I guess this can land without it, because basically nobody uses it since it's so buggy), bring back namespace support.

These are not completely trivial, but I think I can do them quite easily with the current architecture.

I'm putting the current state of affairs here as a request for comments... Any thoughts? Note that there are still a few smells I want to eventually re-redesign, like the ParseError::Recurse thing, but until that happens I'm way happier with this kind of architecture.

I'm keeping the old `parser.rs` and `gen.rs` in tree just for reference while I code, but they will go away.

[1]: https://github.com/Yamakaky/rust-bindgen/blob/master/src/gen.rs#L448

Checkpoint to show a Rust ICE.

Union fields.

Parse annotations.

Multiple cleanups.
1 parent 2d94347 commit 98f098c

24 files changed, +3767 -1041 lines changed

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ quasi = { version = "0.15", features = ["with-syntex"] }
 clippy = { version = "*", optional = true }
 syntex_syntax = "0.38"
 log = "0.3.*"
+env_logger = "*"
 libc = "0.2.*"
 clang-sys = "0.8.0"
 
build.rs

Lines changed: 3 additions & 3 deletions
@@ -5,11 +5,11 @@ mod codegen {
 
     pub fn main() {
         let out_dir = env::var_os("OUT_DIR").unwrap();
-        let src = Path::new("src/gen.rs");
-        let dst = Path::new(&out_dir).join("gen.rs");
+        let src = Path::new("src/codegen/mod.rs");
+        let dst = Path::new(&out_dir).join("codegen.rs");
 
         quasi_codegen::expand(&src, &dst).unwrap();
-        println!("cargo:rerun-if-changed=src/gen.rs");
+        println!("cargo:rerun-if-changed=src/codegen/mod.rs");
     }
 }
 
src/bin/bindgen.rs

Lines changed: 13 additions & 25 deletions
@@ -2,29 +2,17 @@
 #![crate_type = "bin"]
 
 extern crate bindgen;
+extern crate env_logger;
 #[macro_use]
 extern crate log;
 extern crate clang_sys;
 
-use bindgen::{Bindings, BindgenOptions, LinkType, Logger};
+use bindgen::{Bindings, BindgenOptions, LinkType};
 use std::io;
 use std::path;
 use std::env;
 use std::default::Default;
 use std::fs;
-use std::process::exit;
-
-struct StdLogger;
-
-impl Logger for StdLogger {
-    fn error(&self, msg: &str) {
-        println!("{}", msg);
-    }
-
-    fn warn(&self, msg: &str) {
-        println!("{}", msg);
-    }
-}
 
 enum ParseResult {
     CmdUsage,
@@ -230,6 +218,13 @@ Options:
 }
 
 pub fn main() {
+    log::set_logger(|max_log_level| {
+        use env_logger::Logger;
+        let env_logger = Logger::new();
+        max_log_level.set(env_logger.filter());
+        Box::new(env_logger)
+    }).expect("Failed to set logger.");
+
     let mut bind_args: Vec<_> = env::args().collect();
     let bin = bind_args.remove(0);
 
@@ -247,17 +242,10 @@ pub fn main() {
         ParseResult::ParseErr(e) => panic!(e),
         ParseResult::CmdUsage => print_usage(bin),
         ParseResult::ParseOk(options, out) => {
-            let logger = StdLogger;
-            match Bindings::generate(&options, Some(&logger as &Logger), None) {
-                Ok(bindings) => match bindings.write(out) {
-                    Ok(()) => (),
-                    Err(e) => {
-                        logger.error(&format!("Unable to write bindings to file. {}", e));
-                        exit(-1);
-                    }
-                },
-                Err(()) => exit(-1)
-            }
+            let bindings = Bindings::generate(&options, None)
+                .expect("Failed to generate bindings!");
+
+            bindings.write(out).expect("Unable to write bindings!");
         }
     }
 }

src/clang.rs

Lines changed: 126 additions & 25 deletions
@@ -15,6 +15,13 @@ pub struct Cursor {
     x: CXCursor
 }
 
+impl fmt::Debug for Cursor {
+    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
+        write!(fmt, "Cursor({} kind: {}, loc: {})",
+               self.spelling(), kind_to_str(self.kind()), self.location())
+    }
+}
+
 pub type CursorVisitor<'s> = for<'a, 'b> FnMut(&'a Cursor, &'b Cursor) -> Enum_CXChildVisitResult + 's;
 
 impl Cursor {
@@ -49,12 +56,57 @@ impl Cursor {
         }
     }
 
+    pub fn num_template_args(&self) -> c_int {
+        unsafe {
+            clang_Cursor_getNumTemplateArguments(self.x)
+        }
+    }
+
+
+    /// This function gets the translation unit cursor. Note that we shouldn't
+    /// create a TranslationUnit struct here, because bindgen assumes there will
+    /// only be one of them alive at a time, and dispose it on drop. That can
+    /// change if this would be required, but I think we can survive fine
+    /// without it.
+    pub fn translation_unit(&self) -> Cursor {
+        assert!(self.is_valid());
+        unsafe {
+            let tu = clang_Cursor_getTranslationUnit(self.x);
+            let cursor = Cursor {
+                x: clang_getTranslationUnitCursor(tu),
+            };
+            assert!(cursor.is_valid());
+            cursor
+        }
+    }
+
+    pub fn is_toplevel(&self) -> bool {
+        let mut semantic_parent = self.semantic_parent();
+
+        while semantic_parent.kind() == CXCursor_Namespace ||
+              semantic_parent.kind() == CXCursor_NamespaceAlias ||
+              semantic_parent.kind() == CXCursor_NamespaceRef
+        {
+            semantic_parent = semantic_parent.semantic_parent();
+        }
+
+        let tu = self.translation_unit();
+        // Yes, the second can happen with, e.g., macro definitions.
+        semantic_parent == tu || semantic_parent == tu.semantic_parent()
+    }
+
     pub fn kind(&self) -> Enum_CXCursorKind {
         unsafe {
             clang_getCursorKind(self.x)
         }
     }
 
+    pub fn is_anonymous(&self) -> bool {
+        unsafe {
+            clang_Cursor_isAnonymous(self.x) != 0
+        }
+    }
+
     pub fn is_template(&self) -> bool {
         self.specialized().is_valid()
     }
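The new `is_toplevel` walks up the semantic parents, skipping namespace cursors, and then checks whether it has reached the translation unit. The std-only sketch below models that walk on a toy tree; `Kind`, `Node` and this `is_toplevel` are hypothetical, the three namespace-like cursor kinds are collapsed into one `Namespace` variant, and `None` stands in for `tu.semantic_parent()`, which the real code also accepts (its comment notes this happens with macro definitions).

```rust
/// Hypothetical cursor kinds; the real code checks CXCursor_Namespace,
/// CXCursor_NamespaceAlias and CXCursor_NamespaceRef.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    TranslationUnit,
    Namespace,
    Struct,
}

/// A node in a toy AST; `parent` is the semantic parent's index, and `None`
/// stands in for the translation unit's own semantic parent.
struct Node {
    kind: Kind,
    parent: Option<usize>,
}

/// Skip enclosing namespaces, then accept the translation unit (or the
/// `None` case, mirroring `semantic_parent == tu.semantic_parent()`).
fn is_toplevel(nodes: &[Node], idx: usize) -> bool {
    let mut parent = nodes[idx].parent;
    while let Some(p) = parent {
        if nodes[p].kind != Kind::Namespace {
            break;
        }
        parent = nodes[p].parent;
    }
    match parent {
        Some(p) => nodes[p].kind == Kind::TranslationUnit,
        None => true,
    }
}
```

With this rule a struct declared inside `namespace ns { ... }` still counts as top-level, while a struct nested inside another struct does not.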
@@ -77,10 +129,11 @@ impl Cursor {
         }
     }
 
-    pub fn raw_comment(&self) -> String {
-        unsafe {
+    pub fn raw_comment(&self) -> Option<String> {
+        let s = unsafe {
             String_ { x: clang_Cursor_getRawCommentText(self.x) }.to_string()
-        }
+        };
+        if s.is_empty() { None } else { Some(s) }
     }
 
     pub fn comment(&self) -> Comment {
@@ -165,12 +218,18 @@ impl Cursor {
         }
     }
 
-    pub fn enum_val(&self) -> i64 {
+    pub fn enum_val_signed(&self) -> i64 {
         unsafe {
             clang_getEnumConstantDeclValue(self.x) as i64
         }
     }
 
+    pub fn enum_val_unsigned(&self) -> u64 {
+        unsafe {
+            clang_getEnumConstantDeclUnsignedValue(self.x) as u64
+        }
+    }
+
     // typedef
     pub fn typedef_type(&self) -> Type {
         unsafe {
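Splitting `enum_val` into signed and unsigned accessors matters once a constant does not fit in `i64`. libclang's `clang_getEnumConstantDeclValue` returns a `long long` and `clang_getEnumConstantDeclUnsignedValue` an `unsigned long long`, so for an enum whose underlying type is a 64-bit unsigned integer, a constant such as `0xFFFFFFFFFFFFFFFF` only survives the unsigned accessor, while the signed one reads it back as `-1`. The sketch below models that with plain integer casts; `render_enum_value` and its `underlying_is_signed` flag are hypothetical, not bindgen API.

```rust
/// Hypothetical helper: `bits` is the constant's 64-bit pattern and
/// `underlying_is_signed` says whether the enum's integer type is signed.
/// Both readings agree until the top bit is set.
fn render_enum_value(bits: u64, underlying_is_signed: bool) -> String {
    if underlying_is_signed {
        // What a signed accessor (like `enum_val_signed`) would report.
        (bits as i64).to_string()
    } else {
        // What an unsigned accessor (like `enum_val_unsigned`) would report.
        bits.to_string()
    }
}
```

Picking the accessor based on the enum's underlying type, rather than always using the signed one, is what keeps large unsigned constants intact in the generated bindings.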
@@ -274,29 +333,40 @@ impl PartialEq for Cursor {
             clang_equalCursors(self.x, other.x) == 1
         }
     }
-
-    fn ne(&self, other: &Cursor) -> bool {
-        !self.eq(other)
-    }
 }
 
 impl Eq for Cursor {}
 
 impl Hash for Cursor {
     fn hash<H: Hasher>(&self, state: &mut H) {
-        self.x.kind.hash(state);
-        self.x.xdata.hash(state);
-        self.x.data[0].hash(state);
-        self.x.data[1].hash(state);
-        self.x.data[2].hash(state);
+        unsafe { clang_hashCursor(self.x) }.hash(state)
     }
 }
 
 // type
+#[derive(Clone, Hash)]
 pub struct Type {
     x: CXType
 }
 
+impl PartialEq for Type {
+    fn eq(&self, other: &Self) -> bool {
+        unsafe {
+            clang_equalTypes(self.x, other.x) != 0
+        }
+    }
+}
+
+impl Eq for Type {}
+
+impl fmt::Debug for Type {
+    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
+        write!(fmt, "Type({}, kind: {}, decl: {:?}, canon: {:?})",
+               self.spelling(), type_to_str(self.kind()), self.declaration(),
+               self.declaration().canonical())
+    }
+}
+
 #[derive(Debug, Copy, Clone, Eq, PartialEq, Hash)]
 pub enum LayoutError {
     Invalid,
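Rust's `Hash` contract requires that `a == b` imply `hash(a) == hash(b)`, and `HashMap`/`HashSet` rely on it. The old `Hash for Cursor` fed every raw field into the hasher while `eq` went through `clang_equalCursors`, which does not necessarily compare all of those fields; delegating to `clang_hashCursor` keeps hashing aligned with libclang's own notion of equality. The std-only sketch below shows the rule on a hypothetical `RawCursor` whose equality ignores a bookkeeping field, so its `Hash` must ignore it too.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical cursor-like value: `entity` identifies what it points at,
/// while `flag` is bookkeeping that equality deliberately ignores.
#[derive(Debug, Clone, Copy)]
struct RawCursor {
    entity: u64,
    flag: u64,
}

impl PartialEq for RawCursor {
    fn eq(&self, other: &Self) -> bool {
        self.entity == other.entity
    }
}

impl Eq for RawCursor {}

impl Hash for RawCursor {
    // Hash exactly what `eq` compares, so `a == b` implies equal hashes.
    // Also hashing `flag` would let two "equal" cursors land in different buckets.
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.entity.hash(state);
    }
}

/// Helper to compute a value's hash with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Inserting two such cursors that differ only in `flag` into a `HashSet` leaves a single entry, which is the behavior a map keyed by cursors needs.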
@@ -378,6 +448,24 @@ impl Type {
         }
     }
 
+    pub fn fallible_align(&self) -> Result<usize, LayoutError> {
+        unsafe {
+            let val = clang_Type_getAlignOf(self.x);
+            if val < 0 {
+                Err(LayoutError::from(val as i32))
+            } else {
+                Ok(val as usize)
+            }
+        }
+    }
+
+    pub fn fallible_layout(&self) -> Result<::ir::layout::Layout, LayoutError> {
+        use ir::layout::Layout;
+        let size = try!(self.fallible_size());
+        let align = try!(self.fallible_align());
+        Ok(Layout::new(size, align))
+    }
+
     pub fn align(&self) -> usize {
         unsafe {
             let val = clang_Type_getAlignOf(self.x);
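`fallible_layout` composes two fallible queries with `try!`, which returns early from the function on the first `Err` (the `?` operator does the same in today's Rust). Below is a std-only sketch of the same shape, assuming libclang's convention of negative return values for layout errors. The mapping of `-1`, `-2` and `-3` to Invalid, Incomplete and Dependent follows libclang's `CXTypeLayoutError` codes; the `Unknown` catch-all, the `check` helper and this standalone `fallible_layout` signature are hypothetical.

```rust
/// Hypothetical mirror of the error cases, decoded from negative return values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LayoutError {
    Invalid,
    Incomplete,
    Dependent,
    Unknown,
}

impl From<i32> for LayoutError {
    fn from(val: i32) -> Self {
        match val {
            -1 => LayoutError::Invalid,
            -2 => LayoutError::Incomplete,
            -3 => LayoutError::Dependent,
            _ => LayoutError::Unknown,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Layout {
    size: usize,
    align: usize,
}

/// Turns a raw libclang-style answer (negative on failure) into a Result.
fn check(val: i64) -> Result<usize, LayoutError> {
    if val < 0 {
        Err(LayoutError::from(val as i32))
    } else {
        Ok(val as usize)
    }
}

/// `?` plays the role of `try!`: the first failure short-circuits.
fn fallible_layout(raw_size: i64, raw_align: i64) -> Result<Layout, LayoutError> {
    let size = check(raw_size)?;
    let align = check(raw_align)?;
    Ok(Layout { size, align })
}
```

Returning a `Result` lets callers decide how to handle incomplete or dependent types, instead of the panicking `align()`/`size()` style deciding for them.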
@@ -581,21 +669,25 @@ pub struct Index {
 }
 
 impl Index {
-    pub fn create(pch: bool, diag: bool) -> Index {
+    pub fn new(pch: bool, diag: bool) -> Index {
         unsafe {
             Index { x: clang_createIndex(pch as c_int, diag as c_int) }
         }
     }
+}
 
-    pub fn dispose(&self) {
+impl fmt::Debug for Index {
+    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
+        write!(fmt, "Index {{ }}")
+    }
+}
+
+impl Drop for Index {
+    fn drop(&mut self) {
         unsafe {
             clang_disposeIndex(self.x);
         }
     }
-
-    pub fn is_null(&self) -> bool {
-        self.x.is_null()
-    }
 }
 
 // Token
@@ -609,6 +701,12 @@ pub struct TranslationUnit {
     x: CXTranslationUnit
 }
 
+impl fmt::Debug for TranslationUnit {
+    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
+        write!(fmt, "TranslationUnit {{ }}")
+    }
+}
+
 impl TranslationUnit {
     pub fn parse(ix: &Index, file: &str, cmd_args: &[String],
                  unsaved: &[UnsavedFile], opts: ::libc::c_uint) -> TranslationUnit {
@@ -655,12 +753,6 @@ impl TranslationUnit {
         }
     }
 
-    pub fn dispose(&self) {
-        unsafe {
-            clang_disposeTranslationUnit(self.x);
-        }
-    }
-
     pub fn is_null(&self) -> bool {
         self.x.is_null()
     }
@@ -687,6 +779,15 @@ impl TranslationUnit {
     }
 }
 
+impl Drop for TranslationUnit {
+    fn drop(&mut self) {
+        unsafe {
+            clang_disposeTranslationUnit(self.x);
+        }
+    }
+}
+
+
 // Diagnostic
 pub struct Diagnostic {
     x: CXDiagnostic
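Replacing the explicit `dispose(&self)` methods with `Drop` means each libclang handle is released exactly once, when its owner goes out of scope, and a released handle can no longer be used or disposed a second time. It is also why the new `translation_unit` doc comment warns against wrapping the unit obtained from a cursor in a `TranslationUnit`: dropping that wrapper would dispose a unit it does not own. In the std-only sketch below, the `Index` type is a hypothetical stand-in and a counter replaces the call to `clang_disposeIndex`.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a libclang handle such as `Index`; `disposals`
/// counts how often the dispose call would have run.
struct Index<'a> {
    disposals: &'a Cell<u32>,
}

impl<'a> Index<'a> {
    fn new(disposals: &'a Cell<u32>) -> Self {
        Index { disposals }
    }
}

impl<'a> Drop for Index<'a> {
    fn drop(&mut self) {
        // The real binding calls `clang_disposeIndex(self.x)` here.
        self.disposals.set(self.disposals.get() + 1);
    }
}
```

Moving the value (for example, returning it from a function) transfers ownership without disposing it; only the final owner's drop runs.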

src/clangll.rs

Lines changed: 3 additions & 1 deletion
@@ -428,7 +428,7 @@ pub const CXCallingConv_X86_64SysV: c_uint = 11;
 pub const CXCallingConv_Invalid: c_uint = 100;
 pub const CXCallingConv_Unexposed: c_uint = 200;
 #[repr(C)]
-#[derive(Copy, Clone)]
+#[derive(Copy, Clone, Hash)]
 pub struct CXType {
     pub kind: Enum_CXTypeKind,
     pub data: [*mut c_void; 2],
@@ -1076,6 +1076,7 @@ extern "C" {
     pub fn clang_Cursor_getNumArguments(C: CXCursor) -> c_int;
     pub fn clang_Cursor_getArgument(C: CXCursor, i: c_uint) ->
      CXCursor;
+    pub fn clang_Cursor_getNumTemplateArguments(T: CXCursor) -> c_int;
     pub fn clang_Cursor_getTemplateArgumentKind(C: CXCursor, i: c_uint) ->
      CXTemplateArgumentKind;
     pub fn clang_Cursor_getTemplateArgumentValue(C: CXCursor, i: c_uint) ->
@@ -1168,6 +1169,7 @@ extern "C" {
     pub fn clang_Cursor_getMangling(C: CXCursor) -> CXString;
     pub fn clang_Cursor_getParsedComment(C: CXCursor) -> CXComment;
     pub fn clang_Cursor_getModule(C: CXCursor) -> CXModule;
+    pub fn clang_Cursor_isAnonymous(C: CXCursor) -> c_uint;
     pub fn clang_Module_getASTFile(Module: CXModule) -> CXFile;
     pub fn clang_Module_getParent(Module: CXModule) -> CXModule;
     pub fn clang_Module_getName(Module: CXModule) -> CXString;
