Commit 6c831d6

Add lots of documentation
1 parent fcca5cc commit 6c831d6

4 files changed

+241
-11
lines changed

docs/ARCHITECTURE.md

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
1+
# Architecture of Crates.io
2+
3+
This document is a tour of the codebase in this repo. If you want to work on a bug or a feature,
4+
hopefully after reading this doc, you'll have a good idea of where to start looking for the code
5+
you want to change.
6+
7+
This is a work in progress. Pull requests and issues to improve this document are very welcome!
8+
9+
## Backend
10+
11+
The backend of crates.io is written in Rust. Most of that code lives in the *src* directory. It
12+
serves a JSON API over HTTP, and the HTTP server interface is provided by the [conduit][] crate and
13+
related crates.
14+
15+
[conduit]: https://crates.io/crates/conduit
16+
17+
### Server
18+
19+
The code to actually run the server is in *src/bin/server.rs*. This is where most of the pieces of
20+
the system are instantiated and configured, and can be thought of as the "entry point" to crates.io.
21+
22+
The server does the following things:
23+
24+
1. Initializes logging
25+
2. Checks out the index git repository, if it isn't already checked out
26+
3. Reads values from environment variables to configure a new instance of `cargo_registry::App`
27+
4. Adds middleware to the app by calling `cargo_registry::middleware`
28+
5. Syncs the categories defined in *src/categories.toml* with the categories in the database
29+
6. Starts a [civet][] `Server` that uses the `cargo_registry::App` instance
30+
7. Tells Nginx on Heroku that the application is ready to receive requests, if running on Heroku
31+
8. Blocks forever (or until the process is killed) waiting to receive messages on a channel that no
32+
messages are ever sent to, in order to outlive the civet `Server` threads
33+
34+
[civet]: https://crates.io/crates/civet
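
Step 8 can be illustrated with a minimal sketch that assumes only the standard library's `mpsc` channel. The helper names `keep_alive_channel` and `wait` are hypothetical and are not taken from *src/bin/server.rs*. The idea is that `recv` blocks for as long as a `Sender` is alive and nothing is sent, so holding on to an unused sender keeps the main thread parked while the server threads do the work.

```rust
use std::sync::mpsc::{channel, Receiver, RecvError, Sender};

/// Creates a channel whose sender is never used for sending.
/// Keeping the `Sender` alive is what makes `recv` block indefinitely.
/// (Hypothetical helper, for illustration only.)
fn keep_alive_channel() -> (Sender<()>, Receiver<()>) {
    channel()
}

/// Blocks the calling thread until a message arrives or every `Sender`
/// has been dropped. With an unused, still-alive sender this never returns.
fn wait(rx: &Receiver<()>) -> Result<(), RecvError> {
    rx.recv()
}
```

In a server's `main`, this would amount to binding the sender to a named variable that lives until the end of the function, for example `let (_tx, rx) = keep_alive_channel();`, and calling `wait(&rx)` as the last statement. Binding the sender to a bare `_` would drop it immediately, and `recv` would then return an error instead of blocking.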
35+
36+
### Routes
37+
38+
The API URLs that the server responds to (aka "routes") are defined in
39+
*src/lib.rs*.
40+
41+
All of the `api_router` routes are mounted under the `/api/v1` path (see the
42+
lines that look like `router.get("/api/v1/*path", R(api_router.clone()));`).
43+
44+
Each API route definition looks like this:
45+
46+
```rust
47+
api_router.get("/crates", C(krate::index));
48+
```
49+
50+
This line defines a route that responds to a GET request made to
51+
`/api/v1/crates` with the results of calling the `krate::index` function. `C`
52+
is a struct that holds a function and implements the [`conduit::Handler`][]
53+
trait: if the function succeeds, its result becomes the response, and if it
54+
fails, the server returns an error response. The `C` struct exists to reduce
55+
boilerplate.
56+
57+
[`conduit::Handler`]: https://docs.rs/conduit/0.8.1/conduit/trait.Handler.html
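
The wrapper pattern described for `C` can be sketched roughly as follows. This is not conduit's API: `Request`, `Response`, and `Handler` here are simplified, hypothetical stand-ins, the `String` error type is an assumption, and mapping failures to status 500 is chosen only for illustration.

```rust
/// Simplified stand-in for a request type (hypothetical, not conduit's).
struct Request {
    path: String,
}

/// Simplified stand-in for a response type (hypothetical, not conduit's).
struct Response {
    status: u16,
    body: String,
}

/// Simplified stand-in for a handler trait; the real `conduit::Handler`
/// has a different signature.
trait Handler {
    fn call(&self, req: &Request) -> Response;
}

/// Holds a fallible endpoint function, in the spirit of the `C` struct.
struct C(fn(&Request) -> Result<Response, String>);

impl Handler for C {
    fn call(&self, req: &Request) -> Response {
        match (self.0)(req) {
            // Success: the function's result is the response.
            Ok(resp) => resp,
            // Failure: turn the error into an error response (500 is an assumption).
            Err(msg) => Response { status: 500, body: msg },
        }
    }
}

/// Example endpoint function, standing in for something like `krate::index`.
fn index(req: &Request) -> Result<Response, String> {
    if req.path == "/api/v1/crates" {
        Ok(Response { status: 200, body: String::from("{\"crates\":[]}") })
    } else {
        Err(format!("no route for {}", req.path))
    }
}
```

With this shape, every endpoint can be written as a plain function returning a `Result`, and `C(index)` is all that is needed to turn it into a handler; the error-to-response conversion lives in one place.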
58+
59+
### Code having to do with running a web application
60+
61+
These modules could *maybe* be refactored into another crate. Maybe not. But their primary purpose
62+
is supporting the running of crates.io's web application parts, and they don't have much to do with
63+
the crate registry purpose of the application.
64+
65+
#### The `app` module
66+
67+
This contains the `App` struct, which holds a `Config` instance plus a few more application
68+
components such as:
69+
70+
- The database connection pools (there are two until we finish migrating the app to use Diesel
71+
everywhere)
72+
- The GitHub OAuth configuration
73+
- The cookie session key given to [conduit-cookie][]
74+
- The `git2::Repository` instance for the index repo checkout
75+
- The `Config` instance
76+
77+
This module also contains `AppMiddleware`, which implements the `Middleware` trait in order to
78+
inject the `app` instance into every request. That way, we can call `req.app()` to get to any of
79+
these components.
80+
81+
[conduit-cookie]: https://crates.io/crates/conduit-cookie
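
As a rough sketch of how a middleware can make shared state reachable from every request, the example below stores an `Arc<App>` in a type-keyed extension map before the request is handled. All types here are simplified, hypothetical stand-ins: the real `App`, `Request`, and `Middleware` come from this crate and conduit and have different definitions, and the extension map is an assumption about the general technique rather than a description of the actual implementation.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical application state; the real `App` holds pools, config, etc.
struct App {
    session_key: String,
}

/// Simplified request carrying a type-keyed extension map (an assumption).
#[derive(Default)]
struct Request {
    extensions: HashMap<TypeId, Box<dyn Any>>,
}

impl Request {
    /// Stores a value, keyed by its type.
    fn insert<T: Any>(&mut self, value: T) {
        self.extensions.insert(TypeId::of::<T>(), Box::new(value));
    }

    /// Looks up a previously stored value by type.
    fn get<T: Any>(&self) -> Option<&T> {
        self.extensions
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }

    /// Mirrors the `req.app()` accessor mentioned above.
    fn app(&self) -> &Arc<App> {
        self.get::<Arc<App>>().expect("AppMiddleware was not installed")
    }
}

/// Simplified stand-in for a middleware trait.
trait Middleware {
    fn before(&self, req: &mut Request);
}

/// Injects a shared handle to the app into every request it sees.
struct AppMiddleware {
    app: Arc<App>,
}

impl Middleware for AppMiddleware {
    fn before(&self, req: &mut Request) {
        // Cloning an `Arc` only bumps a reference count; the `App` is shared.
        req.insert(Arc::clone(&self.app));
    }
}
```

A handler that runs after `before` has been called can then reach any component with something like `req.app().session_key`, without the components being passed to each handler explicitly.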
82+
83+
#### The `config` module
84+
85+
#### The `db` module
86+
87+
#### The `dist` module
88+
89+
#### The `http` module
90+
91+
#### The `model` module
92+
93+
#### The `schema` module
94+
95+
#### The `utils` module
96+
97+
### Code having to do with managing a registry of crates
98+
99+
These modules are specific to the domain of being a crate registry. These concepts would exist no
100+
matter what language or framework crates.io was implemented in.
101+
102+
#### The `krate` module
103+
104+
#### The `users` module
105+
106+
#### The `badge` module
107+
108+
#### The `categories` module
109+
110+
#### The `category` module
111+
112+
#### The `dependency` module
113+
114+
#### The `download` module
115+
116+
#### The `git` module
117+
118+
#### The `keyword` module
119+
120+
#### The `owner` module
121+
122+
#### The `upload` module
123+
124+
#### The `uploaders` module
125+
126+
#### The `version` module
127+
128+
### Database
129+
130+
### Tests
131+
132+
### Deploying

src/app.rs

Lines changed: 22 additions & 2 deletions
@@ -1,3 +1,5 @@
1+
//! Application-wide components in a struct accessible from each request
2+
13
use std::env;
24
use std::error::Error;
35
use std::path::PathBuf;
@@ -18,14 +20,19 @@ pub struct App {
1820
/// The database connection pool
1921
pub database: db::Pool,
2022

21-
/// The database connection pool
23+
/// The diesel database connection pool
2224
pub diesel_database: db::DieselPool,
2325

2426
/// The GitHub OAuth2 configuration
2527
pub github: oauth2::Config,
2628

29+
/// A unique key used with conduit_cookie to generate cookies
2730
pub session_key: String,
31+
32+
/// The crate index git repository
2833
pub git_repo: Mutex<git2::Repository>,
34+
35+
/// The location on disk of the checkout of the crate index git repository
2936
pub git_repo_checkout: PathBuf,
3037

3138
/// The server configuration
@@ -38,14 +45,20 @@ pub struct AppMiddleware {
3845
}
3946

4047
impl App {
48+
/// Creates a new `App` with a given `Config`
49+
///
50+
/// Configures and sets up:
51+
///
52+
/// - GitHub OAuth
53+
/// - Database connection pools
54+
/// - A `git2::Repository` instance from the index repo checkout (that server.rs ensures exists)
4155
pub fn new(config: &Config) -> App {
4256
let mut github = oauth2::Config::new(
4357
&config.gh_client_id,
4458
&config.gh_client_secret,
4559
"https://github.com/login/oauth/authorize",
4660
"https://github.com/login/oauth/access_token",
4761
);
48-
4962
github.scopes.push(String::from("read:org"));
5063

5164
let db_pool_size = match (env::var("DB_POOL_SIZE"), config.env) {
@@ -66,6 +79,7 @@ impl App {
6679
_ => 1,
6780
};
6881

82+
// We need two connection pools until we finish transitioning everything to use diesel.
6983
let db_config = r2d2::Config::builder()
7084
.pool_size(db_pool_size)
7185
.min_idle(db_min_idle)
@@ -78,6 +92,7 @@ impl App {
7892
.build();
7993

8094
let repo = git2::Repository::open(&config.git_repo_checkout).unwrap();
95+
8196
App {
8297
database: db::pool(&config.db_url, db_config),
8398
diesel_database: db::diesel_pool(&config.db_url, diesel_db_config),
@@ -89,6 +104,11 @@ impl App {
89104
}
90105
}
91106

107+
/// Returns a handle for making HTTP requests to upload crate files.
108+
///
109+
/// The handle will go through a proxy if the uploader being used has specified one, which
110+
/// is only done in test mode in order to be able to record and inspect the HTTP requests
111+
/// that tests make.
92112
pub fn handle(&self) -> Easy {
93113
let mut handle = Easy::new();
94114
if let Some(proxy) = self.config.uploader.proxy() {

src/bin/server.rs

Lines changed: 36 additions & 5 deletions
@@ -17,10 +17,15 @@ use std::sync::mpsc::channel;
1717

1818
#[allow(dead_code)]
1919
fn main() {
20+
// Initialize logging
2021
env_logger::init().unwrap();
22+
23+
// If there isn't a git checkout containing the crate index repo at the path specified
24+
// by `GIT_REPO_CHECKOUT`, delete that directory and clone the repo specified by `GIT_REPO_URL`
25+
// into that directory instead. Uses the credentials specified in `GIT_HTTP_USER` and
26+
// `GIT_HTTP_PWD` via the `cargo_registry::git::credentials` function.
2127
let url = env("GIT_REPO_URL");
2228
let checkout = PathBuf::from(env("GIT_REPO_CHECKOUT"));
23-
2429
let repo = match git2::Repository::open(&checkout) {
2530
Ok(r) => r,
2631
Err(..) => {
@@ -36,11 +41,15 @@ fn main() {
3641
.unwrap()
3742
}
3843
};
44+
45+
// All commits to the index registry made through crates.io will be made by bors, the Rust
46+
// community's friendly GitHub bot.
3947
let mut cfg = repo.config().unwrap();
4048
cfg.set_str("user.name", "bors").unwrap();
4149
cfg.set_str("user.email", "[email protected]").unwrap();
4250

4351
let api_protocol = String::from("https");
52+
4453
let mirror = if env::var("MIRROR").is_ok() {
4554
Replica::ReadOnlyMirror
4655
} else {
@@ -56,7 +65,9 @@ fn main() {
5665

5766
let uploader = match (cargo_env, mirror) {
5867
(Env::Production, Replica::Primary) => {
59-
// `env` panics if these vars are not set
68+
// `env` panics if these vars are not set, and in production for a primary instance,
69+
// that's what we want since we don't want to be able to start the server if the server
70+
// doesn't know where to upload crates.
6071
Uploader::S3 {
6172
bucket: s3::Bucket::new(
6273
env("S3_BUCKET"),
@@ -69,8 +80,14 @@ fn main() {
6980
}
7081
}
7182
(Env::Production, Replica::ReadOnlyMirror) => {
72-
// Read-only mirrors don't need access key or secret key,
73-
// but they might have them. Definitely need bucket though.
83+
// Read-only mirrors don't need access key or secret key since by definition,
84+
// they'll only need to read from a bucket, not upload.
85+
//
86+
// Read-only mirrors might have access key or secret key, so use them if those
87+
// environment variables are set.
88+
//
89+
// Read-only mirrors definitely need bucket though, so that they know where
90+
// to serve crate files from.
7491
Uploader::S3 {
7592
bucket: s3::Bucket::new(
7693
env("S3_BUCKET"),
@@ -82,8 +99,13 @@ fn main() {
8299
proxy: None,
83100
}
84101
}
102+
// In Development mode, either running as a primary instance or a read-only mirror
85103
_ => {
86104
if env::var("S3_BUCKET").is_ok() {
105+
// If we've set the `S3_BUCKET` variable to any value, use all of the values
106+
// for the related S3 environment variables and configure the app to upload to
107+
// and read from S3 like production does. All values except for bucket are
108+
// optional, like production read-only mirrors.
87109
println!("Using S3 uploader");
88110
Uploader::S3 {
89111
bucket: s3::Bucket::new(
@@ -96,6 +118,9 @@ fn main() {
96118
proxy: None,
97119
}
98120
} else {
121+
// If we don't set the `S3_BUCKET` variable, we'll use a development-only
122+
// uploader that makes it possible to run and publish to a locally-running
123+
// crates.io instance without needing to set up an account and a bucket in S3.
99124
println!("Using local uploader, crate files will be in the dist directory");
100125
Uploader::Local
101126
}
@@ -110,13 +135,15 @@ fn main() {
110135
gh_client_secret: env("GH_CLIENT_SECRET"),
111136
db_url: env("DATABASE_URL"),
112137
env: cargo_env,
113-
max_upload_size: 10 * 1024 * 1024,
138+
max_upload_size: 10 * 1024 * 1024, // 10 MB default file upload size limit
114139
mirror: mirror,
115140
api_protocol: api_protocol,
116141
};
117142
let app = cargo_registry::App::new(&config);
118143
let app = cargo_registry::middleware(Arc::new(app));
119144

145+
// On every server restart, ensure the categories available in the database match
146+
// the information in *src/categories.toml*.
120147
cargo_registry::categories::sync().unwrap();
121148

122149
let port = if heroku {
@@ -131,7 +158,11 @@ fn main() {
131158
let mut cfg = civet::Config::new();
132159
cfg.port(port).threads(threads).keep_alive(true);
133160
let _a = Server::start(cfg, app);
161+
134162
println!("listening on port {}", port);
163+
164+
// Creating this file tells Heroku to tell Nginx that the application is ready
165+
// to receive traffic.
135166
if heroku {
136167
File::create("/tmp/app-initialized").unwrap();
137168
}
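
The uploader selection rules documented in the comments of `main` can be summarized as a small pure function. This is a sketch under several assumptions: the `Uploader` enum is reduced to plain strings, the environment is passed in as a lookup closure so the logic can be exercised without touching real variables, the names `S3_ACCESS_KEY` and `S3_SECRET_KEY` are assumed (only `S3_BUCKET` appears in the diff above), and a missing required variable yields an `Err` here whereas the real `env` helper panics.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Env {
    Development,
    Production,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Replica {
    Primary,
    ReadOnlyMirror,
}

/// Simplified stand-in for the real `Uploader` (hypothetical fields).
#[derive(Debug, PartialEq)]
enum Uploader {
    S3 {
        bucket: String,
        access_key: Option<String>,
        secret_key: Option<String>,
    },
    Local,
}

/// Chooses an uploader from the environment, looked up through `var`.
fn choose_uploader<F>(env: Env, mirror: Replica, var: F) -> Result<Uploader, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Required variables: a missing value is an error (the real code panics).
    let required = |name: &str| var(name).ok_or_else(|| format!("{} must be set", name));

    match (env, mirror) {
        // A production primary must know where to upload, so every value is required.
        (Env::Production, Replica::Primary) => Ok(Uploader::S3 {
            bucket: required("S3_BUCKET")?,
            access_key: Some(required("S3_ACCESS_KEY")?),
            secret_key: Some(required("S3_SECRET_KEY")?),
        }),
        // A read-only mirror only needs the bucket; keys are used if present.
        (Env::Production, Replica::ReadOnlyMirror) => Ok(Uploader::S3 {
            bucket: required("S3_BUCKET")?,
            access_key: var("S3_ACCESS_KEY"),
            secret_key: var("S3_SECRET_KEY"),
        }),
        // Development: use S3 only if a bucket is configured, else store locally.
        _ => match var("S3_BUCKET") {
            Some(bucket) => Ok(Uploader::S3 {
                bucket,
                access_key: var("S3_ACCESS_KEY"),
                secret_key: var("S3_SECRET_KEY"),
            }),
            None => Ok(Uploader::Local),
        },
    }
}
```

Passing `|name| std::env::var(name).ok()` as `var` would read the real process environment; a closure over a fixed map makes the same logic easy to check in a test.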
