refactor: introduce monorepo-like layout #323


Merged

bajtos merged 24 commits into main from monorepo on May 30, 2024

Conversation

@bajtos (Member) commented May 28, 2024

Introduce the following repository layout:

  • bin contains scripts shared by all components, currently bin/migrate.js
  • migrations contains SQL migration scripts and index.js for running them
  • each component has its own directory with its own fly.toml
    • logshipper
    • api
    • spark-db
    • publish
  • there is a single Dockerfile shared by all workspaces
  • publish and api are npm workspaces with their own package.json file
  • the top-level package.json file is an npm workspace root (see the sketch below)
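
For illustration, here is a minimal sketch of what the workspace root package.json could look like (the package name, workspace list, and exact script names are assumptions, not the file merged in this PR):

```json
{
  "name": "spark-monorepo",
  "private": true,
  "type": "module",
  "workspaces": ["api", "publish"],
  "scripts": {
    "start:api": "npm start --workspace api",
    "test:api": "npm test --workspace api",
    "test:publish": "npm test --workspace publish",
    "test": "npm test --workspaces"
  }
}
```

With a root file along these lines, `npm test` at the root runs every workspace's test script, while `npm run test:api` targets a single component.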

@bajtos requested a review from juliangruber May 28, 2024 14:46
@juliangruber (Member) left a comment

Great progress, Miro! 👏

README.md Outdated
@@ -89,13 +89,13 @@ docker run -d --name spark-db \
Start the API service:

```diff
-npm start
+npm run start:api
```
@juliangruber (Member):

The only workflow downside that I see with this approach is that you can't zoom into individual projects as easily any more: if I open my editor on ./spark-api/, I need to keep my CWD in . and remember to run npm run test:api. It would be nice if I could cd into that folder and just run npm test. Is this a concern for you?

What if, instead, we give every subrepo its own package.json and node_modules? We can still have a parent package.json at the top level, which will shell out to the sub-package.jsons for running tests etc. I think this matches the Lerna experience more closely as well. But are there downsides to this approach?

@bajtos (Member, Author):

> The only workflow downside that I see with this approach is that you can't zoom into individual projects as easily any more: if I open my editor on ./spark-api/, I need to keep my CWD in . and remember to run npm run test:api. It would be nice if I could cd into that folder and just run npm test. Is this a concern for you?

Makes sense. I agree we should at least try to make this workflow easier.

I usually open my editor/IDE in the monorepo root, so this is not a concern to me.

My thoughts:

  • When you run npm in the subdirectory and there is no package.json file, npm will walk up the directory tree to the monorepo root and use the package.json file in the root.
  • However, npm test will set the working directory to the directory containing the package.json file, so it will run all tests.
  • You can run something like npx mocha test, but that's ugly.
  • You can also run npm run test:spark-api, but that's not ideal either.

We can create a small package.json file containing only the scripts:

```json
{
  "type": "module",
  "scripts": {
    "test": "cd ..; npm run test:spark-api"
  }
}
```
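
With such a proxy manifest placed in the component directory, the per-package workflow would look like this (the spark-api directory name is assumed for illustration):

```bash
# the proxy package.json above is assumed to live in ./spark-api/
cd spark-api
npm test   # executes "cd ..; npm run test:spark-api" from the proxy manifest
```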

Do you see any downsides with that?

> What if, instead, we give every subrepo its own package.json and node_modules? We can still have a parent package.json at the top level, which will shell out to the sub-package.jsons for running tests etc. I think this matches the Lerna experience more closely as well. But are there downsides to this approach?

Yes, this is the Lerna/npm workspace setup.

From what I remember, this setup makes many tasks slower:

  • npm install needs to install multiple copies of the same package when it's used by multiple components.
  • Depending on the install strategy, you can end up with all packages installed in the root node_modules folder. As a result, the code in spark-publish can import dependencies declared only in spark-api, and we won't notice that (illustrated in the sketch after this list).
  • We probably need to configure TypeScript with a multi-project setup. The last time I used such a setup in a large monorepo (~2021), it took seconds to minutes for tsc just to load the entire tree and decide what needs to be (re)checked.
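
To make the hoisting concern concrete, here is a hypothetical sketch (the file path and the pg package are made up purely for illustration):

```js
// spark-publish/index.js (hypothetical file)
// With npm's default "hoisted" install strategy, this import resolves from the
// root node_modules even if "pg" is declared only in api/package.json, so the
// missing entry in publish/package.json goes unnoticed.
import pg from 'pg'

export const client = new pg.Client()
```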

Anyhow, I am open to giving this a try.

@bajtos (Member, Author):

Documentation on different install strategies for npm workspaces:

https://docs.npmjs.com/cli/v10/commands/npm-ci?v=true#configuration

@bajtos (Member, Author):

I opened a PR against this feature branch to try out npm workspaces: #326

@juliangruber (Member):

> Do you see any downsides with that?

The downside would be that if you add a dependency from the subdirectory, it won't be added to the right place.
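
A sketch of that failure mode (the package name is hypothetical):

```bash
# npm adds new dependencies to the nearest package.json, so with the proxy
# manifest in place the new entry lands in spark-api/package.json instead of
# the root package.json where this layout actually declares dependencies.
cd spark-api
npm install p-retry
```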

> Yes, this is the Lerna/npm workspace setup.
> From what I remember, this setup makes many tasks slower:

> • npm install needs to install multiple copies of the same package when it's used by multiple components.

To me, npm install time doesn't matter these days. I don't run it often, and the cache + my internet are fast enough. If it's a significant issue on your setup, or for your workflow, I'm happy to value this more.

> • Depending on the install strategy, you can end up with all packages installed in the root node_modules folder. As a result, the code in spark-publish can import dependencies declared only in spark-api, and we won't notice that.

In which situation would all packages be installed in the root folder? I believe that when you run cd api && npm install and there's api/package.json, then dependencies have to go to api/node_modules/.

> • We probably need to configure TypeScript with a multi-project setup. The last time I used such a setup in a large monorepo (~2021), it took seconds to minutes for tsc just to load the entire tree and decide what needs to be (re)checked.

Ah right, I remember that from our talk. I definitely don't want to introduce TS headaches here.

@bajtos (Member, Author):

> To me, npm install time doesn't matter these days. I don't run it often, and the cache + my internet are fast enough. If it's a significant issue on your setup, or for your workflow, I'm happy to value this more.

I had kind of resigned myself to slow npm installs until I got a chance to experience how fast pnpm is, and I have been missing its speed ever since. Switching from npm to pnpm is out of scope for this work, though.

> • Depending on the install strategy, you can end up with all packages installed in the root node_modules folder. As a result, the code in spark-publish can import dependencies declared only in spark-api, and we won't notice that.

> In which situation would all packages be installed in the root folder? I believe that when you run cd api && npm install and there's api/package.json, then dependencies have to go to api/node_modules/.

npm's default configuration is to hoist dependencies: it will always install as many dependencies as possible in the root node_modules folder.

> install-strategy
>
> Default: "hoisted"
> Type: "hoisted", "nested", "shallow", or "linked"
>
> Sets the strategy for installing packages in node_modules.
> • hoisted (default): install non-duplicated in top-level, and duplicated as necessary within directory structure.
> • nested (formerly --legacy-bundling): install in place, no hoisting.
> • shallow (formerly --global-style): only install direct deps at top-level.
> • linked (experimental): install in node_modules/.store, link in place, unhoisted.
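
For completeness, the strategy can be overridden explicitly; a sketch using the configuration option quoted above:

```bash
# avoid hoisting and keep each package's dependencies nested under the package
# that declares them
npm ci --install-strategy=nested
```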

> • We probably need to configure TypeScript with a multi-project setup. The last time I used such a setup in a large monorepo (~2021), it took seconds to minutes for tsc just to load the entire tree and decide what needs to be (re)checked.

> Ah right, I remember that from our talk. I definitely don't want to introduce TS headaches here.

We also don't have TypeScript checks configured in this repository yet. I am willing to take a risk and not worry about this until I start working on adding tsc.

I think we should be fine as long as our monorepo stays small (less than 10 sub-packages, less than 100 source files).
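
If tsc is added later, the multi-project setup would most likely use TypeScript project references. A minimal sketch, assuming per-workspace tsconfig files that do not exist yet:

```jsonc
// tsconfig.json at the monorepo root ("solution-style": it compiles nothing itself)
{
  "files": [],
  "references": [
    { "path": "./api" },
    { "path": "./publish" }
  ]
}
```

Each referenced workspace would then need its own tsconfig.json with "composite": true, and the whole tree would be checked with `tsc --build`.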

@bajtos (Member, Author) commented May 29, 2024

Next steps:

bajtos added 3 commits May 30, 2024 08:58

* feat: npm workspaces
* fix CI + move `standard` to monorepo root
* monorepo package name
* ci: run CI for all pull requests
* move mocha to sub-packages
* add standard to child workspaces
* fix CI step to migrate the DB
* Rework Dockerfile to use `npm start --workspace`
* ci: fix build-publish
* fix dependabot
* another fix for Dependabot
* fix build-publish to test publish, not api
* add postgrator to spark-publish dev-dependencies
* minor fixes

Signed-off-by: Miroslav Bajtoš <[email protected]>
@bajtos marked this pull request as ready for review May 30, 2024 07:43
@bajtos merged commit 8bc6151 into main May 30, 2024
8 checks passed
@bajtos deleted the monorepo branch May 30, 2024 08:14
bajtos added a commit to CheckerNetwork/spark-stats that referenced this pull request May 30, 2024
Rework the repository to be ready to add spark-observer as the second
service.

See CheckerNetwork/spark-api#323

Signed-off-by: Miroslav Bajtoš <[email protected]>