Merged
6 changes: 4 additions & 2 deletions .markdownlint.json
Original file line number Diff line number Diff line change
@@ -2,5 +2,7 @@
"MD007": { "indent": 4 },
"no-hard-tabs": false,
"MD013": false,
"MD026": { "punctuation": ".,;:!" }
}
"MD026": { "punctuation": ".,;:" },
"MD040": false,
"MD046": false
}
2 changes: 2 additions & 0 deletions .markdownlintignore
@@ -0,0 +1,2 @@
docs/wiki-guide/HF_*_Template*.md
mkdocs.yaml
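
As a rough illustration of which paths these two ignore entries cover, the sketch below uses Python's `fnmatch` module. This is only an approximation: the ignore file follows gitignore-style matching, which `fnmatch` does not fully reproduce, and the helper name `is_ignored` is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the .markdownlintignore file in this change.
IGNORE_PATTERNS = ["docs/wiki-guide/HF_*_Template*.md", "mkdocs.yaml"]


def is_ignored(path, patterns=IGNORE_PATTERNS):
    """Return True if the path matches any ignore pattern (approximate)."""
    # fnmatchcase does case-sensitive shell-style matching; gitignore rules
    # such as negation ("!") or directory-only patterns are NOT handled here.
    return any(fnmatchcase(path, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in [
        "docs/wiki-guide/HF_DatasetCard_Template_mkdocs.md",
        "docs/wiki-guide/About-Templates.md",
        "mkdocs.yaml",
    ]:
        print(candidate, is_ignored(candidate))
```

Under these assumptions, the Hugging Face template files and `mkdocs.yaml` are skipped by the linter, while other pages in `docs/wiki-guide/` are still checked.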
20 changes: 16 additions & 4 deletions CONTRIBUTING.md
@@ -121,7 +121,8 @@ The project includes custom MkDocs macros defined in `main.py`:
2. **Create a feature branch** from `dev`
3. **Make your changes** following the standards above
4. **Test locally** with `mkdocs serve`
5. **Run linting** to ensure formatting consistency
5. **Run linting (OPTIONAL)** to ensure formatting consistency
- See instructions in [Linting](#linting)
6. **Submit a pull request** with:
- Clear description of changes
- Reference to related issue
@@ -178,9 +179,20 @@ chore: update mkdocs dependencies

The project uses [markdownlint](https://github.com/DavidAnson/markdownlint) with configuration in `.markdownlint.json`. Key settings:

- 4-space indentation for lists (`MD007`)
- No hard tab restrictions disabled
- Line length restrictions disabled (`MD013`)
- 4-space indentation for lists (`MD007`).
- Hard tab restrictions disabled (`no-hard-tabs`).
- Line length restrictions disabled (`MD013`).
- Restrict punctuation at the end of headers (`MD026`); `!` and `?` are allowed.
- Allow code blocks without a language specification (`MD040`).
- Do not enforce a code block style (`MD046`), as this rule commonly raises errors on indented fenced code blocks (see [discussion](https://github.com/DavidAnson/markdownlint/issues/327)).

For faster PR review, you may want to run linting locally; a PR Action also runs the linter. First install markdownlint, then run:

```console
markdownlint -c .markdownlint.json -f docs/wiki-guide/
```

The `-f` flag fixes simple formatting issues automatically; more complicated style rules raise alerts instead (e.g., referencing a link as `[here](URL)` will produce the line: `<filename>.md:191:2 MD059/descriptive-link-text Link text should be descriptive [Context: "[here]"]`).
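
To make the `MD026` setting listed above more concrete, here is a minimal Python sketch of what the rule checks with the punctuation list `.,;:` from `.markdownlint.json`. It is an illustration under simplifying assumptions, not markdownlint's implementation: the function name `heading_violates_md026` is hypothetical, and only ATX (`#`) headings are handled.

```python
import re

# Punctuation list taken from the "MD026" entry in .markdownlint.json.
MD026_PUNCTUATION = ".,;:"

# Matches ATX headings such as "## Title" (optionally with closing hashes).
# Setext (underlined) headings are ignored in this sketch.
ATX_HEADING = re.compile(r"^#{1,6}\s+(?P<text>.*?)\s*#*\s*$")


def heading_violates_md026(line, punctuation=MD026_PUNCTUATION):
    """Return True if the line is a heading ending in restricted punctuation."""
    match = ATX_HEADING.match(line)
    if match is None:
        return False  # not a heading, so the rule does not apply
    text = match.group("text")
    return bool(text) and text[-1] in punctuation


if __name__ == "__main__":
    for sample in ["## Linting:", "# Welcome!", "### Why lint?", "Plain text."]:
        print(repr(sample), heading_violates_md026(sample))
```

With this configuration, a heading ending in `:` would be flagged, while headings ending in `!` or `?` pass because those characters are not in the list.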

### Content Review

11 changes: 9 additions & 2 deletions README.md
@@ -2,7 +2,7 @@

Welcome to the Collaborative Distributed Science Guide!

Just joining or starting a new project?
Check out the [Collaborative Distributed Science Guide](https://imageomics.github.io/Collaborative-distributed-science-guide/) for guidance on conventions and best practices.

## About the Guide
@@ -14,6 +14,7 @@ Please feel free to open an [issue](https://github.com/Imageomics/Collaborative-
### How to Use the Guide

This Guide is set up as a template repository such that there are two primary means of interacting with it:

1. Building a personalized version of the Guide: select "Use this Template" at the top of the repo to generate your own version. This will create a new repository (generated from the template repo) that does _not_ share the commit history of the template. Updates could still be added from the template upstream through [`git cherry-pick`](https://git-scm.com/docs/git-cherry-pick). More details are provided below, in [Personalizing the Guide](#personalizing-the-guide).
2. Contributing to the Guide: fork this repo, make changes, and submit a pull request (PR) for review. Some guidance is provided in the [Pull Request Guide](https://imageomics.github.io/Collaborative-distributed-science-guide/wiki-guide/The-GitHub-Pull-Request-Guide/); please provide a detailed description of your changes and review the contributing guidelines (coming soon).

@@ -37,17 +38,23 @@ Primary pages to personalize are:
If you'd like to contribute to this guide, please read our [Contributing Guidelines](CONTRIBUTING.md) for information about our standards, development workflow, and submission process.

### Testing

To test this site locally, first clone this repository, then create an environment with `requirements.txt`

```
pip install -r requirements.txt
```

and run `mkdocs serve`:

```
mkdocs serve
```
Then the site will run at http://127.0.0.1:8000/Collaborative-distributed-science-guide/.

Then the site will run at <http://127.0.0.1:8000/Collaborative-distributed-science-guide/>.

### History

This guide was developed alongside the [Imageomics Guide](https://imageomics.github.io/Imageomics-guide/), which houses the information needed to get started with and use institute resources readily available to all members. However, most of its content is applicable to anyone working more broadly in the field of [_imageomics_](https://imageomics.github.io/Collaborative-distributed-science-guide/wiki-guide/Glossary-for-Imageomics.md/#imageomics) or adjacent fields of computer and data science, and it is tailored to help domain scientists bridging that gap. _This guide_ is intended to serve as a template for others wishing to develop a similar organization-specific guide, and this solution was born out of the desire to do so for the [AI and Biodiversity Change (ABC) Global Center](http://abcresearchcenter.org) while limiting duplicative updates between guides (Imageomics and ABC share some team members on this project). The general structure of the website should be broadly applicable, but see [Personalizing the Guide](#personalizing-the-guide) for suggestions on tailoring it for the particular organization or group's needs.

## Acknowledgments
18 changes: 13 additions & 5 deletions docs/CODE_OF_CONDUCT.md
@@ -6,13 +6,13 @@ To this end, we agree as individuals and as a group to:

- **Listen to understand.** When one person talks, others listen.
- **Speak to be understood.** We use lay terms and are patient with people who are not experts in our specific field. We are all learning, no matter who we are.
- Embrace **“Yes and…”** Focus on possibilities instead of obstacles. Be inclusive of other people’s ideas. Honor divergence.
- **Take space / make space.** Those who tend to talk a lot are intentional about letting others talk first, while those who tend to hold back are intentional about contributing.
- **Beware of blind spots.** We do not know what we do not know. We are vigilant for differences among our experiences and positions.
- **Respect time.** When a session is over, we need to move on. There is designated time for in-depth follow up and continuing conversations.
- **Care** for each other. We bring our full selves to the community, and we look out for each other wholeheartedly.

We abide by these principles in all Imageomics and ABC spaces, including but not limited to digital and in-person meetings, formal and informal gatherings, online discussion forums and chat spaces, and field and lab work.

Acts of misconduct are prohibited. Those found to engage in misconduct will be subject to dismissal from the project and further actions as directed by the guidelines of the employers and the place of incidence.

@@ -25,11 +25,19 @@ If you believe you have experienced or witnessed misconduct in an Imageomics or
Privacy will be protected to the greatest extent possible.

## VALUES

### TRANSPARENCY

We ensure our efforts are clear about assumptions, uncertainty, and limits, and provide open sources of information, processes, and discovery.

### ACCOUNTABILITY

We are responsible, individually and collectively, for the outcomes we produce and ensure, to the best of our abilities, that each method's outcome matches its intended use.

### COLLABORATION

We create and nurture collaborative environments and welcome, value, and affirm all members of our community. We also consider how and for whom solutions are created and promote the heterogeneity of perspectives in the creation process. We actively engage others’ perspectives, recognize everyone’s potential to contribute new ideas, and work together to find creative solutions to complex problems.

### SAFETY

We ensure our practices are ethical and impartial to the best of our ability. We address ethical issues when we discover them and practice good data governance. We strive to enhance practices while openly addressing those that harm people or the environment.
4 changes: 1 addition & 3 deletions docs/wiki-guide/About-Templates.md
@@ -2,14 +2,12 @@

We provide Dataset and Model Card templates for both Imageomics and ABC, adapted from Hugging Face's templates. The Imageomics and ABC templates include guidance and examples for the various metadata sections, reference information for Hugging Face's particular flavor of markdown, and the appropriate NSF & NSERC grant acknowledgment.

To use a template for a new dataset or model repository on Hugging Face (HF), simply copy and paste the contents of the appropriate template ([Dataset Card](HF_DatasetCard_Template_mkdocs.md) or [Model Card](HF_ModelCard_Template_mkdocs.md)) into your `README.md` file.[^1]
Then, follow the descriptions under each section to fill in the appropriate information. This is meant to be an iterative process throughout the life of your project, so do not worry if you cannot answer all parts at the beginning&mdash;that's to be expected!
[^1]: The templates can also be added to your repository through the website user interface (UI): Navigate to the "Model/Dataset Card" tab on your repo, select "Create Model/Dataset Card", copy and paste the template contents into the `README.md` file, and add your content.


!!! tip "Practice makes perfect!"
If you have never filled out a dataset card before, or are unsure of how to find the answers to fill in the sections, we ran a [workshop](https://github.com/Imageomics/data-workshop-AH-2024) to help familiarize our members with this process. In particular, the portion where we walked through filling out part of a dataset card as we did exploratory data analysis (EDA) was recorded and is available on the [Imageomics YouTube Channel](https://www.youtube.com/@ImageomicsInstitute/videos). Read the [story of the workshop](https://github.com/Imageomics/data-workshop-AH-2024/#story-of-the-workshop) and clone the [repo](https://github.com/Imageomics/data-workshop-AH-2024) to follow along with the 1 hour and 15 minute lesson!

!!! note "Note"
The Dataset and Model cards have incorporated some of Hugging Face's January 2024 updates (following their [Dataset Card Overhaul](https://github.com/huggingface/huggingface_hub/commit/6dd7ee829bd1b1216663a9993c1943c29b64690a)). It doesn't appear they will be updated more and we do not currently anticipate further large updates on our end as our overall template formats have diverged. Nevertheless, you may wish to check HF for extra information or tagging updates ([HF Dataset Card](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md), [HF Model Card](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md)).

4 changes: 2 additions & 2 deletions docs/wiki-guide/Code-Checklist.md
@@ -19,7 +19,7 @@ This checklist provides an overview of essential and recommended elements to inc
- [ ] Acknowledge source code dependencies and contributors.
- [ ] Reference related datasets used in training or evaluation.
- [ ] **Requirements File**: Provide a [file detailing software requirements](GitHub-Repo-Guide.md/#software-requirements-file), such as a `requirements.txt` or `pyproject.toml` for Python dependencies.
- [ ] **Gitignore File**: GitHub has premade `.gitignore` files ([here](https://github.com/github/gitignore)) tailored to particular languages (eg., [R](https://github.com/github/gitignore/blob/main/R.gitignore) or [Python](https://github.com/github/gitignore/blob/main/Python.gitignore)), operating systems, etc.
- [ ] **Gitignore File**: GitHub has premade `.gitignore` files (see [github/gitignore](https://github.com/github/gitignore)) tailored to particular languages (e.g., [R](https://github.com/github/gitignore/blob/main/R.gitignore) or [Python](https://github.com/github/gitignore/blob/main/Python.gitignore)), operating systems, etc.
- [ ] **CITATION CFF**: This facilitates citation of your work; follow the guidance provided in the [Repo Guide](GitHub-Repo-Guide.md/#citation).

### Data-Related
@@ -81,7 +81,7 @@ The [Repo Guide](GitHub-Repo-Guide.md/) provides general guidance on repository
### Documentation

- [ ] **API Documentation**: Generate API documentation (e.g., [`MkDocs`](https://www.mkdocs.org) for Python or wiki pages in the repo).
- [ ] **Docstrings**: Add comprehensive docstrings for all functions, classes, and modules. These can be incorporated to help generate documentation. Note that generative AI tools with access to your code, such as GitHub Copilot, can be quite accurate in generating these, especially if you are using type annotations.
- [ ] **Example Scripts**: Include example scripts for common use cases.
- [ ] **Configuration Files**: Use `yaml`, `json`, or `ini` for configuration settings.

18 changes: 13 additions & 5 deletions docs/wiki-guide/Command-Line-Cheat-Sheet.md
@@ -1,8 +1,13 @@
# Command Line Cheat Sheet

<!-- Disable rule for in-line HTML; it is needed for better table formatting and brackets aren't recognized as being in a code block by the linter on this page. -->
<!-- markdownlint-disable MD033 -->
See also [GitHub's Markdown Guide](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax).

## Useful bash and git

<!-- The pipe in the final row of this column isn't recognized as being in a code block by the linter
so this rule must be ignored here to avoid errors -->
<!-- markdownlint-disable MD056 -->
| Command | Action |
| --- | --- |
| `<cmd> -h` | print the help documentation for a command, showing usage information and options |
@@ -11,12 +16,14 @@ See also [GitHub's Markdown Guide](https://docs.github.com/en/get-started/writin
| `pwd` | current working directory |
| `ls` | list everything in current directory (use `-a` to also show **a**ll files including hidden, `-l` for a **l**ong list including permissions and ownership info, `-1` ("dash one") to display the output with **1** item on each line) |
| `wc -l <file>` | use the **w**ord **c**ount command with the `-l` **l**ines option to list the number of lines in a file |
| `du <dirname>/`| calculate and show how much **d**isk **u**sage is consumed by a directory (use `-h` to make it **h**uman-readable, i.e. report in MB, GB or whatever units are most appropriate, and `-s` for **s**ummary of all the contents together rather than each item individually) |
| ++ctrl+r++ | search for command (will pop up `bck-i-search:`) |
| `rm <target>` | remove a file (or folder with `-r`). Beware when using `rm -rf <folder>` to **f**orce the **r**ecursive removal of all contents in a folder, which cannot be undone unless there is a backup. |
| `<cmd1> | <cmd2>` | The "pipe" operator (++pipe++) feeds the output of the first command (`cmd1`) to the input of the second command (`cmd2`). For example, show the total number of files in a directory with `ls -1 <dir> | wc -l`|
| `<cmd1> | <cmd2>` | The "pipe" operator (++pipe++) feeds the output of the first command (`cmd1`) to the input of the second command (`cmd2`). For example, show the total number of files in a directory with `ls -1 <dir> | wc -l` |
<!-- markdownlint-enable MD056 -->

### Git-Specific

| Command | Action |
| --- | --- |
| `git log` | list of commits with author, date, time (type `q` to leave) |
@@ -30,10 +37,11 @@ See also [GitHub's Markdown Guide](https://docs.github.com/en/get-started/writin
| `git branch -d <branch>` | delete branch |

!!! tip "Pro tip: Simplify your git history"
- Use `git mv` to rename a file so that it is tracked as a rename (with or without changes).
- If you rename a file then `git add` its parent directory, the diff will show the deletion of the original file and addition of a "completely new" file, even if nothing has changed. This makes reviewing changes much more complicated than necessary.

#### Usual Process

After making changes to a file on a branch, check the status of your current working branch (with `git status`). Then, you "add" the file, state what is new about the file ("commit the change"), and `push` the file from your local copy of the repo to the remote copy:

```bash
@@ -42,14 +50,14 @@ git add <filename>
git commit -m "Changed x,y,z"

git push

```

!!! tip "Pro tip: Check the stage"
After using `git add <folder>` or `git add <regex>` (a pattern match), run `git status` to ensure that all intended files--and ***only*** intended files--are staged for commit.

!!! note "Note"
If you need to update your branch with changes from the remote `main`, first switch to the branch, then pull from `main` instead of the current branch, as below.

```bash
git checkout <branch>

@@ -10,7 +10,7 @@ This means the following policy applies for digital products of the Imageomics I

2. Code is to be released under an [OSI-approved open source license](https://opensource.org/licenses/), or to the public domain (for example, by applying a [CC-Zero](https://creativecommons.org/publicdomain/zero/1.0/) waiver).

- This should be in a well-documented GitHub repository that follows the format specified in the [Institute GitHub Repo Guide](GitHub-Repo-Guide.md).

- If associated with a publication, code should be versioned with a release linked to a DOI that can be referenced in the publication.
