Merged
210 changes: 43 additions & 167 deletions README.md
@@ -1,48 +1,34 @@
# cloudera.exe - Runlevel Management and Utilities for Cloudera Data Platform (CDP)
# cloudera.exe - Tools and Utilities for Cloudera Deployments

[![API documentation](https://github.com/cloudera-labs/cloudera.exe/actions/workflows/publish_docs.yml/badge.svg?branch=main&event=push)](https://github.com/cloudera-labs/cloudera.exe/actions/workflows/publish_docs.yml)

`cloudera.exe` is an Ansible collection that offers runlevel management of your **[Cloudera Data Platform (CDP)](https://www.cloudera.com/products/cloudera-data-platform.html) Public Cloud and Private Cloud** deployments. The collection contains a number of utilities for common scenarios encountered when managing a CDP deployment, including:
* Set up and management of external dependencies, e.g. database, Kerberos, LDAP
* Execution of Common deployment sequences (via importable playbooks)
`cloudera.exe` is an Ansible collection that helps set up **[Cloudera Data Platform (CDP)](https://www.cloudera.com/products/cloudera-data-platform.html) on cloud (Public Cloud) and on premises (Private Cloud)** deployments. The collection contains a number of utilities for common scenarios encountered when managing a deployment, including:

The collection is unabashedly an _opinionated_ approach to managing your CDP resources - its resources can be used to set up your CDP infrastructure, configure the host machines, install and configure CDP and its services, and more. The collection interacts across several control planes from CDP Public Cloud and cloud provider endpoints to Cloudera Manager for CDP Private Cloud and Public Cloud Data Hubs. In short, it has opinions about how to get things done. If you are looking for automation resources that _only_ interact with CDP resources - that is, assets that are focused solely on Cloudera software - please look at [`cloudera.cloud` for Public Cloud](https://github.com/cloudera-labs/cloudera.cloud) and [`cloudera.cluster` for Private Cloud and Cloudera Manager](https://github.com/cloudera-labs/cloudera.cluster).
* Supporting databases
* Kerberos
* LDAP
* TLS
* Cloudera Manager Server and Agent binaries
* Repositories
* OS kernel best practices

Core to the collection is the [configuration file](./docs/configuration.yml) which many of the collection's roles use as a central "switchboard" for their functions. The collection works hand-in-hand with the [`cloudera-deploy` application](https://github.com/cloudera-labs/cloudera-deploy/) to execute _definitions_ which include variations on this configuration; many of the functions in `cloudera-deploy` have relocated to this collection to streamline its use.
The collection is unabashedly an _opinionated_ approach to managing your resources - its resources can be used to configure the host machines, install and configure Cloudera and its services, and more. To be clear, **these resources are OPTIONAL**, yet they are helpful for greenfield, developer-centric, and quickstart deployments. The collection interacts with everything from cloud provider endpoints to host systems. In short, it has opinions about how to get things done.

The collection provides _playbooks_, _roles_, and _plugins_ for working with CDP deployments. Notably, the playbooks encapsulate typical set up and tear down deployment operations, aka runlevels:

| Name | Description |
| --- | --- |
| `pbc_infra_setup.yml` | Public Cloud infrastructure setup (AWS, Azure, GCP), using either Terraform or Ansible |
| `pbc_infra_teardown.yml` | Public Cloud infrastructure teardown (AWS, Azure, GCP), using either Terraform or Ansible |
| `pbc_setup.yml` | Public Cloud Datalake and Data Services setup |
| `pbc_teardown.yml` | Public Cloud Datalake and Data Services teardown |
| `pvc_base_postfix.yml` | Private Cloud setup, postfix |
| `pvc_base_prereqs_ext.yml` | Private Cloud external dependencies, e.g. JVM, Kerberos, database |
| `pvc_base_prereqs_int.yml` | Private Cloud internal dependencies, e.g. Cloudera Manager server and agent install |
| `pvc_base_setup.yml` | Private Cloud cluster setup |
| `pvc_base_teardown.yml` | Private Cloud cluster teardown |

`cloudera.exe`-powered applications, like `cloudera-deploy`, import these playbooks to enable these runlevel operations.

The other collection assets - the _roles_ and _plugins_ - are detailed in the [API documentation](https://cloudera-labs.github.io/cloudera.exe/). While these resources can be used separately, most expect the common configuration noted above and a sequence of execution defined within the noted playbooks.
If you are looking for automation resources that _only_ interact with Cloudera resources - that is, assets that are focused solely on Cloudera software - please look at [`cloudera.cloud`](https://github.com/cloudera-labs/cloudera.cloud) and [`cloudera.cluster`](https://github.com/cloudera-labs/cloudera.cluster).

## Quickstart

See the [API documentation](https://cloudera-labs.github.io/cloudera.exe/) for details for each plugin and role within the collection.

1. [Install the collection](#installation)
2. [Install the requirements](#requirements)
3. [Use the collection](#using-the-collection)

## API

See the [API documentation](https://cloudera-labs.github.io/cloudera.exe/) for details for each plugin and role within the collection.

## Roadmap

If you want to see what we are working on or have pending, check out:

* the [Milestones](https://github.com/cloudera-labs/cloudera.exe/milestones) and [active issues](https://github.com/cloudera-labs/cloudera.exe/issues?q=is%3Aissue+is%3Aopen+milestone%3A*) to see our current activity,
* the [Milestones](https://github.com/cloudera-labs/cloudera.exe/milestones) and [active issues](https://github.com/cloudera-labs/cloudera.exe/issues?q=is%3Aissue+is%3Aopen+milestone%3A*) to see our current activity,
* the [issue backlog](https://github.com/cloudera-labs/cloudera.exe/issues?q=is%3Aopen+is%3Aissue+no%3Amilestone) to see what work is pending or under consideration, and
* read up on the [Ideas](https://github.com/cloudera-labs/cloudera.exe/discussions/categories/ideas) we have in mind.

@@ -54,11 +40,16 @@ For more information on how to get involved with the `cloudera.exe` Ansible coll

## Installation

To install the `cloudera.exe` collection, you have several options. Please note that to date, we have not yet published this collection to the public Ansible Galaxy server, so you cannot install it via direct namespace declaration, rather you must specify a Git project and (optionally) branch.
To install the `cloudera.exe` collection, you have several options.

### Option #1: Install from GitHub
The preferred method is to install via Ansible Galaxy; in your `requirements.yml` file, add the following:

Create or edit the `requirements.yml` file in your project with the
```yaml
collections:
  - name: cloudera.exe
```

If you want to install from GitHub, add to your `requirements.yml` file:
following:

```yaml
@@ -77,92 +68,15 @@ ansible-galaxy collection install -r requirements.yml
You can also install the collection directly:

```bash
ansible-galaxy collection install git+https://github.com/cloudera-labs/cloudera.exe.git@main
# From Ansible Galaxy
ansible-galaxy collection install cloudera.exe
```

### Option #2: Install the tarball

Periodically, the collection is packaged into a distribution which you can
install directly:

```bash
ansible-galaxy collection install <collection-tarball>
# From GitHub
ansible-galaxy collection install git+https://github.com/cloudera-labs/cloudera.exe.git@main
```
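
For the GitHub route through `requirements.yml`, a minimal sketch follows. It assumes the standard `ansible-galaxy` Git source syntax (`type: git`) and the `main` branch used in the direct-install command above; adjust `version` to the branch or tag you need.

```yaml
# Assumption: illustrative requirements.yml entry, not copied from the collection's README
collections:
  - name: https://github.com/cloudera-labs/cloudera.exe.git
    type: git
    version: main
```

Install it with `ansible-galaxy collection install -r requirements.yml`.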

See [Building the Collection](#building-the-collection) for details on creating a local tarball.

## Requirements

The `cloudera.exe` expects `ansible-core>=2.10,<2.13`.

> [!WARNING]
> The current functionality of the `cloudera.cluster` dependency does not yet work with Ansible version `2.13` and later.

The collection has the following _required_ dependencies:

| Name | Type | Version |
|------|------|---------|
| `cloudera.cloud` | collection | `main` |
| `cloudera.cluster` | collection | `main` |
| `ansible.netcommon` | collection | `2.5.1` |
| `community.general` | collection | `4.5.0` |

You will need to add the following, depending on your target deployment, but all are collectively _optional_ dependencies:

**Private Cloud**

See the [requirements for `cloudera-labs/cloudera.cluster`](https://github.com/wmudge/cloudera.cluster#requirements) for details.

| Name | Type | Version |
|------|------|---------|
| `community.mysql` | collection | `3.8.0` |
| `community.postgresql` | collection | `3.3.0` |
| `freeipa.ansible_freeipa` | collection | `1.11.1` |
| `geerlingguy.postgresql` | role | `3.3.0` |
| `geerlingguy.mysql` (patched) | role | `master` |

**Terraform**

If you intend to use Terraform as your infrastructure engine within the `cloudera.exe.infra` role, then install the following:

| Name | Type | Version |
|------|------|---------|
| `cloud.terraform` | collection | `1.1.1` |

**AWS**

See the [AWS Execution Environment configuration](https://github.com/cloudera-labs/cldr-runner/blob/main/aws/execution-environment.yml) in `cloudera-labs/cldr-runner` for details on setting up the Python and system requirements.

| Name | Type | Version |
|------|------|---------|
| `amazon.aws` | collection | `3.0.0` |
| `community.aws` | collection | `3.0.1` |

**Azure**

See the [Azure Execution Environment configuration](https://github.com/cloudera-labs/cldr-runner/blob/main/azure/execution-environment.yml) in `cloudera-labs/cldr-runner` for details on setting up the Python and system requirements.

| Name | Type | Version |
|------|------|---------|
| `azure.azcollection` | collection | `1.11.0` |
| `netapp.azure` | collection | `21.10.0` |

**GCP**

See the [GCP Execution Environment configuration](https://github.com/cloudera-labs/cldr-runner/blob/main/gcp/execution-environment.yml) in `cloudera-labs/cldr-runner` for details on setting up the Python and system requirements.

| Name | Type | Version |
|------|------|---------|
| `google.cloud` | collection | `1.0.2` |

The collection also requires the following Python libraries to operate its modules and tasks:

* [netaddr](https://pypi.org/project/netaddr/)

The collection's Python dependencies alone, _not_ the required Python libraries of its collection dependencies, are in `requirements.txt`.

All collection dependencies, required and optional, can be found in `requirements.yml`; only the _required_ **non-Cloudera** dependencies are in `galaxy.yml`. `ansible-galaxy` will install only the _required_ **non-Cloudera** collection dependencies; you will need to add `cloudera.cloud`, `cloudera.cluster`, and the _optional_ collection dependencies as needed (see above).

`ansible-builder` can discover and install all Python dependencies - current collection and dependencies - if you wish to use that application to construct your environment. Otherwise, you will need to read each collection and role dependency and follow its installation instructions.

See the [Collection Metadata](https://ansible.readthedocs.io/projects/builder/en/latest/collection_metadata/) section for further details on how to install (and manage) collection dependencies.
@@ -171,76 +85,38 @@ You may wish to use a _virtual environment_ to manage the Python dependencies.

## Using the Collection

This collection is designed to work hand-in-hand with the [`cloudera-deploy` application](https://github.com/cloudera-labs/cloudera-deploy), which uses the reference playbooks in the `playbooks` directory to drive the operations of its example definitions.
This collection is designed to help you get up and running with Cloudera on cloud and on premises. It is decidedly _opinionated_ -- that is, these roles and plugins make assumptions about how certain configurations and requirements are met. **THESE RESOURCES ARE COMPLETELY OPTIONAL. THEY EXIST ONLY TO ASSIST YOU WITH BOOTSTRAPPING AND GREENFIELD EXAMPLES!**

Feel free to use these resources as needed!

Once installed, reference the collection in your playbooks and roles.
Once installed, reference the collection in playbooks and roles.

For example, here we use the
[`cloudera.exe.init_deployment` role](https://github.com/cloudera-labs/cloudera.exe/tree/main/roles/init_deployment) to read the configuration details and then import the Public Cloud playbooks to set up and provision an Environment and Datalake:
[`cloudera.exe.cm_agent` role](https://github.com/cloudera-labs/cloudera.exe/tree/main/roles/cm_agent) to download and install the Cloudera Manager agent software from the Cloudera Archive repository:

```yaml
- name: Marshal the variables
  hosts: localhost
  connection: local
- name: Install the CM agent
  hosts: cluster_hosts
  gather_facts: yes
  tasks:
    - name: Read definition variables
      ansible.builtin.include_role:
        name: cloudera.exe.init_deployment
        public: yes
      when: init__completed is undefined
      tags:
        - always

- name: Set up CDP Public Cloud infrastructure (Ansible-based)
  ansible.builtin.import_playbook: cloudera.exe.pbc_infra_setup.yml

- name: Set up CDP Public Cloud (Env and DL example)
  ansible.builtin.import_playbook: cloudera.exe.pbc_setup.yml
    - name: Install the agent and register with Cloudera Manager
      ansible.builtin.import_role:
        name: cloudera.exe.cm_agent
      vars:
        cloudera_manager_host: cm.example.internal
```
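
The `cm_agent` play above targets a `cluster_hosts` group, so the inventory needs to define one. A minimal sketch of a YAML inventory, assuming hypothetical host names (`worker-01.example.internal`, `worker-02.example.internal`) that are not part of the collection:

```yaml
# inventory.yml (hypothetical hosts, for illustration only)
all:
  children:
    cluster_hosts:
      hosts:
        worker-01.example.internal:
        worker-02.example.internal:
```

You might then run the play with `ansible-playbook -i inventory.yml <your-playbook>.yml`.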

> [!IMPORTANT]
> You **must** run `cloudera.exe.init_deployment` before calling any of the collection's playbooks. This call must occur within the source project, otherwise Ansible's `playbook_dir` will change to the collection's installation directory and variable lookups might not work as expected.

### Legacy Execution Modes

> [!WARNING]
> These documents and their modes of operation are deprecated in version 2.x. For example, the use of Ansible tags to trigger coarse runlevels have been replaced by explicit playbook execution. However, the "inner" tag structures still remain and might be relevant to some execution modes.

See the [execution examples](docs/runlevels.md#execution) in the Deployment Runlevels document.

For more information on the collection, check out:

+ [Configuration Guide](docs/configuration.md)
+ [Runlevels Guide](docs/runlevels.md)
+ [Architecture and Design Guide](docs/design.md)
## Building the API Documentation

## Building the Collection
If you wish to create a local copy of the API documentation, first set up the `hatch` build tool, as shown in the [TESTING](./TESTING.md) guide.

To create a local collection tarball, run:
Then, you can run:

```bash
ansible-galaxy collection build
hatch run docs:build
```

## Building the API Documentation

To create a local copy of the API documentation, first make sure the collection is in your `ANSIBLE_COLLECTIONS_PATHS`. Then run the following:

```bash
# change into the /docsbuild directory
cd docsbuild

# install the build requirements (antsibull-docs); you may want to set up a
# dedicated virtual environment
pip install ansible-core https://github.com/cloudera-labs/antsibull-docs/archive/cldr-docsite.tar.gz

# Install the collection's build dependencies
pip install requirements.txt

# Then run the build script
./build.sh
```
This will kick off the build toolchain. The local documentation can be found in `docsbuild/build/html`.

## License and Copyright
