This assignment walks through a process of turning a simple Python application into a more production-ready app.
The task list may be long, so if there's anything you're struggling with, you're welcome to skip ahead. We're not asking you to check items off a shopping list of tech skills; rather, we want to see how you work on the things you're already familiar with.
This assignment makes use of VS Code's remote development features, so that your development environment requires no toil to set up, and is consistent regardless of operating system or other dependencies.
You need only two components:
- A container runtime like Docker Desktop
- Visual Studio Code
- Using Nexar's repository as a template, create a new repository under your own GitHub account, then clone that repo into a local directory, e.g. `$HOME/nexar-assignment`. Using the GitHub CLI you can do this with a single command:

  ```sh
  cd $HOME
  gh repo create nexar-assignment \
    --template https://github.com/getnexar/infra-eng-assignment.git \
    --private --confirm  # both flags are optional
  ```
- Open your working copy in VS Code (e.g. `code $HOME/nexar-assignment` on macOS). A prompt should pop up asking you to open the directory in a container. Click 'Yes', let it build the dev container (it takes 1-3 minutes), and once it's done, you're good to go.
The doc-search application implements a simple search endpoint over a set of documents. More specifically, given a dataset of documents, where each document has a numeric identifier, the endpoint returns a list of all of the document IDs containing ALL words in the `q` query parameter.

For example, if the web server is serving at http://localhost:8080, and the words hello and world both exist only in document 1, then the command `curl http://localhost:8080/?q=hello+world` should return:
```json
{
  "results": ["1"]
}
```

- Run unit tests:

  `cd doc-search/src/ && python -m unittest -b test_index`

- Build the container image:

  `docker build . -t doc-search`

- Run the app:

  `docker run -p 8080:8080 doc-search`

- Test the app: you can use `curl` to query it, for example: `curl http://localhost:8080/?q=hello+world` will return a JSON document with all of the documents containing both `hello` and `world`.
The app currently has a Dockerfile included under doc-search/.
- Every commit to application code (`.py` files) results in a slow build of the container image. Modify the `Dockerfile` to make the build faster.
Answer: moving the `COPY src/` step to a later layer in the Dockerfile (after dependency installation) resolves this: Docker can then serve the unchanged earlier layers from its cache, so a code-only commit rebuilds just the final layer. A sketch of this ordering follows.
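For illustration, a minimal sketch of that ordering, assuming the app's dependencies are listed in a `requirements.txt` and it starts via `src/main.py` (both names are assumptions, not necessarily the repo's actual files):

```Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Dependencies change rarely; installing them first lets Docker cache
# these layers across builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes often; copying it last means a code-only
# commit invalidates only this final layer.
COPY src/ ./src/

EXPOSE 8080
CMD ["python", "src/main.py"]
```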
- How can you minimize the size of the resulting container image? Modify the `Dockerfile` or describe your solution.
Answer: changing the base image to Alpine reduced the image to 56 MB. Splitting the Dockerfile into a multi-stage build would have reduced it further, to about 50 MB, but I decided the extra 6 MB wasn't worth the change. Multi-stage builds pay off more in compiled languages like Go, where the final stage can ship little more than a static binary; Python is interpreted and still needs its interpreter and installed packages at runtime, so the savings are smaller. A sketch of the multi-stage variant follows.
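For reference, a sketch of the multi-stage variant weighed above, under the same `requirements.txt`/`src/main.py` assumptions:

```Dockerfile
# Build stage: install dependencies into an isolated prefix.
FROM python:3.11-alpine AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Runtime stage: carry over only the installed packages and the app
# code, leaving pip caches and any build tooling behind.
FROM python:3.11-alpine
WORKDIR /app
COPY --from=builder /install /usr/local
COPY src/ ./src/
EXPOSE 8080
CMD ["python", "src/main.py"]
```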
Here you will deploy the application to a local Minikube.
- Implement a minimal Helm chart for this application.
- Deploy the chart to Minikube, under the `default` namespace.
- Verify that you can call the service from outside the cluster.
- We want Kubernetes to tolerate a slow start for our app. Implement this behavior in your chart (one possible mechanism is sketched after this list). Bonus points if you can simulate a slow start and test your solution.
Answer: adding an init container that simply sleeps for 10 seconds simulated the slow start.
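One way to express that tolerance in the chart is a `startupProbe` on the app's container; a minimal sketch, assuming the service answers HTTP on port 8080 (the probe path and thresholds are illustrative):

```yaml
# templates/deployment.yaml (excerpt)
containers:
  - name: doc-search
    image: doc-search:latest
    ports:
      - containerPort: 8080
    # Allow up to 30 * 5 = 150 seconds of startup time before the pod
    # is restarted; liveness/readiness checks only begin once this
    # probe has succeeded.
    startupProbe:
      httpGet:
        path: /?q=hello
        port: 8080
      failureThreshold: 30
      periodSeconds: 5
```

To check the service from outside the cluster, something like `minikube service doc-search --url` (service name assumed) prints a URL you can `curl` from the host.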
- In the app's Python code, instrument latency of the `search/` endpoint, and expose a metrics HTTP endpoint on port `8000`. You may use any open-source library for this purpose (a sketch follows after this list).
- Add code and/or configuration that installs Prometheus onto the k8s cluster and configures it to scrape metrics from the app.
- Using a load generator like `hey`, generate some load on the app.
- Using the built-in web UI for Prometheus, chart the p50, p90, p99 latencies of `search/` requests over the load you generated before.
- (Bonus) Which other key metrics are important/useful to instrument in a web service like this? Add them as you see fit and show how you can query them in Prometheus.
Answer: deployed Prometheus to Minikube, but didn't instrument the Python code, since exposing another metrics endpoint meant adding another web server; this would be much easier to implement in Go. Other key metrics worth instrumenting: requests per second, error rate, and uptime.
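For completeness, a sketch of how the instrumentation could look with the open-source `prometheus_client` library, whose `start_http_server` runs that extra metrics server in a background thread of the same process; the handler below is a stand-in for the app's real search logic, not its actual code:

```python
import time

from prometheus_client import Histogram, start_http_server

# Latency histogram for the search endpoint; Prometheus derives
# p50/p90/p99 from the buckets at query time, e.g.
#   histogram_quantile(0.99, rate(search_request_latency_seconds_bucket[1m]))
SEARCH_LATENCY = Histogram(
    "search_request_latency_seconds",
    "Latency of search requests in seconds",
)

@SEARCH_LATENCY.time()  # records the handler's duration on every call
def handle_search(query):
    # Stand-in for the app's real search logic.
    return {"results": []}

if __name__ == "__main__":
    # Serve /metrics on port 8000 alongside the app's own server.
    start_http_server(8000)
    handle_search("hello world")
    time.sleep(60)  # keep the process alive so Prometheus can scrape
```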
Good luck!