{azkit} 🌊🔑📂📦R

License: MIT. Project status: WIP (initial development is in progress, but there has not yet been a stable release). Lifecycle: experimental.

R package to handle Azure authentication and basic tasks with blob storage.

Status

The package is in development. Please leave an issue or raise a pull request if you have ideas for its improvement.

Installation

You can install the development version of {azkit} with:

# install.packages("pak")
pak::pak("The-Strategy-Unit/azkit")

Usage

A primary function in {azkit} enables access to an Azure blob container:

data_container <- azkit::get_container()

Authentication is handled "under the hood" by the get_container() function, but if you need to, you can explicitly return an authentication token for inspection or testing:

my_token <- azkit::get_auth_token()

By default, the container returned is the one named in the AZ_CONTAINER environment variable, if that is set. You can override this by supplying a container name to the function:

custom_container <- azkit::get_container("custom")

Return a list of all available containers in your default Azure storage with:

azkit::list_container_names()

Once you have access to a container, you can use one of a set of data reading functions to bring data into R from .parquet, .rds, .json or .csv files:

pqt_data <- azkit::read_azure_parquet(data_container, "v_important_data")

The functions will try to match a file of the required type using the file name supplied. In the case above, "v_important_data" would match a file named "v_important_data.parquet", so there is no need to supply the file extension.

By default the read_* functions will look in the root folder of the container. To specify a subfolder, supply this to the path argument. The functions will not search recursively into further subfolders, so the path needs to be full and accurate.

Alternatively, you may have "long" filenames that include the full notional path to the file, in which case you can omit the path argument. Long filenames are returned by azkit::list_files(), for example:

azkit::list_files(data_container, "data/latest", "parquet") |>
  purrr::map(\(x) azkit::read_azure_parquet(data_container, x, info = FALSE))

If more than one file matches the string supplied to the file argument, the functions will throw an error. Specifying the exact filename avoids this, but shorter file arguments may be convenient in some situations.
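If you would rather your script carried on when a read fails, for instance because the file argument matched more than one file, you could wrap the call in base R's tryCatch(). This is only a minimal sketch: safe_read() is a hypothetical helper, not part of {azkit}, and the exact error message {azkit} raises is not assumed here.

```r
# Hypothetical wrapper, not part of {azkit}: run a read and return NULL
# (with a message) instead of stopping if the read throws an error.
safe_read <- function(read_call) {
  tryCatch(
    read_call(),
    error = function(e) {
      message("Read failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage (needs Azure access, so shown commented out):
# result <- safe_read(\() azkit::read_azure_parquet(data_container, "data"))
```

A NULL result then signals that the file argument should be made more specific.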

Currently these functions only read in a single file at a time.
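To read several files you can loop over the file names yourself, as in the purrr example above. The sketch below does the same in base R and names each result by its file name. read_each() is a hypothetical helper, not part of {azkit}, and it assumes azkit::list_files() returns a character vector of long filenames.

```r
# Hypothetical helper, not part of {azkit}: apply a single-file reader to
# each file name in turn and return a list named by the files' base names.
read_each <- function(files, read_one) {
  results <- lapply(files, read_one)
  stats::setNames(results, basename(files))
}

# Usage (needs Azure access, so shown commented out):
# files <- azkit::list_files(data_container, "data/latest", "parquet")
# tables <- read_each(
#   files,
#   \(x) azkit::read_azure_parquet(data_container, x, info = FALSE)
# )
```

Passing the reader in as a function keeps the helper independent of file type, so the same loop could be used with the other read_* functions.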

Setting the info argument to TRUE makes the functions give some confirmatory feedback on which file is being read in. You can also pass through arguments that will be applied to, for example, readr::read_delim(), such as col_types, as the function reads in a CSV file:

csv_data <- data_container |>
  azkit::read_azure_csv("vital_data.csv", path = "data", col_types = "ccci")

Environment variables

To access Azure Storage you will want to set some environment variables. The neatest way to do this is to include a .Renviron file in your project folder.

⚠️ These values are sensitive and should not be exposed to anyone outside The Strategy Unit. Make sure you include .Renviron in the .gitignore file for your project.

Your .Renviron file should contain the variables below. Ask a member of the Data Science team for the necessary values.

# essential
AZ_STORAGE_EP=
# useful but not absolutely essential:
AZ_CONTAINER=

# optional, for certain authentication scenarios:
AZ_TENANT_ID=
AZ_CLIENT_ID=
AZ_APP_SECRET=

These may vary depending on the specific container you’re connecting to.

For one project you might want to set the default container (AZ_CONTAINER) to one value, while for another project you might mainly work with a different container. It therefore makes sense to set the values within the .Renviron file for each project, rather than globally for your account.
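Because the values are set per project, it can help to check at the start of a session which of them R has actually picked up. The sketch below uses only base R. missing_env_vars() is a hypothetical helper, not part of {azkit}, and the grouping into essential and other variables follows the .Renviron template above.

```r
# Hypothetical helper, not part of {azkit}: return the names of the
# environment variables that are unset or empty in this R session.
missing_env_vars <- function(vars) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(unname(values))]
}

essential <- "AZ_STORAGE_EP"
other <- c("AZ_CONTAINER", "AZ_TENANT_ID", "AZ_CLIENT_ID", "AZ_APP_SECRET")

# Report anything missing without printing the (sensitive) values themselves.
for (group in list(essential, other)) {
  not_set <- missing_env_vars(group)
  if (length(not_set) > 0) {
    message("Not set: ", paste(not_set, collapse = ", "))
  }
}
```

If you edit .Renviron, restart R so that the new values are read in.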

Getting help

Please use the Issues feature on GitHub to report any bugs, ideas or problems, including with the package documentation.

Alternatively, to ask any questions about the package you may contact Fran Barton.

Development

If you wish to clone this package for development, including running the included tests, you will want some further environment variables for your local .Renviron. Contact Fran if you need help with this.
