
pashazm/datareport


datareport

datareport produces several aggregates from incoming adtech-related data. Each aggregate is written to its own folder containing a single CSV file.
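Spark writes a DataFrame as a directory of part files, one per partition, which is why each aggregate appears as a folder. A minimal sketch of how a folder with a single CSV file can be produced, assuming `coalesce(1)` is used before writing. `AggregateWriter` and `writeAggregate` are hypothetical names, not the project's API:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper, not taken from the project.
object AggregateWriter {
  // Writes one aggregate under outputRoot/name and returns that folder path.
  // coalesce(1) collapses the data into a single partition, so Spark writes
  // one part-*.csv data file inside the folder, next to bookkeeping files
  // such as _SUCCESS.
  def writeAggregate(df: DataFrame, outputRoot: String, name: String): String = {
    val path = s"$outputRoot/$name"
    df.coalesce(1)
      .write
      .mode(SaveMode.Overwrite)   // replace results from a previous run
      .option("header", "true")   // assumption: aggregates carry a header row
      .csv(path)
    path
  }
}
```

`coalesce(1)` moves all rows into one task. That is fine for small aggregates but can become a bottleneck for large outputs.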

Setup

The application requires a config file, application.conf, in the working folder. The config uses HOCON format and has the following entries:

  • master - Spark master URL
  • input - section with two children:
    • file-path - path to the input CSV file
    • data-schema - optional Spark schema DDL string applied when reading the CSV file
  • output-root - root folder where the output aggregates are stored

The project includes the following example config for local runs from an IDE:

master: "local[4]"
input {
    file-path: Dataset.csv
}
output-root: output
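The README does not say how application.conf is parsed. A minimal sketch, assuming the project reads it with the Typesafe Config library (a common choice for HOCON, but not confirmed here). The hypothetical `AppConfig` case class maps each documented key to a typed field, with `data-schema` as an `Option` because it is optional:

```scala
import com.typesafe.config.{Config, ConfigFactory}
import java.io.File

// Hypothetical typed view of application.conf; fields mirror the documented keys.
final case class AppConfig(
    master: String,
    inputFilePath: String,
    dataSchema: Option[String], // optional Spark DDL schema string
    outputRoot: String
)

object AppConfig {
  // Maps an already parsed HOCON Config to AppConfig.
  def fromConfig(config: Config): AppConfig = {
    val input = config.getConfig("input")
    AppConfig(
      master = config.getString("master"),
      inputFilePath = input.getString("file-path"),
      dataSchema =
        if (input.hasPath("data-schema")) Some(input.getString("data-schema"))
        else None,
      outputRoot = config.getString("output-root")
    )
  }

  // Loads application.conf from the current working folder.
  def load(): AppConfig =
    fromConfig(ConfigFactory.parseFile(new File("application.conf")).resolve())
}
```

Mapping from a `Config` rather than straight from the file keeps the logic easy to unit-test with `ConfigFactory.parseString`.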

Structure

The entry point is AdAnalyticsApp.scala; the top-level logic is in AdAnalyticsService.scala.

Several lower-level classes implement specific pieces of functionality.

The main functionality (everything except the entry point) is covered by test cases.
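A hypothetical sketch of the input step that an entry point like AdAnalyticsApp.scala might perform: create a SparkSession for the configured `master` and read `file-path`, applying `data-schema` when it is present. The header option and the fallback to schema inference are assumptions, and `InputReader` is not a class from the project:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper, not taken from the project.
object InputReader {
  // Creates (or reuses) a session for the configured Spark master.
  def session(master: String): SparkSession =
    SparkSession.builder().master(master).appName("datareport").getOrCreate()

  // Reads the input CSV. When a DDL schema string is given it is applied
  // directly; otherwise column types are inferred (an assumed fallback).
  def readInput(spark: SparkSession, filePath: String, dataSchema: Option[String]): DataFrame = {
    val reader = spark.read.option("header", "true") // assumption: input has a header row
    dataSchema match {
      case Some(ddl) => reader.schema(ddl).csv(filePath)
      case None      => reader.option("inferSchema", "true").csv(filePath)
    }
  }
}
```

Supplying the DDL schema avoids the extra pass over the file that `inferSchema` needs and makes column types explicit.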

How to run main code in IDE

How to run test cases from the command line

Run:

sbt clean test

To run the tests and generate a coverage report:

sbt clean coverage test coverageReport

Note: the current coverage is 98%.

Challenges

  • The bonus task was not implemented because the main implementation took significant time.
  • The requirement "Produce summaries or visualizations from the aggregated data" seems out of scope for this tool: Spark is used here to generate the data, while visualization is better handled by other tools. The application therefore generates CSV files, which can be visualized in Excel or a similar tool.
  • Building an uber jar and running the application from the command line were not implemented, because this would take additional time and the requirements do not make clear whether it is expected.
