This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Discussion: For testing, should we use "fake" data or a small amount of real data? #512

@JackKelly

Description


(Let's not worry about this now... just making a note to discuss in early 2022!)

For "fake" data to be useful for testing, it needs to accurately capture almost all of the structure of the real data. Otherwise the fake data can drive us to incorrect conclusions when debugging and testing our code (as happened when debugging the OpticalFlowDatasource tests).

Creating genuinely realistic fake data is probably quite a lot of effort (for example, see issue #511).

I suppose I'm curious whether it might actually be less work to use a small amount of real data for testing, instead of maintaining code that creates "fake" data on the fly, and to include this sample of real data in the nowcasting_dataset/tests/data/ folder?
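If we went that route, loading the committed sample in a test could look something like this. This is only a minimal sketch: the `tests/data` path and the helper name are illustrative assumptions, not the repo's actual layout.

```python
# Sketch of reading a small committed sample of real data in tests, instead of
# generating fake data on the fly. File names and paths here are hypothetical.
from pathlib import Path

# In the repo this would live at nowcasting_dataset/tests/data/; here we just
# build the path relative to the working directory for illustration.
TEST_DATA_DIR = Path("tests") / "data"


def load_sample(filename: str) -> Path:
    """Return the path to a committed sample file, failing loudly if missing."""
    path = TEST_DATA_DIR / filename
    if not path.exists():
        raise FileNotFoundError(f"Sample test data not found: {path}")
    return path
```

A test would then open the returned path with whatever reader the data source normally uses (e.g. xarray for Zarr/NetCDF files), so the test exercises the same I/O code as production.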

Strictly speaking, we're not allowed to share some of our data sources. But maybe it wouldn't be too much work to obfuscate a small amount of real data: for example, PV locations could be replaced with the LSOA locations that we're allowed to share publicly, and, for the other data sources, we could add a small amount of random noise to all the data?
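The noise-based obfuscation suggested above could be sketched like this (a hypothetical illustration, not the repo's actual code): add zero-mean Gaussian noise scaled to a fraction of the field's standard deviation, so the statistical structure is roughly preserved while the exact values are no longer shareable data.

```python
# Sketch: obfuscate a sample of real data by adding zero-mean Gaussian noise
# whose scale is a fraction of the data's own standard deviation.
# The function name and the 5% default are assumptions for illustration.
import numpy as np


def obfuscate(data: np.ndarray, noise_fraction: float = 0.05, seed: int = 42) -> np.ndarray:
    """Return a copy of `data` with noise of scale noise_fraction * std added."""
    rng = np.random.default_rng(seed)
    scale = noise_fraction * data.std()
    return data + rng.normal(0.0, scale, size=data.shape)
```

A fixed seed keeps the committed sample reproducible; whether 5% noise is enough to satisfy the data licences is a separate question that would need checking per data source.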
