Inputs
🚧 Sprout is still in active development and evolving quickly, so the documentation and functionality may not work as described and could undergo substantial changes 🚧
In this document we describe the inputs to Sprout at a technical level. For a more general description of the domain we expect data to come from and what it may look like conceptually, see the input data document.
The objects we primarily incorporate in the interface are paths, properties, and data, as described in the naming document.
- Paths: These objects are always `Path` objects from Python's `pathlib`.
- Properties: These are Python dataclasses we've created to represent the properties (metadata) of a data package or resource. They always use the `*Properties` suffix, such as `PackageProperties` and `ResourceProperties`.
- Data: These data input objects are always Polars `DataFrame` objects.
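As a rough illustration of the path and properties inputs, the sketch below builds a `Path` and a small dataclass that follows the `*Properties` naming convention. `ExampleProperties`, its fields, and the file location are hypothetical stand-ins chosen for this example; they are not Sprout's actual `PackageProperties` or `ResourceProperties` definitions.

```python
from dataclasses import asdict, dataclass
from pathlib import Path


# Hypothetical stand-in for a Sprout `*Properties` dataclass.
# The fields are assumptions made for this example only.
@dataclass
class ExampleProperties:
    name: str
    title: str


# Paths are `pathlib.Path` objects rather than plain strings.
# The location below is illustrative, not a path Sprout requires.
package_path = Path("my-package") / "datapackage.json"

# Properties (metadata) are held in a dataclass instance.
properties = ExampleProperties(name="my-package", title="My example package")

print(package_path.name)
print(asdict(properties))
```

Using dataclasses means the metadata has named, typed fields, and `asdict()` gives a plain dictionary when one is needed.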
Sprout expects the data objects to be in a tidy format. Tidy data is a conceptual framework around how data should be structured to be optimally usable for analysis. This is similar to what is called third normal form in relational database development, excluding the emphasis on creating additional tables to reduce redundancy.
Tidy data has the following properties:
- Each variable forms a column.
- Each observation forms a row.
- Each cell has a single value that represents both the column and row entity.
This tidy structure makes it easier to process and use the data for later analysis. If it is already in this form, there is less need to transform it later.
We require tidy data frames because doing so simplifies the workflow and allows users to use their own, potentially highly customised, workflows for processing their data. It also allows for a more flexible and user-friendly interface to Sprout, since it removes the need to support or consider the wide variety of formats in which data can be stored or structured. Because data processing and organising can depend so strongly on the domain, we leave this to the user to handle.