Design
🚧 Sprout is still in active development and evolving quickly, so functionality may not work as documented and both the documentation and the software could undergo substantial changes 🚧
The core aim of Sprout is to convert data and metadata into a standardized, organized storage structure that follows data engineering best practices, particularly in research contexts. Specifically, Sprout aims to:
- Take generated data from various source locations (such as clinics or laboratories), which may be distributed geographically or organizationally, and store it in a standardized and efficient format.
- Ensure that metadata is included for the data and organized in a standardized format, explicitly and programmatically linking the metadata to the data.
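To make "explicitly and programmatically linking the metadata to the data" concrete, here is a minimal sketch using only the Python standard library. It is not Sprout's actual interface: the function names (`store_with_metadata`, `is_linked`), the file names (`data.csv`, `metadata.json`), and the metadata keys are all hypothetical and chosen only for illustration. The idea is that the metadata file records the relative path and a checksum of the data file, so the link between the two can be verified by a program rather than assumed.

```python
import csv
import hashlib
import json
import tempfile
from pathlib import Path


def store_with_metadata(rows, fields, description, root):
    """Write rows to a CSV file plus a JSON metadata file that points back at it.

    Hypothetical helper for illustration only; not part of Sprout.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)

    # Store the data in one fixed, predictable location and format.
    data_path = root / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)

    # The metadata names the data file and records its checksum,
    # which is what makes the link explicit and machine-checkable.
    metadata = {
        "description": description,
        "path": data_path.name,
        "fields": fields,
        "sha256": hashlib.sha256(data_path.read_bytes()).hexdigest(),
    }
    metadata_path = root / "metadata.json"
    metadata_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return metadata_path


def is_linked(metadata_path):
    """Return True if the data file named in the metadata exists and is unchanged."""
    metadata_path = Path(metadata_path)
    metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    data_path = metadata_path.parent / metadata["path"]
    if not data_path.exists():
        return False
    return hashlib.sha256(data_path.read_bytes()).hexdigest() == metadata["sha256"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        meta = store_with_metadata(
            rows=[{"id": 1, "glucose": 5.4}],
            fields=["id", "glucose"],
            description="Example clinic measurements",
            root=tmp,
        )
        print(is_linked(meta))
```

Recording a checksum alongside the path is one possible design choice among several. It lets a program detect when the data has changed without the metadata being updated, at the cost of recomputing the hash whenever the data is rewritten.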
The purpose of these documents is to describe the design of Sprout in enough detail to help us develop it in a way that is sustainable (i.e., maintainable over the long term) and that ensures we as a team have a shared understanding of what Sprout is and is not.
Sprout’s design builds on our overall Seedcase design principles and patterns: any design we create for Sprout should closely follow the principles and patterns described there. Unless explicitly stated otherwise, Sprout’s design adds to but does not replace the overall design.
There are two main sections of this design documentation:
- Architecture: The architecture section describes the high-level requirements, components, use cases, expected users, and how the system will be organized and structured.
- Interface: The interface section gives a much more detailed description of the software interface for the architecture. The detail is at the level of exact Python functions and CLI commands, as well as their arguments, inputs, and outputs.