Requirements
Warning
🚧 Sprout is still in active development and evolving quickly, so the documentation and functionality may not work as described and could undergo substantial changes 🚧
Sprout must:
- Run on Windows, MacOS, and Linux (likely on servers): Our potential users work on any of these systems, so we need to ensure compatibility across most commonly used operating systems.
- Integrate GDPR, privacy, and security compliance: Our target users work with health data, so this is vital to consider
- Run remotely on servers and locally on computers: The location where data are stored should be flexible based on the needs and restrictions of the user.
- Be able to handle a variety of data file sizes: While the size of research data does not compare to those found in industry, it can still become large enough that it requires special care and handling.
- Store data in a format that is open source, integrates with many tools, and is storage efficient: Sprout is first and foremost a data engineering tool for research data storage and distribution (or at least, easier sharing).
- Store, organize, and manage multiple distinct data sources per user or group of users (for example in a server setting): Researchers rarely collect and work on one data source at any given time. So, Sprout must be able to handle multiple distinct data sources.
- Upload and update data: Data can be added to Sprout could happen in batches or on a more frequent basis. We anticipate that batch uploads will be the most common.
- Store, organize, and manage metadata connected to the data: Metadata are vital to understanding the data and its context, without which data can be near useless. Sprout needs to make managing and organizing metadata fundamental to its functionality.
- Track changes to the data in a changelog and versioning system: Data are not static and can change over time. Sprout must track these changes and provide a way to show, track, and manage versions of the data. This is also necessary for legal compliance for auditing and record-keeping.
Sprout will not:
- Run any analytic computing or data science work: While some data processing and analysis will occur, it will be limited to running checks on the quality of the data and metadata.