import seedcase_sprout.core as sp
import os
import pathlib
# For pretty printing of output
from pprint import pprint
os.environ["SPROUT_GLOBAL"] = ".storage/"
Creating and managing data packages
🚧 Sprout is still in active development and evolving quickly, so the documentation and functionality may not work as described and could undergo substantial changes 🚧
At the core of Sprout is the data package, which is a standardized way of structuring and sharing data. This guide will show you how to create and manage data packages using Sprout.
Whether you’re using the Python library or the CLI, Sprout assumes you have full control over the system’s folders and files, or at least your user’s home directory. This includes being given space on a server that is primarily accessed through a terminal, where you have control over the directories you can write to.
For example, you might install Sprout on your own computer because you want to create data packages for your research studies. Alternatively, your research group might have storage space on a server that requires the use of a terminal and SSH to access.
Creating a data package
The first thing you’ll need to decide is where you want to store your data packages. By default, Sprout will create them in ~/sprout/packages/ on Linux (see Outputs for operating system specific locations), but you can change this by setting the SPROUT_GLOBAL environment variable. For instance, maybe you want the location to be ~/Desktop/sprout/ or ~/Documents. For our example, we will store it in our current working directory in the hidden folder .storage/.
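To make the override concrete, here is a minimal sketch of the environment-variable pattern in plain Python. The helper `resolve_packages_root()` is hypothetical and is not part of Sprout. It only mirrors the behaviour described above, under the assumption that packages live in a `packages/` folder inside the `SPROUT_GLOBAL` location (as the example output in this guide suggests) and otherwise fall back to `~/sprout/packages/`.

```python
import os
from pathlib import Path


def resolve_packages_root(env=None):
    """Return the folder where packages would be stored.

    Hypothetical helper for illustration only, not Sprout's implementation.
    """
    env = os.environ if env is None else env
    custom_root = env.get("SPROUT_GLOBAL")
    if custom_root:
        # Assumption: packages are kept in a `packages/` subfolder of the root.
        return Path(custom_root).expanduser() / "packages"
    # Default location on Linux, as described in this guide.
    return Path.home() / "sprout" / "packages"


# With the variable set, the custom location is used.
print(resolve_packages_root({"SPROUT_GLOBAL": ".storage/"}))
# Without it, the default location is used.
print(resolve_packages_root({}))
```

Passing the environment as an argument keeps the sketch easy to try out without changing your real environment variables.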
Afterwards, you can create the structure for your first data package by using:
new_package = sp.create_package_structure(path=sp.path_packages())
pprint(new_package)
[PosixPath('.storage/packages/1/datapackage.json'),
PosixPath('.storage/packages/1/README.md'),
PosixPath('.storage/packages/1/resources')]
This creates the initial structure of your new package with the ID 1. The output above shows that the folder of your data package 1 has been created. This folder contains two files, datapackage.json and README.md, along with a resources folder. The datapackage.json file initially contains fields with some default values in them, but it will eventually contain the metadata, a.k.a. the properties, of your data package. README.md is a prettified, human-readable version of the content of the datapackage.json.
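If you want to double-check the structure from Python, a small check with the standard library is enough. The sketch below is illustrative only: `missing_package_entries()` is a hypothetical helper, not a Sprout function, and the expected entries are simply those listed in the example output above.

```python
from pathlib import Path

# Entries that this guide's example output lists for a new package.
EXPECTED_ENTRIES = ("datapackage.json", "README.md", "resources")


def missing_package_entries(package_dir):
    """List the expected entries that are absent from `package_dir`.

    Hypothetical helper for illustration; it is not part of Sprout.
    """
    package_dir = Path(package_dir)
    return [name for name in EXPECTED_ENTRIES if not (package_dir / name).exists()]


# An empty list means the structure is complete.
print(missing_package_entries(".storage/packages/1"))
```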
While you can manually fill in the details in the datapackage.json file, we have several helper classes, such as PackageProperties, LicenseProperties, and ContributorProperties, to make it easier for you.
properties = sp.PackageProperties(
    name="diabetes-hypertension-study",
    title="Diabetes and Hypertension Study",
    description="Data from the 2021 study on diabetes and hypertension",
    contributors=[
        sp.ContributorProperties(
            title="Jamie Jones",
            email="jamie_jones@example.com",
            path="example.com/jamie_jones",
            roles=["creator"],
        )
    ],
    licenses=[
        sp.LicenseProperties(
            name="ODC-BY-1.0",
            path="https://opendatacommons.org/licenses/by",
            title="Open Data Commons Attribution License 1.0",
        )
    ],
)
pprint(properties)
PackageProperties(name='diabetes-hypertension-study',
id=None,
title='Diabetes and Hypertension Study',
description='Data from the 2021 study on diabetes and '
'hypertension',
homepage=None,
version=None,
created=None,
contributors=[ContributorProperties(title='Jamie Jones',
path='example.com/jamie_jones',
email='jamie_jones@example.com',
given_name=None,
family_name=None,
organization=None,
roles=['creator'])],
keywords=None,
image=None,
licenses=[LicenseProperties(name='ODC-BY-1.0',
path='https://opendatacommons.org/licenses/by',
title='Open Data Commons '
'Attribution License 1.0')],
resources=None,
sources=None)
Then, to update the properties from the current datapackage.json file with these values, you can use the edit_package_properties() function:
package_properties = sp.edit_package_properties(
    path=sp.path_package_properties(package_id=1),
    properties=properties.compact_dict,
)
pprint(package_properties)
{'contributors': [{'email': 'jamie_jones@example.com',
'path': 'example.com/jamie_jones',
'roles': ['creator'],
'title': 'Jamie Jones'}],
'created': '2025-01-10T12:50:46+00:00',
'description': 'Data from the 2021 study on diabetes and hypertension',
'homepage': '',
'id': '5017c543-7052-4d23-bc56-a4032298d671',
'image': '',
'keywords': [],
'licenses': [{'name': 'ODC-BY-1.0',
'path': 'https://opendatacommons.org/licenses/by',
'title': 'Open Data Commons Attribution License 1.0'}],
'name': 'diabetes-hypertension-study',
'resources': [],
'sources': [],
'title': 'Diabetes and Hypertension Study',
'version': '0.1.0'}
The edit_package_properties() function will give an error if the required fields are not filled in to create a valid datapackage.json file.
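To spot gaps before calling edit_package_properties(), you could run a quick check of your own on the properties dictionary. The sketch below is an assumption-laden illustration: `find_unfilled_fields()` is a hypothetical helper, and the list of required fields is chosen only for this example. It is not Sprout's actual validation, which may apply different rules.

```python
def find_unfilled_fields(properties, required_fields):
    """Return the required fields that are missing or empty in `properties`.

    Hypothetical helper for illustration; Sprout's own checks may differ.
    """
    return [
        field
        for field in required_fields
        if properties.get(field) in (None, "", [], {})
    ]


# Assumed list of required fields, chosen only for this example.
REQUIRED_FIELDS = ("name", "title", "description", "licenses")

incomplete = {"name": "diabetes-hypertension-study", "title": "", "licenses": []}
unfilled = find_unfilled_fields(incomplete, REQUIRED_FIELDS)
if unfilled:
    print(f"Fields still to fill in: {', '.join(unfilled)}")
```

Here an empty string or empty list counts as unfilled, which matches the kind of default values shown in the output above.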
To save the package properties to the datapackage.json file, run:
sp.write_package_properties(
    properties=package_properties,
    path=sp.path_properties(package_id=1),
)
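After saving, you may want to confirm what ended up on disk. Since datapackage.json is a plain JSON file, it can be read back with Python's standard json module. This is a minimal sketch: `read_properties()` is a hypothetical helper, not a Sprout function, and the path is the one from this guide's example output.

```python
import json
from pathlib import Path


def read_properties(path):
    """Load a datapackage.json file into a dictionary.

    Hypothetical helper for illustration; it is not part of Sprout.
    """
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Path taken from the example output earlier in this guide.
properties_path = Path(".storage/packages/1/datapackage.json")
if properties_path.exists():
    saved = read_properties(properties_path)
    print(saved.get("name"), saved.get("version"))
```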
If you need help with filling in the right properties, see the documentation for the PackageProperties class, or run, for example, print(sp.PackageProperties()) to get a list of all the fields you can fill in for a package.
You now have the basic starting point for adding data resources to your data package.
The CLI is a bit more straightforward as long as you are comfortable using the terminal. You can set the SPROUT_GLOBAL environment variable to change the location of the data packages. For instance, maybe you want the location to be ~/Documents/data-packages/. You can set this in your terminal like so:
export SPROUT_GLOBAL=~/Documents/data-packages/
Then creating a new package would be as simple as:
sprout package create
This will prompt you for some required fields you need to fill in, like the title and description of the data package. If you want to skip the prompt, you can provide the information directly in the command:
sprout package create \
--title "Diabetes and Hypertension Study" \
--description "Data from the 2021 study on diabetes and hypertension"
This creates the initial structure of your new package with the ID 1. The folder of your data package 1 consists of two files: datapackage.json and README.md. The datapackage.json file is empty initially, but it will contain the metadata, a.k.a. the properties, of your data package. README.md is a prettified, human-readable version of the content of the datapackage.json.
You now have the basic starting point for adding data resources to your data package in the location you specified:
~/Documents/data-packages/1/datapackage.json
~/Documents/data-packages/1/README.md
In development.