import seedcase_sprout as sp
Creating and managing data packages
🚧 Sprout is still in active development and evolving quickly, so the documentation and functionality may not work as described and could undergo substantial changes 🚧
At the core of Sprout is the data package, which is a standardized way of structuring and documenting data. This guide will show you how to create and manage data packages using Sprout.
Sprout assumes you have control over your system’s files and folders, or at least your user’s home directory. This includes having access to a server through the Terminal where you can write to specific folders.
Creating a data package
Sprout is designed to work in the root of your Python project, where your .git/ folder and/or pyproject.toml file is located (see the installation guide for how to install Sprout in a virtual environment). By default, Sprout’s helper functions expect the working directory to be the root of your project. Throughout this guide we will refer to the root folder of the Python project as the root folder of the data package.
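Because the helper functions assume the working directory is the project root, it can help to check that assumption before running anything. The sketch below uses only the standard library. `find_project_root()` is a hypothetical helper, not part of Sprout, and the marker names are the two mentioned above.

```python
from pathlib import Path
from typing import Optional

# Files or folders that mark the root of a Python project.
ROOT_MARKERS = (".git", "pyproject.toml")


def find_project_root(start: Path) -> Optional[Path]:
    """Return the closest folder at or above `start` that holds a root marker."""
    start = start.resolve()
    for folder in (start, *start.parents):
        if any((folder / marker).exists() for marker in ROOT_MARKERS):
            return folder
    return None


# Hypothetical usage: compare the detected root to the working directory.
root = find_project_root(Path.cwd())
if root is None:
    print("No .git/ folder or pyproject.toml found at or above the working directory.")
elif root != Path.cwd().resolve():
    print(f"Working directory is not the project root; consider changing to {root}")
else:
    print(f"Working directory is the project root: {root}")
```

If the check reports a different folder, changing to that folder before calling Sprout's functions keeps the paths in this guide valid.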
With that in mind, let’s make a data package! A data package always needs a datapackage.json file. This file is located at the root of your data package and contains a set of properties, or metadata about the data package and the data resources within it. To set up this datapackage.json file, you first need a set of properties you want to add to the data package. Then, you can use the write_package_properties() function, which takes the properties and the path where you want to store the data package as arguments. So first, you need to establish your properties.
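To make the idea of properties concrete, the sketch below builds a dictionary with the kinds of fields used later in this guide and serialises it with the standard library's `json` module. It only illustrates what a `datapackage.json` layout can look like: the exact keys, order, and formatting that Sprout writes may differ, and the values are the example values used in this guide.

```python
import json

# Illustrative package-level properties (metadata), using this guide's example values.
example_properties = {
    "name": "diabetes-hypertension-study",
    "title": "Diabetes and Hypertension Study",
    "description": "Data from the 2021 study on diabetes and hypertension",
    "version": "0.1.0",
    "licenses": [
        {
            "name": "ODC-BY-1.0",
            "path": "https://opendatacommons.org/licenses/by",
            "title": "Open Data Commons Attribution License 1.0",
        }
    ],
}

# Serialise to the JSON text that a datapackage.json file could contain.
datapackage_json = json.dumps(example_properties, indent=2)
print(datapackage_json)
```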
Sprout includes several data classes, such as PackageProperties, LicenseProperties, and ContributorProperties, to make it easier for you to make properties with the correct fields filled in. See the guide on properties for more information about these data classes.
First, import the necessary modules and set up the environment:
import seedcase_sprout as sp
There are a few properties that are required for a data package, such as version, id, and created. While you could fill these in manually, it’s much easier to use the from_default() method of the PackageProperties class:
properties = sp.PackageProperties.from_default(
    name="diabetes-hypertension-study",
    title="Diabetes and Hypertension Study",
    description="Data from the 2021 study on diabetes and hypertension",
    contributors=[
        sp.ContributorProperties(
            title="Jamie Jones",
            email="jamie_jones@example.com",
            path="example.com/jamie_jones",
            roles=["creator"],
        )
    ],
    licenses=[
        sp.LicenseProperties(
            name="ODC-BY-1.0",
            path="https://opendatacommons.org/licenses/by",
            title="Open Data Commons Attribution License 1.0",
        )
    ],
)
print(properties)
PackageProperties(name='diabetes-hypertension-study', id='da608079-19cc-4054-b904-28e207cb2a53', title='Diabetes and Hypertension Study', description='Data from the 2021 study on diabetes and hypertension', homepage=None, version='0.1.0', created='2025-05-30T10:53:26+00:00', contributors=[ContributorProperties(title='Jamie Jones', path='example.com/jamie_jones', email='jamie_jones@example.com', given_name=None, family_name=None, organization=None, roles=['creator'])], keywords=None, image=None, licenses=[LicenseProperties(name='ODC-BY-1.0', path='https://opendatacommons.org/licenses/by', title='Open Data Commons Attribution License 1.0')], resources=None, sources=None)
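The output above shows that `from_default()` filled in `id`, `version`, and `created`. As a rough illustration of what such defaults look like, and not a description of how Sprout generates them internally, the sketch below produces values of the same shape using the standard library. The helper name `example_defaults()` is hypothetical.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict


def example_defaults() -> Dict[str, str]:
    """Build default-style values shaped like the ones in the printed output."""
    return {
        # A random UUID string, similar in form to the `id` shown above.
        "id": str(uuid.uuid4()),
        # A starting version number matching the one in the output.
        "version": "0.1.0",
        # Current UTC time in ISO 8601 form, truncated to whole seconds.
        "created": datetime.now(timezone.utc).replace(microsecond=0).isoformat(),
    }


print(example_defaults())
```

Filling these in by hand is possible, but generating them avoids typos in the identifier and timestamp formats.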
Now, time to create your data package with these properties.
For this guide, you will create this data package in a temporary folder. In a real project, you would create the data package in the root of your project, so you do not need to run the temporary-folder code below. In your own project, you might instead use something like pathlib.Path().cwd() to get the current working directory of your Python or Git project.
from tempfile import TemporaryDirectory
from pathlib import Path

temp_path = TemporaryDirectory()
package_path = Path(temp_path.name) / "diabetes-study"

# Create the path to the package
package_path.mkdir(parents=True)
sp.write_package_properties(
    properties=properties,
    path=sp.PackagePath(package_path).properties()
)
PosixPath('/tmp/tmpxzso9pp0/diabetes-study/datapackage.json')
The write_package_properties() function will give an error if the PackageProperties object is missing some of its required fields or if they are not filled in correctly. In that case, a datapackage.json file won’t be created.
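The check-before-write behaviour described here can be pictured with a small stand-alone sketch. It does not use Sprout and does not reproduce its checks or error types: `write_if_valid()` and the `REQUIRED_FIELDS` list are hypothetical, chosen only to show that raising an error before writing leaves no file behind.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

# Fields treated as required in this sketch (an assumption for illustration).
REQUIRED_FIELDS = ("name", "id", "version", "created")


def write_if_valid(properties: dict, path: Path) -> Path:
    """Write `properties` as JSON to `path`, but only if the required fields are filled in."""
    missing = [field for field in REQUIRED_FIELDS if not properties.get(field)]
    if missing:
        # Raise before touching the file system, so nothing is created on failure.
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
    path.write_text(json.dumps(properties, indent=2))
    return path


with TemporaryDirectory() as folder:
    target = Path(folder) / "datapackage.json"
    try:
        # Only `name` is given, so the check fails and no file is written.
        write_if_valid({"name": "diabetes-hypertension-study"}, target)
    except ValueError as error:
        print(error)
    print(target.exists())
```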
Writing the properties creates the initial structure of your new package. The write_package_properties() function created the datapackage.json file in your data package diabetes-study folder, which contains the properties you added to it. The newly created file is datapackage.json, at the path shown in the output above.
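To check which files a package folder contains, the standard library is enough. The sketch below is stand-alone: it creates a temporary folder with an empty placeholder `datapackage.json` (not a real Sprout file) and lists what is inside. Note that `Path.glob()` returns a generator, so it is wrapped in `sorted()` to get a printable list of paths.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as folder:
    package_path = Path(folder) / "diabetes-study"
    package_path.mkdir(parents=True)
    # Placeholder file standing in for the one Sprout would write.
    (package_path / "datapackage.json").write_text("{}")

    # glob() is lazy; sorted() consumes the generator into a list.
    files = sorted(p.relative_to(package_path) for p in package_path.glob("**/*"))
    print(files)
```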
Creating a README of the properties
Having a human-readable version of what is contained in the datapackage.json file is useful for others who may be working with or wanting to learn more about your data package. You can use as_readme_text() to convert the properties into text that can be added to a README file. Let’s create a README file with the properties of the data package you just created:
readme_text = sp.as_readme_text(properties)
print(readme_text)
# Diabetes and Hypertension Study
Data from the 2021 study on diabetes and hypertension
| Field | Value |
|----------|-----------------------------------------|
| Name | `diabetes-hypertension-study` |
| ID | `da608079-19cc-4054-b904-28e207cb2a53` |
| Version | `0.1.0` |
| Homepage | N/A |
| Created | 30 May 2025, 10:53 |
| Licenses | ODC-BY-1.0 |
No resources available.
The function returns a text string so that, if you want to, you can add other information, such as project-specific details, before writing it to the file. This guide doesn’t cover how or why to do this.
Then to save it to the file:
sp.write_file(
    string=readme_text,
    path=Path(package_path / "README.md")
)
PosixPath('/tmp/tmpxzso9pp0/diabetes-study/README.md')
Editing package properties
If you made a mistake and want to update the properties in the current datapackage.json, you can edit the Python script directly where you previously made the properties. Since everything is written in Python scripts, updating those scripts and re-running your build pipeline will then update everything again.
If you need help with filling in the right properties, see the documentation for the PackageProperties class or run, for example, print(sp.PackageProperties()) to get a list of all the fields you can fill in for a package.
You now have the basic starting point for adding data resources to your data package.