Creating and managing properties in a data package
🚧 Sprout is still in active development and evolving quickly, so the documentation and functionality may not work as described and could undergo substantial changes 🚧
At the core of Sprout is the "data package", which is a standardized way of structuring and documenting data. This guide will show you how to create and manage data packages using Sprout.
Sprout assumes you have control over your system's files and folders, or at least your user's home directory. This includes having access to a server through the Terminal where you can write to specific folders.
Creating a data package
Sprout is designed assuming the working directory is the root of your Python project, that is, where your .git/ folder and/or pyproject.toml file are located.
Right now, your file structure should look a bit like:
diabetes-study/
├── .gitignore
├── .python-version
├── README.md
├── main.py
└── pyproject.toml
Now, time to make a data package! A data package always needs a datapackage.json file in the root of the project. It contains a set of properties, which are metadata about the data package and the data resources within it. Right now, you don't have a datapackage.json file yet. Before you can create one, you need to write out the properties for the data package.
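To give a rough sense of what such a file holds, the sketch below builds a small dictionary of package-level properties and prints it as JSON using only the standard library. The field names are taken from the properties template shown later in this guide; the values are made up for illustration, and the file Sprout actually writes may contain more fields or arrange them differently.

```python
import json

# Illustrative values only; this is not output produced by Sprout.
example_properties = {
    "name": "diabetes-study",
    "title": "A Study on Diabetes",
    "description": "Data from a 2021 study on diabetes prevalence.",
    "licenses": [{"name": "ODC-BY-1.0"}],
    "version": "0.1.0",
}

# Serialize the properties the way a JSON metadata file would store them.
example_json = json.dumps(example_properties, indent=2)
print(example_json)
```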
While you could create a datapackage.json file and add the properties to it by hand, this would require that you know the exact fields and values to fill in. To ease this process, Sprout provides a way to create the properties using Python scripts. So let's create this script first, by adding it as a step in the main.py file!
The main.py file is where you will write the code to create and manage your data package, so that you can easily recreate your data package and its resources. You can think of it as a pipeline that takes you from beginning to end of creating your data package. Open the main.py file in your Python project, delete everything in it, and copy and paste the code below into it.
import seedcase_sprout as sp

def main():
    # Create the properties script in the default location.
    sp.create_properties_script()

if __name__ == "__main__":
    main()
Then, run this command in the Terminal:
Terminal
uv run main.py
This will create a properties.py file in the newly created scripts/ folder of your data package.
Because of the way Python scripts and importing work, you should also create an __init__.py file in the scripts/ folder. You can do this by running the following command in your Terminal:
Terminal
touch scripts/__init__.py
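If the touch command is not available in your shell (for example, on Windows), the same result can be had from Python's standard library. This is a minimal sketch rather than part of Sprout: ensure_init_file() is a hypothetical helper name.

```python
from pathlib import Path


def ensure_init_file(folder: Path) -> Path:
    """Create an empty `__init__.py` in `folder` if it is not already there."""
    # Create the folder first in case it does not exist yet.
    folder.mkdir(parents=True, exist_ok=True)
    init_file = folder / "__init__.py"
    # `touch()` leaves the contents of an existing file unchanged.
    init_file.touch(exist_ok=True)
    return init_file
```

Calling ensure_init_file(Path("scripts")) from the root of the data package should have the same effect as the Terminal command above.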
The file structure should now look like:
diabetes-study/
├── scripts/
│   ├── __init__.py
│   └── properties.py
├── .gitignore
├── .python-version
├── README.md
├── main.py
└── pyproject.toml
Inside the scripts/properties.py file, you will find a template for creating the properties of your data package. It looks like:
import seedcase_sprout as sp

# from .resource_properties import resource_properties

properties = sp.PackageProperties(
    ## Required:
    name="diabetes-study",
    title="",
    description="",
    licenses=[
        sp.LicenseProperties(
            ## Required:
            name="",
            ## Optional:
            # path="",
            # title="",
        ),
    ],
    ## Optional:
    # homepage="",
    # contributors=[
    #     sp.ContributorProperties(
    #         ## Required:
    #         title="",
    #         ## Optional:
    #         path="",
    #         email="",
    #         given_name="",
    #         family_name="",
    #         organization="",
    #         roles=[""],
    #     ),
    # ],
    # keywords=[""],
    # image="",
    # sources=[
    #     sp.SourceProperties(
    #         ## Required:
    #         title="",
    #         ## Optional:
    #         path="",
    #         email="",
    #         version="",
    #     ),
    # ],
    # resources=[
    #     resource_properties,
    # ],
    ## Autogenerated:
    id="757dae38-55bf-4f99-b33d-961018db26a7",
    version="0.1.0",
    created="2025-06-25T14:07:27+00:00",
)
You can now start filling in that script, using the included comments as a guide (for example, to make sure the required fields are filled in) and the documentation for PackageProperties to know what to write. The core of a data package is its properties. Without these, your data are simply a collection of files without any context or meaning. The metadata (properties) are crucially important for understanding and actually using the data in your data package!
To help with writing the properties, Sprout includes several properties data classes, such as PackageProperties, LicenseProperties, and ContributorProperties, to make it easier for you to make properties with the correct fields filled in. See the PackageProperties documentation for more details about these data classes.
For now, let's write some basic properties for your data package. Below is an example of a set of properties with the required fields filled in (including the optional but highly recommended contributors field).
properties = sp.PackageProperties(
    name="diabetes-study",
    title="A Study on Diabetes",
    # You can write Markdown below, with the helper `sp.dedent()`.
    description=sp.dedent("""
        # Data from a 2021 study on diabetes prevalence

        This data package contains data from a study conducted in 2021 on the
        *prevalence* of diabetes in various populations. The data includes:

        - demographic information
        - health metrics
        - survey responses about lifestyle
        """),
    contributors=[
        sp.ContributorProperties(
            title="Jamie Jones",
            email="jamie_jones@example.com",
            path="example.com/jamie_jones",
            roles=["creator"],
        )
    ],
    licenses=[
        sp.LicenseProperties(
            name="ODC-BY-1.0",
            path="https://opendatacommons.org/licenses/by",
            title="Open Data Commons Attribution License 1.0",
        )
    ],
    # We don't include the rest of the properties script in this guide. The above is only to
    # show what it might look like to write properties in the script.
)
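A dedent helper is useful here because a triple-quoted string written inside indented code keeps that indentation, and Markdown can interpret text indented by four or more spaces as a code block. The sketch below uses textwrap.dedent() from Python's standard library as a stand-in to show the effect. sp.dedent() is Sprout's own helper and may treat details such as leading or trailing blank lines differently.

```python
import textwrap

# A Markdown description as it would appear inside indented Python code.
indented = """
    # Data from a 2021 study

    - demographic information
    - health metrics
    """

# Remove the whitespace that is common to the start of every line.
dedented = textwrap.dedent(indented)

# Compare the heading line before and after dedenting.
print(repr(indented.splitlines()[1]))
print(repr(dedented.splitlines()[1]))
```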
Now that you've filled in some of the package properties, it's time to create your datapackage.json file with these properties. You can use the write_properties() function for this. To use it effectively, it's best to call it inside the main() function of the main.py script, so that it runs as part of your build pipeline. So in your main.py, include this code:
import seedcase_sprout as sp
from scripts.properties import properties

def main():
    # Create the properties script in default location.
    sp.create_properties_script()
    # Write properties from properties script to `datapackage.json`.
    sp.write_properties(properties=properties)

if __name__ == "__main__":
    main()
Then, run this command in the Terminal:
Terminal
uv run main.py
The write_properties() function will give an error if the PackageProperties object is missing some of its required fields or if they are not filled in correctly. In that case, a datapackage.json file won't be created, and you will have to return to the scripts/properties.py file and fill in the correct properties.
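As a rough illustration of the kind of problem that triggers such an error, the sketch below looks for empty required fields in a plain dictionary. It is not how Sprout performs its checks: REQUIRED_FIELDS and find_empty_required() are hypothetical names, and the list of required fields is assumed from the "Required" comments in the properties template above.

```python
# Fields marked "Required" in the properties template; assumed for illustration.
REQUIRED_FIELDS = ("name", "title", "description", "licenses")


def find_empty_required(properties: dict) -> list:
    """Return the required fields that are missing or left empty."""
    empty = [field for field in REQUIRED_FIELDS if not properties.get(field)]
    # Each license entry also needs a non-empty `name`.
    for index, license_ in enumerate(properties.get("licenses") or []):
        if not license_.get("name"):
            empty.append(f"licenses[{index}].name")
    return empty


# The template as generated: only `name` is filled in.
template = {
    "name": "diabetes-study",
    "title": "",
    "description": "",
    "licenses": [{"name": ""}],
}
print(find_empty_required(template))
```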
Once the properties are valid, the write_properties() function creates the datapackage.json file, containing the properties you added, in your data package's diabetes-study folder. You will now see the new datapackage.json file in your data package folder:
diabetes-study/
├── scripts/
│   ├── __init__.py
│   └── properties.py
├── .gitignore
├── .python-version
├── README.md
├── datapackage.json
├── main.py
└── pyproject.toml
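To check what was written, you can read the file back with Python's standard json module. This is a minimal sketch, not a Sprout function: read_package_summary() is a hypothetical helper, and it assumes the JSON keys match the property names used in the script (such as name, title, and version).

```python
import json
from pathlib import Path


def read_package_summary(path: Path) -> dict:
    """Load a `datapackage.json` file and return a few top-level properties."""
    properties = json.loads(path.read_text(encoding="utf-8"))
    # `.get()` returns None for any key the file does not contain.
    return {key: properties.get(key) for key in ("name", "title", "version")}
```

Run from the root of the data package, read_package_summary(Path("datapackage.json")) returns a small dictionary that you can print or inspect.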
Creating a README of the properties
Having a human-readable version of what is contained in the datapackage.json file is useful for others who may be working with or wanting to learn more about your data package. You can use as_readme_text() to convert the properties into text that can be added to a README file. Let's create a README file from the properties of the data package you just created by adding this step to the main.py file.
import seedcase_sprout as sp
from scripts.properties import properties

def main():
    # Create the properties script in default location.
    sp.create_properties_script()
    # Save the properties to `datapackage.json`.
    sp.write_properties(properties=properties)
    # Create text for a README of the data package.
    readme_text = sp.as_readme_text(properties)
    # Write the README text to a `README.md` file.
    sp.write_file(readme_text, sp.PackagePath().readme())

if __name__ == "__main__":
    main()
Sprout splits the README creation functionality into two steps: one to make the text and one to write it to the file. That way, if you want to add to or manipulate the text, you can do so before writing it to the file. This is useful if you want to add information to the README that you don't want included in the datapackage.json file. For this guide we won't cover how or why to do this.
Next, run this command in the Terminal to make the README file. Note that write_file() will always overwrite the existing README file.
Terminal
uv run main.py
Now you can see that the README.md file has been created in your data package:
diabetes-study/
├── scripts/
│   ├── __init__.py
│   └── properties.py
├── .gitignore
├── .python-version
├── README.md
├── datapackage.json
├── main.py
└── pyproject.toml
Editing package properties
If you made a mistake and want to update the properties in the current datapackage.json, edit the Python script where you previously wrote the properties. Since everything is written in Python scripts, updating those scripts and re-running your build pipeline (main.py) will then update everything.
If you need help with filling in the right properties, see the documentation for the PackageProperties classes or run, for example, print(sp.PackageProperties()) to get a list of all the fields you can fill in for a package.
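If the properties classes are standard Python dataclasses, which this guide's term "data classes" suggests but does not confirm, their field names can also be listed programmatically. The sketch below uses a small stand-in class, ExampleProperties, which is hypothetical and much simpler than Sprout's real classes.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ExampleProperties:
    """Hypothetical stand-in for a properties class; not part of Sprout."""

    name: Optional[str] = None
    title: Optional[str] = None
    description: Optional[str] = None


# List every field name defined on the dataclass.
field_names = [field.name for field in fields(ExampleProperties)]
print(field_names)
```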
You now have the basic starting point for adding data resources to your data package.