Creating and managing data packages
🚧 This section is still in active development and is subject to changes 🚧
At the core of Sprout is the data package, which is a standardized way of structuring and sharing data. This guide will show you how to create and manage data packages using Sprout.
For both the Python library and the CLI, Sprout assumes you have full control over the folders and files of the system, or at least of your user’s home directory. This includes space on a server that you mostly access through a Terminal and where you control the directories you can write to.
A simple example is installing Sprout on your own computer because you want to create data packages for research studies you are running. Another example is a research group that has storage space on a server that you can only access through a Terminal and SSH.
Creating a data package
The first thing you’ll need to decide is where you want to store your data packages. By default, Sprout will create them in ~/sprout/packages/ on Linux (see Outputs for operating-system-specific locations), but you can change this by setting the SPROUT_ROOT environment variable. For instance, maybe you want the location to be ~/Documents/data-packages/. You can set this in your Python script like so:
import sprout.core as sp
import os
os.environ["SPROUT_ROOT"] = "~/Documents/data-packages/"
Afterwards, you can create the structure for your first data package by using:
sp.create_package_structure(path=sp.path_packages())
[PosixPath('~/Documents/data-packages/1/datapackage.json'),
PosixPath('~/Documents/data-packages/1/README.md')]
This creates the initial structure of your new package with the ID 1. The output above shows that the folder of your data package 1 has been created. This folder consists of two files: datapackage.json and README.md. The datapackage.json file initially contains fields with some default values in them, but it will eventually contain the metadata, a.k.a. the properties, of your data package. README.md is a prettified, human-readable version of the content of the datapackage.json.
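To see which fields a datapackage.json file currently holds, you can read it with Python's standard json module. The following is a minimal sketch, not part of Sprout: the helper list_property_names() is hypothetical, and the stand-in file it reads contains made-up fields rather than Sprout's actual defaults.

```python
import json
import tempfile
from pathlib import Path


def list_property_names(properties_path: Path) -> list[str]:
    """Return the sorted top-level field names in a datapackage.json file."""
    with properties_path.open(encoding="utf-8") as file:
        return sorted(json.load(file))


# Stand-in file so the sketch runs without Sprout installed. The fields
# below are hypothetical and are not Sprout's actual defaults.
package_dir = Path(tempfile.mkdtemp()) / "1"
package_dir.mkdir()
stand_in_path = package_dir / "datapackage.json"
stand_in_path.write_text(json.dumps({"name": "", "title": ""}), encoding="utf-8")

property_names = list_property_names(stand_in_path)
print(property_names)
```

For a real package, you would pass the path of the datapackage.json file inside your package folder instead of the stand-in path.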
While you can manually fill in the details in the datapackage.json file, we have several helper classes, such as PackageProperties, LicenseProperties, and ContributorProperties, to make it easier for you.
properties = sp.PackageProperties(
    title="Diabetes and Hypertension Study",
    description="Data from the 2021 study on diabetes and hypertension",
    contributors=[sp.ContributorProperties(
        title="Jamie Jones",
        email="jamie_jones@example.com",
        roles=["creator"]
    )],
    licenses=[sp.LicenseProperties("ODC-BY-1.0")]
)
print(properties)
# TODO: This will eventually show the actual output.
PackageProperties(...)
Then, to update the current datapackage.json file with these properties, you can use the update_package_properties() function:
package_properties = sp.update_package_properties(
    path=sp.path_properties(package_id=1),
    properties=properties
)
print(package_properties)
# TODO: add an example output of the above.
{...}
The update_package_properties() function will give an error if the required fields are not filled in to create a valid datapackage.json file.
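To illustrate the idea of a required-field check, here is a minimal sketch in plain Python. It does not reproduce Sprout's own validation: the REQUIRED_FIELDS tuple and the find_missing_fields() helper are hypothetical, and the fields Sprout actually requires may differ.

```python
# Hypothetical list of required fields, for illustration only.
REQUIRED_FIELDS = ("title", "description")


def find_missing_fields(properties: dict) -> list[str]:
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not properties.get(field)]


# A draft where the description has not been filled in yet.
draft = {"title": "Diabetes and Hypertension Study", "description": ""}
missing = find_missing_fields(draft)
if missing:
    print(f"Missing required fields: {missing}")
```

A check like this can be handy for catching gaps before you pass the properties on, but the error raised by update_package_properties() remains the authoritative one.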
To save the package properties to the datapackage.json file, run:
sp.write_package_properties(
    properties=package_properties,
    path=sp.path_properties(package_id=1)
)
If you need help with filling in the right properties, see the documentation for the PackageProperties classes or run, for example, print(sp.PackageProperties()) to get a list of all the fields you can fill in for a package.
You now have the basic starting point for adding data resources to your data package.
The CLI is a bit more straightforward as long as you are comfortable using the Terminal. You can set the SPROUT_ROOT environment variable to change the location of the data packages. For instance, maybe you want the location to be ~/Documents/data-packages/. You can set this in your terminal like so:
export SPROUT_ROOT=~/Documents/data-packages/
Then creating a new package would be as simple as:
sprout package create
This will prompt you for some required fields you need to fill in, like the title and description of the data package. If you want to skip the prompt, you can provide the information directly in the command:
sprout package create \
--title "Diabetes and Hypertension Study" \
--description "Data from the 2021 study on diabetes and hypertension"
This creates the initial structure of your new package with the ID 1. The output above shows that the folder of your data package 1 has been created. This folder consists of two files: datapackage.json and README.md. The datapackage.json file is empty initially, but it will contain the metadata, a.k.a. the properties, of your data package. README.md is a prettified, human-readable version of the content of the datapackage.json.
You now have the basic starting point for adding data resources to your data package in the location you specified:
~/Documents/data-packages/1/datapackage.json
~/Documents/data-packages/1/README.md
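If you want to confirm from Python that both files are where you expect them, a small standard-library check is enough. This is a minimal sketch that assumes the folder layout described in this guide (one numbered folder per package under the root); the helper expected_package_files() is hypothetical and not part of Sprout. It also expands the leading ~ to your home directory, which Python does not do on its own for plain strings.

```python
from pathlib import Path


def expected_package_files(root: str, package_id: int) -> list[Path]:
    """Build the paths of the two files described in this guide."""
    # expanduser() turns a leading "~" into the user's home directory.
    package_dir = Path(root).expanduser() / str(package_id)
    return [package_dir / "datapackage.json", package_dir / "README.md"]


package_files = expected_package_files("~/Documents/data-packages/", 1)
for path in package_files:
    status = "found" if path.is_file() else "missing"
    print(f"{path}: {status}")
```

Each line of output reports one file as found or missing, so a missing file points to a package that was not created in the location you expected.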
In development.