Poncho is a lightweight Python-based toolkit that lets users synthesize environments from a concise, human-readable JSON file. The file contains the information needed to build a self-contained Conda virtual environment for executing scientific applications on distributed systems. Poncho is composed of three parts: poncho_package_analyze, poncho_package_create, and poncho_package_run.
poncho_package_analyze performs a static analysis of the dependencies used within a Python application. The output is a JSON file listing the dependencies.
poncho_package_analyze application.py spec.json
This will give you a dependency file like this:
{
    "conda": {
        "channels": [
            "defaults",
            "conda-forge"
        ],
        "packages": [
            "ndcctools=7.3.0",
            "parsl=1.1.0"
        ]
    },
    "pip": [
        "topcoffea"
    ]
}
A specification file can also contain git and http sections:
{
    "git": {
        "DATA_DIR": {
            "remote": "http://.../repo.git"
        }
    },
    "http": {
        "REFERENCE_DB": {
            "type": "file",
            "url": "https://.../example.dat"
        }
    }
}
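To give a feel for what a static dependency analysis involves, here is a minimal sketch using Python's standard ast module. It is not Poncho's implementation, and it does not resolve versions or Conda channels; the function names find_imports and third_party are hypothetical and chosen only for illustration.

```python
import ast
import sys

def find_imports(source):
    """Return the set of top-level module names imported by the source text."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" contributes the top-level name "a"
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports ("from . import x"); they refer to local code
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules

def third_party(modules):
    """Drop standard-library names (sys.stdlib_module_names needs Python 3.10+)."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return sorted(m for m in modules if m not in stdlib)

example = """
import os
import json
import parsl
from coffea import processor
from . import helpers
"""

print(third_party(find_imports(example)))
```

Because the source is parsed rather than executed, the scan is safe to run on code whose dependencies are not yet installed. The trade-off is that imports performed dynamically, for example through importlib, are not seen.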
poncho_package_create allows users to create an environment from a JSON specification file. This specification may include Conda packages, Pip packages, remote Git repos and arbitrary files accessible via HTTPS. This environment is then packaged into a tarball.
poncho_package_create spec.json env.tar.gz
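A specification does not have to come from the analyzer; it can also be written by a script. The sketch below builds a dictionary shaped like the example in this post and serializes it with the standard json module. The helper name make_spec is hypothetical, and the key names are simply copied from the example above rather than taken from a formal schema.

```python
import json

def make_spec(conda_packages, pip_packages=(), channels=("defaults", "conda-forge")):
    """Build a specification dictionary shaped like the example in this post."""
    spec = {
        "conda": {
            "channels": list(channels),
            "packages": list(conda_packages),
        }
    }
    # Only emit a "pip" section when there is something to put in it
    if pip_packages:
        spec["pip"] = list(pip_packages)
    return spec

spec = make_spec(["ndcctools=7.3.0", "parsl=1.1.0"], pip_packages=["topcoffea"])
text = json.dumps(spec, indent=4)

# Round-trip check: json.loads raises ValueError if the text is not valid JSON
assert json.loads(text) == spec
print(text)
```

Generating the file with json.dumps avoids hand-editing mistakes such as a trailing comma after the last list item, which strict JSON parsers reject. The resulting text can be saved as spec.json and passed to poncho_package_create.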
poncho_package_run unpacks and activates an environment, then executes the command given as input within that environment. Any Git repos or files specified within the environment are made available through environment variables.
poncho_package_run -e env.tar.gz python application.py
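Inside application.py, resources delivered this way could be looked up as in the sketch below. It assumes that the variable names match the keys used in the specification (DATA_DIR and REFERENCE_DB in the example) and that each value is a local path; both points are assumptions to check against the Poncho documentation. The helper resource_path is hypothetical.

```python
import os

def resource_path(name, default=None):
    """Return the location stored in an environment variable, or a fallback."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(
            f"{name} is not set; is the script running inside the packaged environment?"
        )
    return value

# Assumed variable names, taken from the keys of the example specification.
# The defaults let the script also run outside the packaged environment.
data_dir = resource_path("DATA_DIR", default=".")
reference_db = resource_path("REFERENCE_DB", default="example.dat")

print("data directory:", data_dir)
print("reference file:", reference_db)
```

Supplying defaults keeps the script usable during local development, while omitting the default turns a missing variable into an immediate, descriptive error.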
This programmable interface lets us take a Python application and easily move it from place to place within a cluster. It is in production use with the Coffea data analysis application and the Parsl workflow system when Work Queue is the execution system.
The poncho tools can be found in the latest release of the Cooperative Computing Tools.