Configuring workflows
Use workflows to run your data science project consistently or to share it with others. Configure a workflow once and run it manually until you're ready to automate it by setting a schedule.
Basic options
- Workspace
- Choose a workspace for this workflow to use.
- Workspaces can be shared with multiple people, so make sure you select the correct one for your new workflow to avoid sharing sensitive data with the wrong teammates.
- The workspace cannot be changed after the workflow is created.
- Instance Size
- Choose how much memory your workflow's virtual machine has. Read more about selecting instance types.
- Choose either CPU- or GPU-based instances.
- Max Run Time
- Set the maximum amount of time your workflow can run before it is automatically terminated.
Advanced options
- Notebook to run
- Set the location of the notebook you want to run when this workflow starts.
- Notebooks are run in the background on your workflow instance.
- When the notebook run completes, a log file of the run is stored under /notebooks/workflow-runs/ in your workspace.
- Python packages
- Specify your Python packages the same way you would in a requirements.txt file for pip install.
- Put each package on its own line.
- Environment variables
- Save configuration variables and secrets as environment variables for simpler and more secure management of important settings.
- Environment variables are stored encrypted until your workflow runs.
- Read environment variables from Python with import os and then os.environ['VARIABLE-NAME'].
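import os
# Look up the API-KEY environment variable set in the workflow's options.
# In a notebook cell, evaluating the expression displays the stored value:
os.environ['API-KEY']
# 'Th15i$53cret'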
- Bash commands
- Run Ubuntu Linux commands at the start of every workflow.
- Use them to automate advanced setup steps or to install missing programs, for example with apt-get install followed by a package name.
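The Notebook to run option stores a log file for each completed run under /notebooks/workflow-runs/ in your workspace. The sketch below finds the most recent log from Python. It assumes that this directory is reachable at that absolute path and that the newest file by modification time belongs to the latest run; the function name latest_run_log is hypothetical, and the log file naming scheme is not documented here, so the code does not rely on it.

```python
from pathlib import Path
from typing import Optional

# Assumption: run logs live in this directory; adjust the path if your
# workspace is mounted somewhere else.
DEFAULT_LOG_DIR = Path("/notebooks/workflow-runs")


def latest_run_log(log_dir: Path = DEFAULT_LOG_DIR) -> Optional[Path]:
    """Return the most recently modified file in log_dir, or None if there is none."""
    if not log_dir.is_dir():
        return None
    files = [p for p in log_dir.iterdir() if p.is_file()]
    if not files:
        return None
    # Assumption: the newest modification time belongs to the latest run.
    return max(files, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    log = latest_run_log()
    print(log if log else "No workflow run logs found yet.")
```

Using the modification time avoids guessing at file names, at the cost of being wrong if older logs are edited after the fact.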
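The Python packages option takes one package per line, in requirements.txt style. The following sketch shows how such a list maps onto a single pip install command. The helper name pip_install_args is hypothetical, and skipping blank lines and lines starting with # is borrowed from requirements.txt conventions rather than documented behavior of the workflow settings; the package names and versions are only examples.

```python
import shlex
from typing import List

# Example contents of the "Python packages" field: one package per line.
PACKAGES_FIELD = """
pandas==2.2.2
scikit-learn>=1.4
requests
"""


def pip_install_args(field_text: str) -> List[str]:
    """Turn a newline-separated package list into pip install arguments.

    Blank lines and lines starting with '#' are skipped, as in requirements.txt.
    """
    packages = []
    for line in field_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        packages.append(line)
    return ["pip", "install", *packages]


if __name__ == "__main__":
    # shlex.join quotes specifiers such as 'scikit-learn>=1.4' so the
    # printed command is safe to paste into a shell.
    print(shlex.join(pip_install_args(PACKAGES_FIELD)))
```

Keeping each requirement as a separate argument means version specifiers like >= never pass through a shell unquoted.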
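The Environment variables option exposes values through os.environ. A plain os.environ['NAME'] lookup raises a bare KeyError when a variable is missing. The sketch below wraps the lookup so a forgotten secret fails with a clearer message, and uses os.environ.get for optional settings with a default. The helper require_env and the LOG-LEVEL variable are hypothetical names used only for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, failing clearly if it is unset."""
    value = os.environ.get(name)
    if value is None:
        raise KeyError(
            f"Environment variable {name!r} is not set; "
            "add it under the workflow's Environment variables option."
        )
    return value


if __name__ == "__main__":
    # Optional setting: fall back to a default (LOG-LEVEL is a hypothetical name).
    log_level = os.environ.get("LOG-LEVEL", "INFO")
    print("Log level:", log_level)
    # Required secret: report whether it is present without printing its value.
    try:
        api_key = require_env("API-KEY")
        print(f"API-KEY is set ({len(api_key)} characters)")
    except KeyError as err:
        print(err.args[0])
```

The example deliberately avoids printing the secret itself, since notebook output and run logs may be visible to everyone who shares the workspace.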
Set a schedule
- Starts on
- Select the date and time at which you want this workflow to run for the first time.
- Runs every
- Set the interval, in hours and minutes, between scheduled runs of your workflow.
- Scheduled runs are accurate to about one minute, so start times can drift over time and the workflow may start at a slightly different minute from one run to the next.
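To preview when a schedule will fire, the sketch below lists nominal run times from a Starts on value and a Runs every interval. It assumes runs are spaced at fixed intervals from the first run; because scheduled runs are only accurate to about a minute and can drift, treat the results as approximate, not exact start times. The function scheduled_runs and the example dates are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def scheduled_runs(starts_on: datetime, every_hours: int, every_minutes: int,
                   count: int) -> List[datetime]:
    """Return the first `count` nominal run times for a workflow schedule.

    Assumption: run k (counting from 0) is nominally at starts_on + k * interval.
    """
    interval = timedelta(hours=every_hours, minutes=every_minutes)
    if interval <= timedelta(0):
        raise ValueError("The interval between runs must be positive.")
    return [starts_on + k * interval for k in range(count)]


if __name__ == "__main__":
    # Hypothetical schedule: first run on 1 March 2024 at 09:00, then every 6 h 30 min.
    for run in scheduled_runs(datetime(2024, 3, 1, 9, 0), 6, 30, 4):
        print(run.isoformat(sep=" ", timespec="minutes"))
```

With a 6 h 30 min interval, the run times do not repeat at the same clock time each day, which is worth checking before choosing an interval that is not a divisor of 24 hours.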