robnewman/stress-test-datasets-staging

Seqera Platform Stress Test Automation

A comprehensive Python script for automated stress testing and resource management on Seqera Platform. This tool creates and manages large-scale deployments of organizations, workspaces, compute environments, datasets, and nf-core workflows using the seqerakit library.

🎯 Overview

This script automates the creation and management of Seqera Platform resources at scale for stress testing, benchmarking, or rapid environment provisioning. It leverages the seqerakit Python library (a YAML-based wrapper around the Seqera Platform CLI, tw-cli) and direct connection to Seqera Platform API endpoints to create a complete infrastructure hierarchy.

What It Does

  • Creates 50 organizations (configurable) with randomized names prefixed with stress-test-org_
  • Provisions 5 workspaces per organization (configurable) prefixed with stress-test-wksp_
  • Upgrades organizations to 'pro' tier via direct API calls
  • Sets up AWS Cloud compute environments for each workspace
  • Downloads test datasets from nf-core repositories and uploads them to the platform, duplicating uploads under randomized names to increase load
  • Deploys 20 workflows per workspace (configurable) from 20 popular nf-core pipelines
  • Complete teardown functionality for clean resource removal
  • State persistence for reliable tracking and cleanup

Default Scale

  • 50 organizations
  • 250 workspaces (5 per org)
  • 250 AWS Cloud compute environments (1 per workspace)
  • 1000 datasets (4 per workspace on average)
  • 5,000 workflows (20 per workspace)
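
These defaults multiply straight through the hierarchy; a quick sanity check of the arithmetic (the counts mirror the script's defaults and can be overridden on the command line):

```python
# Default scale: 50 orgs x 5 workspaces x 20 workflows.
orgs = 50
workspaces_per_org = 5
workflows_per_workspace = 20

workspaces = orgs * workspaces_per_org            # 250 workspaces
compute_envs = workspaces                         # one compute environment each
workflows = workspaces * workflows_per_workspace  # 5,000 workflows

print(workspaces, compute_envs, workflows)
```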

✨ Features

Core Capabilities

  • Batch Resource Creation: Efficiently creates resources in optimized batches
  • Randomized Naming: Generates unique names for all resources to avoid conflicts
  • Direct API Integration: Uses Seqera Platform API for admin operations
  • Dataset Management: Downloads, uploads, and cleans up test data automatically
  • State Tracking: JSON-based state persistence for reliable cleanup
  • Comprehensive Logging: Detailed logging for debugging and monitoring
  • Error Handling: Robust error handling with graceful degradation
  • Rate Limiting: Built-in delays to avoid API throttling

Supported nf-core Workflows

The script includes test datasets for the following pipelines:

  1. rnaseq (3.12.0) - RNA sequencing analysis
  2. chipseq (2.0.0) - ChIP-seq analysis
  3. atacseq (2.1.2) - ATAC-seq analysis
  4. scrnaseq (2.5.1) - Single-cell RNA-seq
  5. sarek (3.4.0) - Variant calling pipeline
  6. mag (2.5.1) - Metagenome assembly
  7. viralrecon (2.6.0) - Viral reconstruction
  8. ampliseq (2.8.0) - Amplicon sequencing
  9. methylseq (4.1.0) - Methylation (Bisulfite) sequencing
  10. smrnaseq (2.3.1) - Small RNA-Seq analysis
  11. nanoseq (3.1.0) - Nanopore DNA/RNA sequencing data
  12. cutandrun (3.2.2) - CUT&RUN, CUT&Tag, and TIPseq experiments
  13. rnavar (1.0.0) - GATK4 RNA variant calling
  14. isoseq (2.0.0) - Genome annotation using PacBio Iso-Seq
  15. bactmap (1.0.0) - Mapping bacterial genome sequences to create a phylogeny
  16. hic (2.1.0) - Analyzing Chromosome Conformation Capture (Hi-C) data
  17. circrna (dev) - Analyze total RNA sequencing data
  18. taxprofiler (1.1.8) - Taxonomic classification and profiling of shotgun short- and long-read metagenomic data
  19. crisprseq (2.2.1) - Analyzing CRISPR edited data
  20. funcscan (1.2.0) - Screening of nucleotide sequences

🏗️ Architecture

How It Works

  1. YAML Configuration Generation: Script dynamically creates YAML configurations for each resource type
  2. seqerakit Processing: Configurations are processed by seqerakit which translates them to Seqera Platform CLI (tw) commands
  3. API Execution: Commands are executed against the Seqera Platform API
  4. State Management: All created resources are tracked in JSON for reliable cleanup
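
Step 1 can be sketched as follows: build a config in memory, then dump it to YAML for seqerakit to consume. The field names here follow seqerakit's organization template but may need adjusting for your seqerakit version.

```python
import yaml

# Build a seqerakit-style configuration for one organization.
config = {
    "organizations": [
        {
            "name": "stress-test-org_a7b3c9d2",
            "full-name": "Stress Test Organization 1",
            "description": "Auto-generated stress test organization 1",
            "overwrite": False,
        }
    ]
}

# Dump to YAML text; the script would write this to a temp file
# and hand it to seqerakit.
yaml_text = yaml.dump(config, default_flow_style=False)
print(yaml_text)
```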

Technology Stack

  • Python 3.8+: Core scripting language
  • seqerakit: YAML-based wrapper for Seqera Platform CLI
  • Seqera Platform CLI (tw): Official command-line interface
  • python-dotenv: Secure environment variable management
  • requests: HTTP library for direct API calls
  • PyYAML: YAML configuration handling

📦 Prerequisites

Required Software

  1. Python 3.8 or later

    python --version
  2. Seqera Platform CLI (tw)

    # Linux x86_64 binary shown; see the releases page for other platforms
    curl -fsSL https://github.com/seqeralabs/tower-cli/releases/download/v0.9.2/tw-linux-x86_64 -o tw
    chmod +x tw
    sudo mv tw /usr/local/bin/
    
    # Verify installation
    tw --version
  3. seqerakit Python library

Using pip:

pip install seqerakit

Using conda:

conda install bioconda::seqerakit

Access Requirements

  • Active Seqera Platform account
  • Admin-level access token (for organization management)
  • Appropriate permissions to create resources
  • (Optional) AWS credentials if using AWS Batch compute environments

🚀 Installation

Step 1: Clone or Download the Script

Save the following files to your working directory:

  • seqera_manager.py - Main script
  • .env.template - Environment variable template
  • requirements.txt - (Optional) Python dependencies for Pip
  • environment.yaml - (Optional) Python dependencies for Conda

Step 2: Install Python Dependencies

Using pip:

pip install -r requirements.txt

Using conda:

conda env update --file environment.yaml

requirements.txt contents:

python-dotenv==1.0.0
seqerakit>=0.5.0
pyyaml>=6.0.0
requests>=2.31.0

environment.yaml contents:

name: staging-stress-test-py
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - brotli-python=1.1.0=py313hb4b7877_4
  - bzip2=1.0.8=hd037594_8
  - ca-certificates=2025.10.5=hbd8a1cb_0
  - certifi=2025.10.5=pyhd8ed1ab_0
  - cffi=1.17.1=py313h755b2b2_1
  - charset-normalizer=3.4.3=pyhd8ed1ab_0
  - h2=4.3.0=pyhcf101f3_0
  - hpack=4.1.0=pyhd8ed1ab_0
  - hyperframe=6.1.0=pyhd8ed1ab_0
  - icu=75.1=hfee45f7_0
  - idna=3.10=pyhd8ed1ab_1
  - libcxx=21.1.1=hf598326_0
  - libexpat=2.7.1=hec049ff_0
  - libffi=3.4.6=h1da3d7d_1
  - liblzma=5.8.1=h39f12f2_2
  - libmpdec=4.0.0=h5505292_0
  - libsqlite=3.50.4=h4237e3c_0
  - libzlib=1.3.1=h8359307_2
  - ncurses=6.5=h5e97a16_3
  - openjdk=24.0.2=had54fb3_0
  - openssl=3.5.4=h5503f6c_0
  - pip=25.2=pyh145f28c_0
  - pycparser=2.22=pyh29332c3_1
  - pysocks=1.7.1=pyha55dd90_7
  - python=3.13.7=h5c937ed_100_cp313
  - python-dotenv=1.1.1=pyhe01879c_0
  - python_abi=3.13=8_cp313
  - pyyaml=6.0.2=py313ha9b7d5b_2
  - readline=8.2=h1d1bf99_2
  - requests=2.32.5=pyhd8ed1ab_0
  - seqerakit=0.5.6=pyhdfd78af_0
  - tk=8.6.13=h892fb3f_2
  - tower-cli=0.15.0=hdfd78af_0
  - tzdata=2025b=h78e105d_0
  - urllib3=2.5.0=pyhd8ed1ab_0
  - yaml=0.2.5=h925e9cb_3
  - zstandard=0.24.0=py313hff09f02_1
  - zstd=1.5.7=h6491c7d_2
prefix: /opt/homebrew/Caskroom/miniconda/base/envs/staging-stress-test-py

Step 3: Install Seqera Platform CLI

Follow the official installation guide at https://docs.seqera.io/platform/latest/cli/installation

Step 4: Verify Installation

# Check seqerakit
seqerakit --version

# Check Seqera Platform CLI
tw --version

# Check Python dependencies
python -c "import seqerakit, requests, yaml, dotenv; print('All dependencies installed')"

⚙️ Configuration

Environment Variables

Create a .env file in the same directory as the script:

# Copy the template
cp .env.template .env

# Edit with your values
nano .env

.env file contents:

# Seqera Platform Configuration

# Your Seqera Platform access token (REQUIRED)
# Create at: https://cloud.seqera.io -> Your Profile -> Your Tokens
TOWER_ACCESS_TOKEN=your_access_token_here

# Seqera Platform API endpoint (REQUIRED)
TOWER_API_ENDPOINT=https://api.cloud.seqera.io

# Alternative endpoints:
# EU region: https://api.eu1.cloud.seqera.io
# Enterprise/On-premise: https://your-seqera-instance.com/api

Getting Your Access Token

  1. Log in to Seqera Platform
  2. Navigate to Your Profile → Your Tokens
  3. Click Add Token
  4. Give it a name (e.g., "Stress Test Script")
  5. Copy the token immediately (it's only shown once)
  6. Paste into your .env file

Security Best Practices

  • ✅ Never commit .env to version control
  • ✅ Add .env to your .gitignore file
  • ✅ Use environment-specific tokens
  • ✅ Regularly rotate access tokens
  • ✅ Use read-only tokens when possible for summary operations

📚 Usage

Basic Commands

1. Setup (Create All Resources)

# Use default settings (50 orgs, 5 workspaces, 20 workflows)
python seqera_manager.py --action setup

# Custom resource counts
python seqera_manager.py --action setup \
    --organizations 10 \
    --workspaces-per-org 3 \
    --workflows-per-workspace 15

# With custom state file
python seqera_manager.py --action setup --state-file my_test.json

2. Summary (View Created Resources)

# View summary from default state file
python seqera_manager.py --action summary

# View summary from custom state file
python seqera_manager.py --action summary --state-file my_test.json

3. Teardown (Delete All Resources)

# Delete all resources using default state file
python seqera_manager.py --action teardown

# Delete using custom state file
python seqera_manager.py --action teardown --state-file my_test.json

Command-Line Arguments

| Argument | Description | Default | Example |
|----------|-------------|---------|---------|
| `--action` | Action to perform: `setup`, `teardown`, or `summary` | `setup` | `--action setup` |
| `--state-file` | JSON file for state persistence | `seqera_state.json` | `--state-file test.json` |
| `--organizations` | Number of organizations to create | `50` | `--organizations 10` |
| `--workspaces-per-org` | Workspaces per organization | `5` | `--workspaces-per-org 3` |
| `--workflows-per-workspace` | Workflows per workspace | `20` | `--workflows-per-workspace 15` |

Example Workflows

Small Test Deployment

# Create 5 organizations with 2 workspaces each, 10 workflows per workspace
python seqera_manager.py --action setup \
    --organizations 5 \
    --workspaces-per-org 2 \
    --workflows-per-workspace 10 \
    --state-file small_test.json

# View what was created
python seqera_manager.py --action summary --state-file small_test.json

# Clean up
python seqera_manager.py --action teardown --state-file small_test.json

Large-Scale Stress Test

# Create 100 organizations with 10 workspaces each, 30 workflows per workspace
# Total: 100 orgs, 1000 workspaces, 30,000 workflows
python seqera_manager.py --action setup \
    --organizations 100 \
    --workspaces-per-org 10 \
    --workflows-per-workspace 30 \
    --state-file stress_test.json

🗂️ Resource Management

Resource Hierarchy

Organizations (stress-test-org_*)
└── Workspaces (stress-test-wksp_*)
    ├── AWS Cloud Compute Environments (stress-test-ce-*)
    ├── Datasets (*-ds_*)
    └── Pipelines/Workflows (*-wf_*)

Naming Conventions

All resources follow a consistent naming pattern:

  • Organizations: stress-test-org_<random8chars>

    • Example: stress-test-org_a7b3c9d2
  • Workspaces: stress-test-wksp_<random8chars>

    • Example: stress-test-wksp_x9y8z7w6
  • Compute Environments: stress-test-ce-<random6chars>

    • Example: stress-test-ce-f4g5h6
  • Datasets: <workflow>_test_<type>-ds_<random4chars>

    • Example: rnaseq_test_samplesheet-ds_k2l3
  • Workflows: <workflow>-wf_<random4chars>

    • Example: rnaseq-wf_m4n5
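
The naming scheme above boils down to a fixed prefix plus a random lowercase/digit suffix of a given length. A minimal sketch (the helper name is illustrative, not the script's actual function):

```python
import secrets
import string

def random_name(prefix: str, suffix_len: int) -> str:
    """Return prefix + a random suffix of lowercase letters and digits."""
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(suffix_len))
    return f"{prefix}{suffix}"

org_name = random_name("stress-test-org_", 8)  # e.g. stress-test-org_a7b3c9d2
ce_name = random_name("stress-test-ce-", 6)    # e.g. stress-test-ce-f4g5h6
```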

State File Structure

The script saves all created resources to a JSON state file:

{
  "organizations": [
    {
      "name": "stress-test-org_a7b3c9d2",
      "org_id": "12345",
      "full_name": "Stress Test Organization...",
      "description": "Auto-generated stress test organization 1"
    }
  ],
  "workspaces": [...],
  "datasets": [...],
  "workflows": [...],
  "compute_environments": [...],
  "timestamp": 1234567890.123,
  "summary": {
    "total_organizations": 50,
    "total_workspaces": 250,
    "total_workflows": 5000,
    "total_datasets": 750,
    "total_compute_environments": 250
  }
}
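
The persistence pattern is simple: everything the script creates goes into one dict that is flushed to disk so teardown can find it later. A minimal sketch of the save/load pair (function bodies are assumptions modeled on the documented behavior):

```python
import json
import time

# Example state matching the structure shown above.
state = {
    "organizations": [{"name": "stress-test-org_a7b3c9d2", "org_id": "12345"}],
    "workspaces": [],
    "timestamp": time.time(),
}

def save_state(state: dict, filename: str = "seqera_state.json") -> None:
    """Write the full resource-tracking dict to a JSON file."""
    with open(filename, "w") as fh:
        json.dump(state, fh, indent=2)

def load_state(filename: str = "seqera_state.json") -> dict:
    """Read a previously saved state file back into a dict."""
    with open(filename) as fh:
        return json.load(fh)
```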

🔄 Workflow

Setup Process Flow

graph TD
    A[Start Setup] --> B[Load Environment Variables]
    B --> C[Create Organizations]
    C --> D[Update Orgs to 'pro' Type]
    D --> E[For Each Organization]
    E --> F[Create Workspaces]
    F --> G[For Each Workspace]
    G --> H[Create AWS Cloud Compute Environment]
    H --> I[Download & Upload Datasets]
    I --> J[Add Workflows]
    J --> K{More Workspaces?}
    K -->|Yes| G
    K -->|No| L{More Organizations?}
    L -->|Yes| E
    L -->|No| M[Save State]
    M --> N[Print Summary]
    N --> O[End]

Detailed Operation Sequence

  1. Initialization

    • Load environment variables from .env
    • Verify seqerakit and CLI installation
    • Initialize resource tracking lists
  2. Organization Creation

    • Generate unique org names with stress-test-org_ prefix
    • Create organizations via seqerakit
    • Retrieve organization IDs from API
    • Update organizations to 'pro' tier via direct API call
  3. Workspace Provisioning

    • Generate unique workspace names with stress-test-wksp_ prefix
    • Create workspaces within each organization
    • Batch process for efficiency
  4. Compute Environment Setup

    • Create AWS Cloud compute environment per workspace
    • Wait for compute environment to be AVAILABLE
    • Essential for workflow execution
  5. Dataset Management

    • Download test datasets from nf-core repositories
    • Save to temporary local files
    • Upload to Seqera Platform via seqerakit
    • Delete local temporary files
  6. Workflow Deployment

    • Add nf-core pipelines to each workspace
    • Cycle through the 20 different workflow types
    • Configure with test profiles
    • Process in batches of 5 for rate limiting
  7. State Persistence

    • Save all resource information to JSON
    • Include timestamps and summary statistics
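
The batch-plus-delay pattern used in step 6 can be sketched as a small helper: process items in fixed-size batches, pausing between batches to stay under API rate limits. The batch size of 5 comes from the description above; the delay value is illustrative.

```python
import time

def process_in_batches(items, handler, batch_size=5, delay=1.0):
    """Apply handler to items in batches, sleeping between batches."""
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        results.extend(handler(item) for item in batch)
        if start + batch_size < len(items):
            time.sleep(delay)  # rate-limit pause between batches
    return results
```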

Teardown Process Flow

  1. Load State - Read resource information from state file
  2. Generate YAML - Create comprehensive YAML with all resources
  3. Execute Deletion - Run seqerakit --delete command
  4. Clean Up - Remove state file after successful deletion

Seqerakit handles deletion in the correct dependency order:

  • Workflows → Datasets → Compute Environments → Workspaces → Organizations
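
Teardown step 2 amounts to assembling one YAML document whose top-level keys list resources in reverse-dependency order, then handing the file to seqerakit with `--delete`. A sketch of the ordering logic (the key names here mirror the state file; seqerakit's actual YAML keys may differ slightly by version):

```python
import yaml

# Resources loaded from the state file (abbreviated).
state = {
    "workflows": [{"name": "rnaseq-wf_m4n5"}],
    "datasets": [{"name": "rnaseq_test_samplesheet-ds_k2l3"}],
    "compute_environments": [{"name": "stress-test-ce-f4g5h6"}],
    "workspaces": [{"name": "stress-test-wksp_x9y8z7w6"}],
    "organizations": [{"name": "stress-test-org_a7b3c9d2"}],
}

# Reverse-dependency order: children first, organizations last.
delete_order = ["workflows", "datasets", "compute_environments",
                "workspaces", "organizations"]
delete_config = {key: state[key] for key in delete_order}

# sort_keys=False preserves the deletion order in the output.
delete_yaml = yaml.dump(delete_config, sort_keys=False)
print(delete_yaml)
```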

🔧 Troubleshooting

Common Issues and Solutions

Issue: "seqerakit: command not found"

Solution:

# Install seqerakit
pip install seqerakit

# Verify installation
seqerakit --version

Issue: "tw: command not found"

Solution:

# Install Seqera Platform CLI (Linux x86_64 binary shown; see releases for other platforms)
curl -fsSL https://github.com/seqeralabs/tower-cli/releases/download/v0.9.2/tw-linux-x86_64 -o tw
chmod +x tw
sudo mv tw /usr/local/bin/

# Verify
tw --version

Issue: "TOWER_ACCESS_TOKEN must be set"

Solution:

  1. Verify .env file exists in the same directory as script
  2. Check that TOWER_ACCESS_TOKEN is set in .env
  3. Ensure no extra spaces around the = sign
  4. Verify the token is still valid in Seqera Platform

Issue: "Failed to create organization"

Possible Causes:

  • Invalid or expired access token
  • Insufficient permissions (need admin access)
  • Organization name conflicts
  • API endpoint misconfigured

Solution:

# Test API connection manually
curl -H "Authorization: Bearer YOUR_TOKEN" https://api.cloud.seqera.io/user-info

# Check logs for specific error messages
python seqera_manager.py --action setup 2>&1 | tee setup.log

Issue: "Failed to download dataset"

Possible Causes:

  • Network connectivity issues
  • nf-core test datasets URL changed
  • Timeout (default 30 seconds)

Solution:

  • Check internet connectivity
  • Verify URLs are accessible: curl -I <dataset-url>
  • Increase timeout in create_datasets() method if needed

Issue: "Compute environment creation failed"

Possible Causes:

  • Missing AWS credentials
  • Insufficient AWS permissions
  • VPC/subnet configuration issues
  • Region not supported

Solution:

  • Verify AWS credentials are configured
  • Check AWS IAM permissions for Batch
  • Customize compute environment config for your setup
  • Review seqerakit logs: seqerakit -l debug <config.yml>

Issue: Rate limiting / API throttling

Symptoms:

  • HTTP 429 errors
  • Timeouts
  • Slow execution

Solution:

# Increase delays in the script:
# In create_organizations, create_workspaces, etc.:
time.sleep(2.0)  # Increase from 0.5 or 1.0

# Reduce batch sizes:
batch_size = 3  # Reduce from 5 in add_workflows

Debug Mode

Enable detailed logging:

# In the script, change logging level:
logging.basicConfig(
    level=logging.DEBUG,  # Change from INFO to DEBUG
    format='%(asctime)s - %(levelname)s - %(message)s'
)

Testing Connection

Test your configuration before running full setup:

# Test with minimal resources
python seqera_manager.py --action setup \
    --organizations 1 \
    --workspaces-per-org 1 \
    --workflows-per-workspace 1 \
    --state-file test_connection.json

# If successful, clean up
python seqera_manager.py --action teardown --state-file test_connection.json

💡 Best Practices

Before Running

  1. Test with Small Numbers First

    # Start small to verify everything works
    python seqera_manager.py --action setup --organizations 2 --workspaces-per-org 1
  2. Verify Environment Variables

    # Check .env is loaded correctly
    python -c "from dotenv import load_dotenv; import os; load_dotenv(); print('Token present:', bool(os.getenv('TOWER_ACCESS_TOKEN')))"
  3. Check Available Resources

    • Verify sufficient quota/limits in Seqera Platform
    • Check AWS service limits if using AWS Batch
    • Ensure adequate API rate limits

During Execution

  1. Monitor Progress

    # Run with output redirection
    python seqera_manager.py --action setup 2>&1 | tee execution.log
  2. Don't Interrupt

    • Let the script complete or resources may be orphaned
    • Use state file for recovery if interrupted
  3. Resource Monitoring

    • Monitor Seqera Platform web UI
    • Watch for errors in logs
    • Check system resources (memory, disk for datasets)

After Execution

  1. Verify Creation

    # Check the summary
    python seqera_manager.py --action summary
  2. Backup State File

    # Keep a copy of the state file
    cp seqera_state.json seqera_state_backup_$(date +%Y%m%d).json
  3. Clean Up When Done

    # Always teardown test resources
    python seqera_manager.py --action teardown

Performance Optimization

  1. Adjust Batch Sizes

    • Increase batch sizes for faster execution (if API allows)
    • Decrease if encountering rate limits
  2. Parallel Execution

    • Script is sequential by design for reliability
    • Consider creating multiple state files for parallel runs
    • Use different organization prefixes to avoid conflicts
  3. Network Optimization

    • Run from location with good connectivity to API endpoint
    • Consider running from same cloud region as Seqera Platform

Cost Considerations

  1. Compute Environments

    • AWS Cloud compute environments may incur AWS/cloud costs
    • Monitor cloud billing during stress tests
    • Delete resources promptly after testing
  2. Data Transfer

    • Dataset downloads consume bandwidth
    • Datasets are uploaded to Seqera Platform
    • Consider data transfer costs at scale

📖 API Reference

SeqeraManager Class

Main class for managing Seqera Platform resources.

Constructor

manager = SeqeraManager()

Initializes the manager, loads environment variables, and sets up resource tracking.

Raises:

  • ValueError: If TOWER_ACCESS_TOKEN is not set
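
The validation step can be sketched as follows; the attribute names and default endpoint mirror the documented behavior, not the exact source.

```python
import os

class SeqeraManagerSketch:
    """Illustrative constructor: fail fast if the token is missing."""
    def __init__(self):
        self.token = os.environ.get("TOWER_ACCESS_TOKEN")
        if not self.token:
            raise ValueError("TOWER_ACCESS_TOKEN must be set")
        # Fall back to the public cloud endpoint if none is configured.
        self.api_endpoint = os.environ.get(
            "TOWER_API_ENDPOINT", "https://api.cloud.seqera.io")
```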

Methods

create_organizations(count: int = 50) -> List[Dict[str, Any]]

Creates multiple organizations with randomized names.

Parameters:

  • count (int): Number of organizations to create

Returns:

  • List of created organization dictionaries

Example:

orgs = manager.create_organizations(10)
update_organizations_to_pro(organizations: List[Dict[str, Any]]) -> List[Dict[str, Any]]

Updates organizations to 'pro' tier using direct API calls.

Parameters:

  • organizations: List of organization dictionaries

Returns:

  • List of successfully updated organizations
create_workspaces(org_name: str, count: int = 5) -> List[Dict[str, Any]]

Creates workspaces within an organization.

Parameters:

  • org_name (str): Organization name
  • count (int): Number of workspaces to create

Returns:

  • List of created workspace dictionaries
create_compute_environment(workspace_name: str, org_name: str) -> Optional[Dict[str, Any]]

Creates an AWS Cloud compute environment for a workspace.

Parameters:

  • workspace_name (str): Workspace name
  • org_name (str): Organization name

Returns:

  • AWS Cloud compute environment dictionary or None if failed
create_datasets(workspace_name: str, org_name: str) -> List[Dict[str, Any]]

Downloads and uploads datasets to a workspace.

Parameters:

  • workspace_name (str): Workspace name
  • org_name (str): Organization name

Returns:

  • List of created dataset dictionaries
add_workflows(workspace_name: str, org_name: str, count: int = 20) -> List[Dict[str, Any]]

Adds nf-core workflows to a workspace.

Parameters:

  • workspace_name (str): Workspace name
  • org_name (str): Organization name
  • count (int): Number of workflows to add

Returns:

  • List of created workflow dictionaries
setup_all_resources(org_count: int = 50, workspace_count: int = 5, workflow_count: int = 20)

Complete setup process for all resources.

Parameters:

  • org_count (int): Number of organizations
  • workspace_count (int): Workspaces per organization
  • workflow_count (int): Workflows per workspace
teardown_all_resources()

Removes all tracked resources from Seqera Platform.

save_state(filename: str = 'seqera_state.json')

Saves current state to JSON file.

Parameters:

  • filename (str): Path to state file
load_state(filename: str = 'seqera_state.json')

Loads state from JSON file.

Parameters:

  • filename (str): Path to state file
print_summary()

Prints a summary of all created resources to console.

🤝 Contributing

Contributions are welcome! Here are ways you can help:

Reporting Issues

  • Use GitHub Issues for bug reports
  • Include script version and Python version
  • Provide full error messages and logs
  • Describe steps to reproduce

Feature Requests

  • Describe the use case
  • Explain expected behavior
  • Consider backwards compatibility

Code Contributions

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Update documentation
  6. Submit a pull request

Areas for Improvement

  • Additional cloud platform support (Azure, GCP)
  • More nf-core workflows
  • Parallel execution support
  • Web UI for monitoring
  • Docker containerization
  • CI/CD pipeline integration
  • Prometheus/Grafana metrics export

📄 License

This script is provided as-is for stress testing and automation purposes. Please review your Seqera Platform license and terms of service before running large-scale tests.

📊 Metrics and Monitoring

Execution Time Estimates

Based on typical API response times:

| Scale | Organizations | Workspaces | Workflows | Estimated Time |
|-------|---------------|------------|-----------|----------------|
| Small | 5 | 25 | 500 | ~15-20 minutes |
| Medium | 25 | 125 | 2,500 | ~1-1.5 hours |
| Large | 50 | 250 | 5,000 | ~2-3 hours |
| XL | 100 | 1,000 | 30,000 | ~8-12 hours |

Times include rate limiting delays and may vary based on API performance

Resource Consumption

Local System:

  • CPU: Minimal (mostly I/O bound)
  • Memory: ~100-500 MB
  • Disk: ~50-100 MB for temporary datasets
  • Network: ~1-5 GB total data transfer (for datasets)

Seqera Platform:

  • Organizations: Defined by subscription
  • Workspaces: Platform-dependent limits
  • AWS Cloud Compute Environments: May incur cloud costs
  • Storage: Datasets stored in platform

🎯 Use Cases

  1. Stress Testing: Validate platform performance under load
  2. Benchmarking: Compare execution times and throughput
  3. Training Environments: Quickly provision demo/training setups
  4. CI/CD Testing: Automated integration testing
  5. Capacity Planning: Understand resource requirements
  6. Migration Testing: Validate migration procedures

⚠️ Important Notes

  • This script creates real resources that may incur costs
  • Always run teardown after testing to avoid charges
  • Test with small numbers before large-scale runs
  • Monitor API rate limits to avoid throttling
  • Backup state files for reliable cleanup
  • Review permissions before granting admin access

Version: 1.0.0
Last Updated: 2025
Python Version: 3.8+
seqerakit Version: 0.5.0+
