Simulate different performance calculation methodologies #5827

durch · 2025-06-05T07:56:26Z

This change is

Implement foundation for simulated reward calculations to compare old (24h cache-based) vs new (1h route-based) methodologies without blockchain transactions. Database Changes: - Add migration with 4 new tables for simulation system - simulated_reward_epochs: tracks each simulation run - simulated_node_performance: stores performance calculations - simulated_rewards: stores reward calculation results - simulated_route_analysis: metadata for route analysis - Add comprehensive indexes for efficient querying Configuration Changes: - Add simulation_mode flag to rewarding configuration - Add CLI flag --simulate-rewarding with proper dependencies/conflicts - Add validation for simulation-specific settings - Add time window configuration for new method (default: 1 hour) Storage Layer: - Add model structs with SQLx FromRow derives for all simulation tables - Add comprehensive CRUD methods for simulation data management - Add proper type annotations to fix SQLx compile issues - Maintain separation between simulation and real rewarding logic The simulation mode is mutually exclusive with real rewarding and does not require mnemonic since no blockchain transactions are performed.

Add complete simulation engine that compares old (24h cache-based) vs new (1h route-based) reward calculation methodologies with full integration into epoch operations. Core Simulation Engine: - Add SimulationCoordinator with configurable time windows and comparison settings - Implement dual calculation methods with proper Performance type conversions - Add comprehensive error handling with DatabaseError variant in RewardingError - Store simulation results in database with proper relationship constraints Old Method Implementation (24h Cache-Based): - Wrap existing reliability calculation using get_all_avg_mix_reliability_in_last_24hr() - Convert reliability percentages to Performance types using from_percentage_value() - Maintain exact same logic as production for accurate baseline comparison - Generate simulation data structures with proper metadata New Method Implementation (1h Route-Based): - Leverage calculate_corrected_node_reliabilities_for_interval() for route analysis - Support configurable time windows (default 1 hour vs 24 hours) - Provide detailed route statistics including success rates and failure analysis - Convert route reliability data to Performance types with naive_try_from_f64() Epoch Operations Integration: - Extend EpochAdvancer struct with optional SimulationConfig field - Update constructor and start method to accept simulation configuration - Add simulation trigger in perform_epoch_operations() before real rewarding - Ensure simulation failures don't break epoch advancement process CLI Integration: - Update run.rs to handle both --enable-rewarding and --simulate-rewarding modes - Create SimulationConfig from rewarding.debug configuration settings - Implement mutual exclusivity between real rewarding and simulation mode - Skip permission checks for simulation-only mode (no blockchain transactions) The simulation system runs in parallel with epoch operations, storing comparative data for analysis without affecting production reward distribution.

This completes Phase 3 of the simulation system implementation: - Add comprehensive REST API endpoints for simulation data access - Implement /v1/simulation/* routes with full CRUD operations - Support JSON/CSV export for external analysis - Add statistical comparison between old vs new methods - Provide node performance history tracking - Include proper error handling and response formatting - Simplify simulation coordinator to remove unused complex return types - Clean up dead code while maintaining all functionality - Pass clippy with no warnings The simulation API provides complete access to: - Simulation epoch listing and details - Method comparison analytics (old 24h vs new 1h) - Node performance analysis across epochs - Route reliability statistics - Export capabilities for further analysis All simulation data is persisted and accessible via REST endpoints.

Refactors the simulation system to focus on performance methodology comparison rather than reward amounts, enabling robust analysis of old vs new calculation methods. Key improvements: - Replace simulated_rewards table with performance_comparisons for better metrics - Add performance_rankings table for ranking analysis across methodologies - Enhance database schema with additional performance tracking fields - Update simulation coordinator to use performance-focused data structures - Add comprehensive performance ranking calculations - Improve API models and handlers for performance comparison workflows - Update SQLx query cache with new database schema changes This provides a foundation for data-driven performance methodology evaluation while maintaining separation from actual reward calculations.

vercel · 2025-06-05T07:56:45Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
nym-explorer-v2	❌ Failed (Inspect)			Jun 25, 2025 1:06pm

2 Skipped Deployments

Name	Status	Preview	Comments	Updated (UTC)
docs-nextra	⬜️ Ignored (Inspect)	Visit Preview		Jun 25, 2025 1:06pm
nym-next-explorer	⬜️ Ignored (Inspect)	Visit Preview		Jun 25, 2025 1:06pm

…ystem This commit addresses critical N+1 query performance issues identified in the reward simulation system. The optimizations significantly reduce database round trips and improve performance when processing large datasets. **Key Optimizations:** 1. **Batch Identity Key Lookups** - Added `get_mixnode_identity_keys_batch()` and `get_gateway_identity_keys_batch()` - Updated simulation performance conversion to use batch operations - Reduced from N individual queries to 2 batch queries 2. **Batch Node Classification** - Added `classify_nodes_batch()` method for mixnode/gateway determination - Updated reliability calculation methods to use batch classification - Reduced from N individual lookups to 2 batch queries 3. **Batch Epoch Metadata Enhancement** - Added `count_simulated_node_performance_for_epochs_batch()` - Added `get_available_calculation_methods_for_epochs_batch()` - Updated API handlers to use batch operations for metadata enhancement - Reduced from 2N queries to 2 batch queries for epoch data 4. **Bulk Insert Optimizations** - Converted individual INSERT operations to use `sqlx::QueryBuilder::push_values()` - Optimized simulation data insertion methods - Eliminated transaction overhead from individual inserts **Performance Impact:** - Before: N+2N database queries for N nodes/epochs - After: 2+2 batch queries regardless of dataset size - Significant performance improvement for large simulation datasets All changes maintain backward compatibility while providing substantial performance benefits for the reward simulation system.

Implements automatic cleanup of route monitoring results to prevent unbounded storage growth while maintaining data for performance analysis. - Add purge_old_routes() method to StorageManager following existing patterns - Integrate route cleanup into purge_old_statuses() wrapper function - Route data now purged every epoch with 48-hour retention, to facilitate comparisons with legacy data - Update logging to reflect cleanup of both node statuses and routes

Replace infinite busy-wait loop with timeout-based client cleanup to prevent potential system hangs during client lifecycle management.

Implements proper socket-level health checking for gateway websocket connections: - Add is_connection_alive() method to GatewayTransceiver trait - Implement socket-level health check using TcpStream::peek() on Unix systems - Add is_gateway_connection_alive() method to MixnetClient for easy access - Update network monitor to use connection health checks before sending - Fix axum router state configuration in network monitor The health check performs actual OS-level socket validation rather than just checking file descriptor existence, providing reliable connection status.

…ing epoch In simulation mode, the nym-api should be able to run performance simulations regardless of which instance is handling epoch advancement, since simulations don't perform any blockchain transactions. This fix: - Allows simulation mode to proceed when another API is advancing the epoch - Skips epoch transition attempts in simulation mode - Skips actual reward distribution blockchain operations in simulation mode

…mulation The old method simulation was incorrectly treating reliability values (0-1 fractions) as percentages, causing all performance scores to be truncated to 0 when cast to u64. Changes: - Old method: multiply reliability by 100 before passing to Performance::from_percentage_value() - New method: remove division by 100 since Performance::naive_try_from_f64() expects fractions - Route analysis: multiply average reliability by 100 for percentage display This ensures consistent handling where internal calculations use fractions (0-1) while storage and API responses use percentages (0-100).

* Check gateway supported versions * Fix boxed errors * Fmt * feat(client-core): integrate gateway protocol validation into SDK Enable protocol version checking for all SDK-based clients by using gateways_for_init_with_protocol_validation in MixnetClientBuilder. This ensures clients only connect to gateways with compatible protocol versions, preventing potential communication issues. - Replace gateways_for_init with gateways_for_init_with_protocol_validation - Add import for the new validation function - Protocol validation now active for network monitor and all SDK users * refactor(client-core): allow gateways with newer protocol versions Change protocol validation to be more permissive - instead of rejecting gateways with newer protocol versions, now logs a warning and continues. This enables graceful degradation when gateways upgrade, relying on their backward compatibility while signaling users to update their clients. Changes: - Accept gateways with protocol version > client version - Log warning about version mismatch suggesting client update - Update log messages from "validation" to "check" for clarity - Add trace logging showing both gateway and client versions This prevents connectivity issues when gateways upgrade before clients, improving overall network resilience. * feat(nym-network-monitor): add diagnostic logging for ConnectionRefused debugging - Add detailed logging for client rotation lifecycle with [CLIENT_ROTATION_*] tags - Add request tracking with unique IDs using the random message as identifier - Add HTTP connection acceptance logging with [HTTP_REQUEST] tags - Improve locust script with connection error handling and backoff strategy - Add TCP backlog configuration support (default 1024) - Add socket configuration with SO_REUSEADDR and configurable backlog - Add HTTP timeout and tracing layers for better observability These changes help diagnose when and why ConnectionRefused errors occur during load testing, particularly around client rotation periods. * fix(nym-api): calculate route average reliability as node mean instead of route-based - Changed route average reliability calculation to use mean of all node reliabilities - Added median calculation alongside mean for better statistical representation - Fixed null average reliability for old method which doesn't analyze routes - Rounded all float values to 2 decimal places in API responses for cleaner display - Store median values in analysis_parameters JSON field

…comparison - Added ReliabilityDistribution struct with 6 categories: excellent (>95%), very_good (90-95%), good (75-90%), moderate (50-75%), poor (25-50%), very_poor (<25%) - Updated ComparisonSummaryStats to include distribution breakdowns for both old and new methods - Provides better insight into performance distribution beyond averages - Added tests for distribution calculation

Since old method data is already available in production database, simulation now only calculates new method performance. The API combines production data with simulated data at query time. Key changes: - Remove old_method_simulation and run_both_methods config - Add production_performance field to NodePerformanceData model - Update API endpoints to fetch production data from node annotations cache - Refactor compare_methods to use single performance dataset - Fix route analysis comparison to handle missing old method data - Update all tests to use new single-dataset approach This simplifies the simulation code and ensures consistency with production calculations.

…ures - Remove old_method/new_method nested objects from NodeMethodComparison - Use direct fields for production_performance and simulated_performance - Update build_node_comparisons_from_single_dataset to create cleaner structure - Fix calculate_summary_statistics to work with new data model - Update all tests to match new structure The API now returns a single dataset with production performance values included directly, rather than separate old/new method objects. - Deduplicate routes, to control for deterministic routing

- Add create_or_get_simulated_reward_epoch to check for existing simulations - Skip simulation if already exists for the epoch and calculation method - Add tests to verify duplicate prevention works correctly - Update all test calls to use the new method This fixes the issue where multiple simulations were created for the same epoch when epoch operations were retried or the service restarted. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

durch added 6 commits June 3, 2025 12:39

Tests

b3d07e8

Locust sleep if no clients are available

a96fb09

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 5, 2025 07:56 Failure

durch added 2 commits June 5, 2025 10:45

Add timeout to locust

75f34ef

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 5, 2025 10:01 Failure

Client per request

eb1c7d6

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 5, 2025 10:11 Failure

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 6, 2025 07:26 Failure

Add RESET_CONFIG option to entrypoint

77c4acf

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 6, 2025 07:39 Failure

Fix config cleanup

74db9be

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 6, 2025 08:23 Failure

Dont use hickory dns for gateway client

e336e02

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 6, 2025 08:43 Failure

durch added 2 commits June 6, 2025 10:49

Respond with 504 to timeouts

71c24d8

Put client factory back

c389f43

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 6, 2025 08:52 Failure

Fix client drop loop hanging in network monitor

cab072f

Replace infinite busy-wait loop with timeout-based client cleanup to prevent potential system hangs during client lifecycle management.

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 6, 2025 09:03 Failure

durch marked this pull request as draft June 6, 2025 09:26

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 6, 2025 10:18 Failure

durch marked this pull request as ready for review June 9, 2025 09:28

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 24, 2025 15:49 Failure

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 24, 2025 17:09 Failure

durch and others added 4 commits June 24, 2025 21:28

Fmt

0e53562

Bump nym-api

6993ef0

Bump nnm

9cb18cd

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 24, 2025 19:41 Failure

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 24, 2025 19:58 Failure

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 24, 2025 21:48 Failure

durch added 2 commits June 25, 2025 00:40

FMT, fix typo

1364316

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 24, 2025 22:50 Failure

Fix simulated epoch generation

c5c6432

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 25, 2025 06:35 Failure

durch and others added 2 commits June 25, 2025 10:50

Filter only on tested nodes

e1a0235

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 25, 2025 09:04 Failure

Remove OR ignore

688660a

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 25, 2025 09:56 Failure

Dedup routes before submission, stagger out batches

94cc0db

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 25, 2025 10:06 Failure

Fix off by 100x

1665b1f

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 25, 2025 10:53 Failure

Disable delay and retransmission for network-monitor

f6d49d2

vercel bot had a problem deploying to Preview – nym-explorer-v2 June 25, 2025 13:06 Failure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Simulate different performance calculation methodologies #5827

Simulate different performance calculation methodologies #5827

Uh oh!

durch commented Jun 5, 2025 •

edited by mmsinclair

Loading

Uh oh!

vercel bot commented Jun 5, 2025 •

edited

Loading

Uh oh!

Uh oh!

Simulate different performance calculation methodologies #5827

Are you sure you want to change the base?

Simulate different performance calculation methodologies #5827

Uh oh!

Conversation

durch commented Jun 5, 2025 • edited by mmsinclair Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vercel bot commented Jun 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

durch commented Jun 5, 2025 •

edited by mmsinclair

Loading

vercel bot commented Jun 5, 2025 •

edited

Loading