lowess
A high-performance implementation of LOWESS (Locally Weighted Scatterplot Smoothing) in Rust. This crate provides a robust, production-ready implementation with support for confidence intervals, multiple kernel functions, and optimized execution modes.
[!IMPORTANT]
For parallelization or ndarray support, use fastLowess.
Features
- Robust Statistics: IRLS with Bisquare, Huber, or Talwar weighting for outlier handling.
- Uncertainty Quantification: Point-wise standard errors, confidence intervals, and prediction intervals.
- Optimized Performance: Delta optimization for skipping dense regions and streaming/online modes for large or real-time datasets.
- Parameter Selection: Built-in cross-validation for automatic smoothing fraction selection.
- Flexibility: Multiple weight kernels (Tricube, Epanechnikov, etc.) and no_std support (requires alloc).
- Validated: Numerical agreement with R's stats::lowess and Python's statsmodels.
Robustness Advantages
This implementation is more robust than statsmodels due to two key design choices:
MAD-Based Scale Estimation
For robustness weight calculations, this crate uses Median Absolute Deviation (MAD) for scale estimation:
s = median(|r_i - median(r)|)
In contrast, statsmodels uses median of absolute residuals:
s = median(|r_i|)
Why MAD is more robust:
- MAD is a breakdown-point-optimal estimator—it remains valid even when up to 50% of data are outliers.
- The median-centering step removes asymmetric bias from residual distributions.
- MAD provides consistent outlier detection regardless of whether residuals are centered around zero.
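The difference is easiest to see on residuals that are not centered at zero. The sketch below is purely illustrative (ad-hoc helpers, not code from this crate) and computes both scale estimates side by side:

// Illustrative comparison of the two scale estimates (ad-hoc helpers,
// not part of this crate's API).
fn median(values: &mut [f64]) -> f64 {
    values.sort_by(|a, b| a.total_cmp(b));
    let n = values.len();
    if n % 2 == 1 {
        values[n / 2]
    } else {
        (values[n / 2 - 1] + values[n / 2]) / 2.0
    }
}

fn main() {
    // Residuals that sit around 2.0 rather than 0.0, plus one gross outlier
    let residuals = [2.1, 1.9, 2.2, 1.8, 2.0, 30.0];

    // statsmodels-style scale: median of absolute residuals
    let mut abs_r: Vec<f64> = residuals.iter().map(|r| r.abs()).collect();
    let s_plain = median(&mut abs_r); // 2.05: tracks the residual level, not the spread

    // MAD-style scale: center on the median first, then take absolute deviations
    let mut r = residuals.to_vec();
    let center = median(&mut r);
    let mut dev: Vec<f64> = residuals.iter().map(|r| (r - center).abs()).collect();
    let s_mad = median(&mut dev); // 0.15: reflects the actual scatter

    println!("median(|r|) = {s_plain:.2}, MAD = {s_mad:.2}");
}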
Boundary Padding
This crate applies boundary policies (Extend, Reflect, Zero) at dataset edges:
- Extend: Repeats edge values to maintain local neighborhood size.
- Reflect: Mirrors data symmetrically around boundaries.
- Zero: Pads with zeros (useful for signal processing).
statsmodels does not apply boundary padding, which can lead to:
- Biased estimates near boundaries due to asymmetric local neighborhoods.
- Increased variance at the edges of the smoothed curve.
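As a rough illustration of the idea (not this crate's internal implementation), the three policies can be pictured as different ways of extending a series past its left edge:

// Conceptual sketch of the three boundary policies at the left edge of a series.
#[derive(Clone, Copy)]
enum EdgePolicy {
    Extend,
    Reflect,
    Zero,
}

fn pad_left(data: &[f64], pad: usize, policy: EdgePolicy) -> Vec<f64> {
    let mut out = Vec::with_capacity(data.len() + pad);
    for i in 0..pad {
        out.push(match policy {
            EdgePolicy::Extend => data[0],        // repeat the edge value
            EdgePolicy::Reflect => data[pad - i], // mirror around the first point
            EdgePolicy::Zero => 0.0,              // pad with zeros
        });
    }
    out.extend_from_slice(data);
    out
}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0, 5.0];
    println!("{:?}", pad_left(&data, 2, EdgePolicy::Extend));  // [1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    println!("{:?}", pad_left(&data, 2, EdgePolicy::Reflect)); // [3.0, 2.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    println!("{:?}", pad_left(&data, 2, EdgePolicy::Zero));    // [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
}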
Gaussian Consistency Factor
For interval estimation (confidence/prediction), residual scale is computed using:
sigma = 1.4826 * MAD
The factor 1.4826 = 1/Phi^-1(3/4) ensures consistency with the standard deviation under Gaussian assumptions.
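In code form this is a single constant multiple; a minimal sketch (the MAD value here stands in for the estimate defined above):

// Sketch: Gaussian-consistency scaling of a MAD estimate.
fn gaussian_sigma(mad: f64) -> f64 {
    // 1.4826 ≈ 1 / Phi^-1(3/4): makes the MAD comparable to a standard deviation
    1.4826 * mad
}

fn main() {
    println!("sigma = {:.4}", gaussian_sigma(0.15)); // 0.2224
}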
Performance Advantages
Benchmarked against Python's statsmodels, with no regressions in any tested scenario. Most benchmark categories run 113-2813× faster, while the delta-optimized and genomic workloads see smaller but still substantial gains (roughly 5-10×). Performance gains scale dramatically with dataset size.
Summary
| Category | Matched | Median Speedup | Mean Speedup |
|---|---|---|---|
| Scalability | 5 | 481× | 1057× |
| Financial | 4 | 270× | 301× |
| Iterations | 6 | 238× | 248× |
| Pathological | 4 | 234× | 220× |
| Scientific | 4 | 212× | 239× |
| Fraction | 6 | 218× | 268× |
| Genomic | 4 | 6.9× | 10.4× |
| Delta | 4 | 5.0× | 5.0× |
Top 10 Performance Wins
| Benchmark | statsmodels | Rust | Speedup |
|---|---|---|---|
| scale_100000 | 43.7s | 15.5ms | 2813× |
| scale_50000 | 11.2s | 7.6ms | 1466× |
| fraction_0.05 | 197.2ms | 0.38ms | 516× |
| financial_10000 | 497.1ms | 0.97ms | 512× |
| scale_10000 | 663.1ms | 1.38ms | 481× |
| scientific_10000 | 777.2ms | 1.86ms | 418× |
| financial_5000 | 170.9ms | 0.49ms | 346× |
| fraction_0.1 | 227.9ms | 0.67ms | 339× |
| scale_5000 | 229.9ms | 0.69ms | 334× |
| iterations_0 | 74.2ms | 0.26ms | 289× |
Check Benchmarks for detailed results and reproducible benchmarking code.
Installation
Add this to your Cargo.toml:
[dependencies]
lowess = "0.6"
For no_std environments:
[dependencies]
lowess = { version = "0.6", default-features = false }
Quick Start
use lowess::prelude::*;

fn main() -> Result<(), LowessError> {
    let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];

    // Basic smoothing
    let result = Lowess::new()
        .fraction(0.5)
        .adapter(Batch)
        .build()?
        .fit(&x, &y)?;

    println!("Smoothed values: {:?}", result.y);
    Ok(())
}
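Intervals follow the same pattern. The sketch below assumes the builder options and result fields documented later in this README, and requests 95% confidence intervals alongside the smoothed values:

use lowess::prelude::*;

fn main() -> Result<(), LowessError> {
    let x = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
    let y = vec![2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 13.9, 16.2];

    // Smoothing with 95% confidence intervals
    let result = Lowess::new()
        .fraction(0.5)
        .confidence_intervals(0.95)
        .adapter(Batch)
        .build()?
        .fit(&x, &y)?;

    // Interval bounds are only populated when requested
    if let (Some(lower), Some(upper)) = (&result.confidence_lower, &result.confidence_upper) {
        for i in 0..result.y.len() {
            println!("{:.2} in [{:.2}, {:.2}]", result.y[i], lower[i], upper[i]);
        }
    }
    Ok(())
}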
Builder Methods
use lowess::prelude::*;

Lowess::new()
    // Smoothing span (0, 1]
    .fraction(0.5)
    // Robustness iterations
    .iterations(3)
    // Interpolation threshold
    .delta(0.01)
    // Kernel selection
    .weight_function(Tricube)
    // Robustness method
    .robustness_method(Bisquare)
    // Zero-weight fallback behavior
    .zero_weight_fallback(UseLocalMean)
    // Boundary handling (for edge effects)
    .boundary_policy(Extend)
    // Confidence intervals
    .confidence_intervals(0.95)
    // Prediction intervals
    .prediction_intervals(0.95)
    // Diagnostics
    .return_diagnostics()
    .return_residuals()
    .return_robustness_weights()
    // Cross-validation (for parameter selection)
    .cross_validate(KFold(5, &[0.3, 0.5, 0.7]).seed(123))
    // Convergence
    .auto_converge(1e-4)
    // Execution mode
    .adapter(Batch)
    // Build the model
    .build()?;
Result Structure
pub struct LowessResult<T> {
    // Sorted x values
    pub x: Vec<T>,
    // Smoothed y values
    pub y: Vec<T>,
    // Point-wise standard errors
    pub standard_errors: Option<Vec<T>>,
    // Confidence intervals
    pub confidence_lower: Option<Vec<T>>,
    pub confidence_upper: Option<Vec<T>>,
    // Prediction intervals
    pub prediction_lower: Option<Vec<T>>,
    pub prediction_upper: Option<Vec<T>>,
    // Residuals
    pub residuals: Option<Vec<T>>,
    // Final IRLS weights
    pub robustness_weights: Option<Vec<T>>,
    // Diagnostics
    pub diagnostics: Option<Diagnostics<T>>,
    // Actual iterations used
    pub iterations_used: Option<usize>,
    // Selected fraction
    pub fraction_used: T,
    // CV RMSE per fraction
    pub cv_scores: Option<Vec<T>>,
}
Streaming Processing
For datasets that don't fit in memory:
let mut processor = Lowess::new()
    .fraction(0.3)
    .iterations(2)
    .adapter(Streaming)
    .chunk_size(1000)
    .overlap(100)
    .build()?;

// Process data in chunks
for chunk in data_chunks {
    let result = processor.process_chunk(&chunk.x, &chunk.y)?;
}

// Finalize processing
let final_result = processor.finalize()?;
Online Processing
For real-time data streams:
let mut processor = Lowess::new()
    .fraction(0.2)
    .iterations(1)
    .adapter(Online)
    .window_capacity(100)
    .build()?;

// Process points as they arrive
for (x, y) in data_stream {
    if let Some(output) = processor.add_point(x, y)? {
        println!("Smoothed: {}", output.smoothed);
    }
}
Parameter Selection Guide
Fraction (Smoothing Span)
- 0.1-0.3: Local, captures rapid changes (wiggly)
- 0.4-0.6: Balanced, general-purpose
- 0.7-1.0: Global, smooth trends only
- Default: 0.67 (2/3, Cleveland's choice)
- Use CV when uncertain
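A sketch of letting cross-validation pick the fraction, using the cross_validate builder option and reading the selected value back from the result (fields as documented in Result Structure above):

use lowess::prelude::*;

fn main() -> Result<(), LowessError> {
    let x: Vec<f64> = (0..200).map(|i| i as f64 * 0.05).collect();
    let y: Vec<f64> = x.iter().map(|v| v.sin() + 0.1 * v).collect();

    // 5-fold CV over a small grid of candidate fractions
    let result = Lowess::new()
        .cross_validate(KFold(5, &[0.2, 0.4, 0.6, 0.8]).seed(42))
        .adapter(Batch)
        .build()?
        .fit(&x, &y)?;

    println!("Selected fraction: {}", result.fraction_used);
    if let Some(scores) = &result.cv_scores {
        println!("CV RMSE per candidate: {:?}", scores);
    }
    Ok(())
}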
Robustness Method
- Bisquare (default): Redescending weights that drive gross outliers to zero; a strong all-around choice.
- Huber: Downweights large residuals without fully rejecting them; suited to light-to-moderate contamination.
- Talwar: Hard cutoff (full weight inside the threshold, zero outside); simple and fast.
Robustness Iterations
- 0: Clean data, speed critical
- 1-2: Light contamination
- 3: Default, good balance (recommended)
- 4-5: Heavy outliers
- >5: Diminishing returns
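For heavily contaminated data, a sketch combining the default Bisquare method with extra iterations and inspecting the final IRLS weights (options and fields as documented above):

use lowess::prelude::*;

fn main() -> Result<(), LowessError> {
    let x: Vec<f64> = (0..50).map(|i| i as f64).collect();
    let mut y: Vec<f64> = x.iter().map(|v| 0.5 * v).collect();
    y[25] = 100.0; // inject a gross outlier

    let result = Lowess::new()
        .fraction(0.4)
        .robustness_method(Bisquare)
        .iterations(4)
        .return_robustness_weights()
        .adapter(Batch)
        .build()?
        .fit(&x, &y)?;

    // The outlier should receive a near-zero weight in the final pass
    if let Some(weights) = &result.robustness_weights {
        println!("weight at the outlier: {:.4}", weights[25]);
    }
    Ok(())
}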
Kernel Function
- Tricube (default): Best all-around, smooth, efficient
- Epanechnikov: Theoretically optimal MSE
- Gaussian: Very smooth, no compact support
- Uniform: Fastest, least smooth (moving average)
Delta Optimization
- None: Small datasets (n < 1000)
- 0.01 × range(x): Good starting point for dense data (see the sketch below)
- Manual tuning: Adjust based on data density
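For example, a sketch computing delta as 1% of the x-range before fitting:

use lowess::prelude::*;

fn main() -> Result<(), LowessError> {
    let x: Vec<f64> = (0..10_000).map(|i| i as f64 * 0.1).collect();
    let y: Vec<f64> = x.iter().map(|v| v.sin()).collect();

    // delta = 0.01 × range(x): points closer than this are interpolated, not refit
    let delta = 0.01 * (x[x.len() - 1] - x[0]);

    let result = Lowess::new()
        .fraction(0.3)
        .delta(delta)
        .adapter(Batch)
        .build()?
        .fit(&x, &y)?;

    println!("Smoothed {} points", result.y.len());
    Ok(())
}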
Examples
Check the examples directory for more complex scenarios:
cargo run --example batch_smoothing
cargo run --example online_smoothing
cargo run --example streaming_smoothing
MSRV
Rust 1.85.0 or later (2024 Edition).
Validation
Validated against:
- Python (statsmodels): Passed on 44 distinct test scenarios.
- Original Paper: Reproduces Cleveland (1979) results.
Check Validation for more information. Small variations in results are expected due to differences in scale estimation and padding.
Related Work
- fastLowess: companion crate built on top of this one, adding parallelization and ndarray support.
Contributing
Contributions are welcome! Please see the CONTRIBUTING.md file for more information.
License
Dual-licensed under AGPL-3.0 (Open Source) or Commercial License.
Contact <thisisamirv@gmail.com> for commercial inquiries.
References
- Cleveland, W.S. (1979). "Robust Locally Weighted Regression and Smoothing Scatterplots". JASA.
- Cleveland, W.S. (1981). "LOWESS: A Program for Smoothing Scatterplots". The American Statistician.