Parrot is a C++ library for fused array operations using CUDA/Thrust. It provides efficient GPU-accelerated operations with lazy evaluation semantics, allowing for chaining of operations without unnecessary intermediate materializations.
- Fused Operations - Operations are fused when possible
- GPU Acceleration - Built on CUDA/Thrust for high performance
- Chainable API - Clean, expressive syntax for complex operations
- Header-Only - Simple integration with #include "parrot.hpp"
¹ Lazy-ish means that any operation that can be lazily evaluated is lazily evaluated.
#include "parrot.hpp"
int main() {
// Create arrays
auto A = parrot::array({3, 4, 0, 8, 2});
auto B = parrot::array({6, 7, 2, 1, 8});
auto C = parrot::array({2, 5, 7, 4, 3});
// Chain operations
(B * C + A).print(); // Output: 15 39 14 12 26
}
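To make the idea of fused, lazily evaluated chaining concrete, here is a minimal CPU-only sketch in plain C++. It is not Parrot's implementation and uses no Parrot or Thrust calls. The names `Lazy`, `make_lazy`, and `from_vector` are hypothetical helpers invented for this illustration. Each operator only composes a per-element function. A single loop runs when `eval()` is called, so no intermediate array for `B * C` is ever allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Parrot type): a length plus a function that
// computes element i on demand.
template <typename F>
struct Lazy {
    std::size_t n;
    F at;

    // Materialize the expression. This is the only place a buffer is
    // allocated and the only loop over the data.
    std::vector<int> eval() const {
        std::vector<int> out(n);
        for (std::size_t i = 0; i < n; ++i) out[i] = at(i);
        return out;
    }
};

template <typename F>
Lazy<F> make_lazy(std::size_t n, F f) { return Lazy<F>{n, f}; }

// Wrap an existing vector. The vector must outlive the lazy expression.
inline auto from_vector(const std::vector<int>& v) {
    return make_lazy(v.size(), [&v](std::size_t i) { return v[i]; });
}

// Element-wise product: builds a new per-element function, computes nothing yet.
template <typename L, typename R>
auto operator*(const Lazy<L>& a, const Lazy<R>& b) {
    assert(a.n == b.n);
    return make_lazy(a.n, [a, b](std::size_t i) { return a.at(i) * b.at(i); });
}

// Element-wise sum: same idea, still no work done until eval().
template <typename L, typename R>
auto operator+(const Lazy<L>& a, const Lazy<R>& b) {
    assert(a.n == b.n);
    return make_lazy(a.n, [a, b](std::size_t i) { return a.at(i) + b.at(i); });
}
```

With the three input arrays from the example above stored in `std::vector<int>` objects, `(from_vector(B) * from_vector(C) + from_vector(A)).eval()` returns `{15, 39, 14, 12, 26}`, the same values as the output shown in the example. On a GPU the same principle would correspond to one kernel launch instead of one per operator, though how Parrot achieves this internally is not described here.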
#include "parrot.hpp"
using namespace parrot::literals;
auto softmax(auto matrix) {
auto cols = matrix.shape()[1];
auto z = matrix - matrix.maxr(2_ic).replicate(cols);
auto num = z.exp();
auto den = num.sum(2_ic);
return num / den.replicate(cols);
}
int main() {
auto matrix = parrot::range(6).as<float>().reshape({2, 3});
softmax(matrix).print();
}
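For checking results, the same computation can be written as a plain CPU reference. The sketch below assumes the Parrot example performs a row-wise softmax: that `maxr(2_ic)` and `sum(2_ic)` reduce along each row and `replicate(cols)` broadcasts the per-row value back across the columns. That reading is an assumption based on the example, not a statement of the API. `softmax_rows` is a hypothetical helper name and uses only the C++ standard library.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-wise softmax over a row-major rows x cols matrix (CPU reference).
std::vector<float> softmax_rows(const std::vector<float>& m,
                                std::size_t rows, std::size_t cols) {
    std::vector<float> out(m.size());
    for (std::size_t r = 0; r < rows; ++r) {
        const float* in = m.data() + r * cols;
        float* o = out.data() + r * cols;

        // Subtract the row maximum before exponentiating for numerical
        // stability (the "matrix - max" step in the example).
        const float row_max = *std::max_element(in, in + cols);

        // Numerator: exp of the shifted values; denominator: their row sum.
        float den = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            o[c] = std::exp(in[c] - row_max);
            den += o[c];
        }

        // Normalize so each row sums to 1.
        for (std::size_t c = 0; c < cols; ++c) o[c] /= den;
    }
    return out;
}
```

Because softmax is unchanged when a constant is added to a row, any row of three consecutive values yields approximately 0.0900, 0.2447, 0.6652. A 2 x 3 matrix of six consecutive values therefore produces two identical rows, whether the sequence starts at 0 or at 1.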
- CMake (3.10+)
- C++ compiler with C++20 support
- NVIDIA CUDA Toolkit 13.0 or later
- NVIDIA GPU with compute capability 7.0+
# Clone the repository
git clone <repository-url>
cd parrot
# Create build directory
mkdir build && cd build
# Configure and build
cmake ..
cmake --build . -j$(nproc)
# Run tests
ctest
For detailed build instructions, see BUILDING.md.
Parrot provides significant code reduction compared to other CUDA libraries:
Library | Code Reduction |
---|---|
Thrust | ~10x less code |
See detailed comparisons in our documentation.
- Full Documentation - Complete API reference and examples
- Examples - Standalone Parrot examples
- vs Thrust - Side-by-side comparisons with Thrust
The examples/ directory contains:
- getting_started/ - Simple examples to get started
- thrust/ - Parrot implementations of Thrust examples
# All tests
ctest
# Individual test categories
./test_basic # Basic operations
./test_sorting # Sorting algorithms
./test_math # Mathematical operations
./test_reductions # Reduction operations
# Run clang-tidy
./scripts/run-clang-tidy.sh
# Auto-fix issues
./scripts/run-clang-tidy.sh --fix
# Install dependencies
pip install -r requirements.txt
# Build docs
cd scripts && ./build-docs.sh
parrot/
├── parrot.hpp # Main header (single-file library)
├── thrustx.hpp # Extended Thrust utilities
├── examples/ # Example code
│ ├── getting_started/ # Simple getting-started examples
│ ├── thrust/ # Parrot versions of Thrust examples
│ └── real_world/ # Parrot versions of examples from open source projects
├── tests/ # Unit tests
├── docs/ # Documentation source
└── scripts/ # Development scripts
We welcome contributions! Please see our CONTRIBUTING.md guide for details on:
- Developer Certificate of Origin (DCO) requirements
- Setting up the development environment
- Code standards and review process
- Running tests and code quality checks
- Building documentation
All contributions must be signed off under the DCO and comply with the Apache License 2.0.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
This project includes third-party software components. See THIRD_PARTY_LICENSES for complete license information and attributions.
Built on top of NVIDIA Thrust and CUDA.