Tech Reports
For reports prior to 2008, please visit http://apps.cs.utexas.edu/tech_reports
LLM Agents for Verified Programming
Honors Theses , 05/18/2025 , HR-25-27 ,
Large language models (LLMs) exhibit impressive coding abilities, solving a variety of coding tasks with nothing more than a natural language specification. However, a main limitation of using LLMs to write code is that the code may not be correct; LLM-generated code often contains bugs that can easily go unnoticed. In this work, we address this issue with formal theorem proving, which allows properties of programs to be verified with high confidence. However, formal theorem proving requires the user to write machine-checkable proofs in a formal logical language, and writing these proofs is often difficult and time-consuming. Thus, we pose a new “verified coding” task in the formal theorem-proving language Lean 4, with two subtasks: a coding subtask and a proving subtask. Given a natural language problem statement and a formal specification of the desired program behavior, the coding subtask requires implementing a function that solves the problem and follows the desired behavior, while the proving subtask requires formally proving that the function implementation matches the specification. We present an approach that uses a series of LLM agents to tackle both the coding and proving subtasks of this task. We show that this approach improves over an LLM baseline.
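To make the task concrete, here is a toy instance of the two subtasks in Lean 4 (an illustrative example of ours, not drawn from the thesis; it may require a recent toolchain for the omega tactic):

    -- Coding subtask (hypothetical example): implement a max function on natural numbers.
    def myMax (a b : Nat) : Nat :=
      if a ≤ b then b else a

    -- Proving subtask: show the implementation satisfies one clause of its specification.
    theorem myMax_ge_left (a b : Nat) : a ≤ myMax a b := by
      unfold myMax
      split <;> omega

The agents in the thesis target problems of this shape at much larger scale, generating both the implementation and its correctness proof from a natural language statement and a formal specification.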
Improving Sampling Methods in Automated Design Space Exploration for Deep Learning Accelerators
Honors Theses , 05/14/2025 , HR-25-26 ,
Deep learning models are becoming increasingly prevalent in computer science and daily life. They create large computational demands which can be managed by the design and deployment of dedicated deep learning accelerators. Because deep learning tasks are highly structured, there is considerable interest in using automated design space exploration (DSE) to help develop these accelerators and their software. This thesis explores the sampling techniques that are used in DSE for deep learning accelerators. We find that domain-aware Bayesian Optimization (daBO) — a state-of-the-art optimization technique — requires high sample counts because it frequently selects infeasible designs. We employ clustering in the design space, and we develop cluster-based sampling techniques to improve the rate of sampling feasible designs. The integration of one cluster-based sampler with daBO reduces the number of samples taken by ∼73% in the geometric mean and up to ∼96%, while maintaining similar quality of the final design. Independently, we study the sampling of loop permutations of accelerator software and conclude that existing techniques to sample permutations statically are insufficient. We develop a framework to dynamically adjust the sample counts of loop permutations and use it to opportunistically reduce sample counts by ∼25% in the geometric mean and up to ∼41% with only small sacrifices in the quality of the resulting accelerator design. Finally, we demonstrate that cluster-based sampling and dynamic permutation sampling can be used together and can provide compounded benefits.
Investigating Improving Reinforcement Learning for Autonomous Cyber Defense Agents
Honors Theses , 05/14/2025 , HR-25-25 ,
Cybersecurity incident response is an increasingly complex task that requires humans to quickly analyze large amounts of data and make optimal decisions. One solution is to train autonomous cyber defense agents using reinforcement learning techniques to provide faster and more accurate responses. In this thesis, I investigate the effects of reward shaping and the use of human input during training. Specifically, I present various modified reward functions and their effects on the agent's behavior. I also explore kickstarting the agent's training with human input and demonstrate how this can influence the agent's performance. This thesis provides preliminary work in these directions.
Optimization Frameworks for Design Space Exploration of Deep Learning Accelerators
Honors Theses , 05/08/2025 , HR-25-23 ,
Deep learning accelerators are specialized processors designed to address the growing computational demands of artificial intelligence applications. Recent work has automated the exploration of their complex design space, which (1) speeds up the design process, and (2) enables the discovery of more optimal configurations. Optimization frameworks are tools that support this automated exploration. This thesis aims to advance the state-of-the-art in automated deep learning accelerator design. Through a comprehensive evaluation of existing optimization frameworks, we identify opportunities for improvement. Our findings demonstrate that the state-of-the-art framework daBO consistently outperforms alternative frameworks, achieving 2× to 8× better performance across various deep learning models when optimizing for energy-delay product. We also propose enhancements to daBO that improve its performance on modern DL models, particularly Transformers, by 2×.
Reconstructing Speech from fMRI-Recorded Brain Activity
Honors Theses , 05/08/2025 , HR-25-21 ,
Brain decoding is the process of reconstructing stimuli by interpreting brain activity. In particular, given brain signals recorded using functional magnetic resonance imaging (fMRI), we want to determine the language that incited these brain responses. Previous research has successfully demonstrated reconstruction of word sequences from these fMRI recordings, but this does not capture the more refined aspects of speech, such as intonation, emotion, and intent. In this project, our aim is to improve the performance of brain decoding by reconstructing audio segments that recover the acoustic properties of perceived speech.
Neo – A Modern HDL for Building Hardware You Can Trust
Honors Theses , 05/08/2025 , HR-25-20 ,
Developing reusable hardware poses significant challenges, particularly concerning parameterization and timing correctness. Highly parameterized modules, necessary for reuse, often harbor subtle edge cases only triggered by specific parameter combinations. Furthermore, integrating components, especially those involving decoupled pipelines requiring stalls and flushes, is prone to timing errors. Current hardware design languages (HDLs) lack adequate mechanisms to catch these easy-to-make yet catastrophic mistakes. Neo is a new HDL featuring a type system that encodes timings and stalls, helping to enforce correctness in complex pipeline interactions. Paired with a symbolic engine that validates parameterized designs across input constraints, Neo gives developers confidence to build robust, reusable, and modular hardware.
Light LODing: varying illumination models through visual importance
Honors Theses , 05/08/2025 , HR-25-19 ,
Recent advances in hardware acceleration and sampling techniques have made implementing photorealistic techniques like soft shadows and realistic reflections/refractions feasible for real-time applications like modern video games, which have hard performance requirements (usually in the tens of milliseconds). Previously, these applications used ad-hoc techniques like rasterization, screen-space methods, and baked lighting to accomplish photorealistic imagery with major visual compromises. In some contexts, a user's spatially non-uniform perception of detail means that photorealistic results aren't required in every detail of the frame (such as attention on the car versus the overall environment in a modern racing game). Using this insight, we varied which models went through which pipeline (rasterization/raster or hardware-accelerated ray tracing/RT) based on perceived visual importance to increase performance. This resulted in overall speedups of up to 4-10 ms per frame.
How Menopause Shapes the Relationship Between Cardiovascular and Cerebrovascular Health
Honors Theses , 05/07/2025 , HR-25-17 ,
The CARDIA (Coronary Artery Risk Development in Young Adults) study began in 1985 to investigate the development of cardiovascular disease in young adulthood. It was one of the first and longest longitudinal studies to focus on cardiovascular health among young adults. Although initially designed to follow patients for just five years, the study expanded into a 35-year follow-up, during which researchers collected data on cardiovascular risk factors, dietary patterns, exercise habits, physical health, lifestyle factors, and reproductive milestones, including menopause in women. At years 25, 30, and 35, a subset of participants also completed cognitive tests and underwent brain magnetic resonance imaging (MRI) scans. Previous studies have found a strong correlation between cardiovascular and cerebrovascular disease, with significant implications for clinical outcomes. However, these factors are often unexplored in women, leading to increased misdiagnoses and poorer clinical outcomes. One reason for this may be that medicine currently lacks a deep understanding of the role menopause plays in women’s health. For example, studies have shown that the decrease in estrogen levels that occurs after menopause in women may contribute to cardiovascular disease factors. Thus, we aim to determine how menopause affects the relationship between cardiovascular and cerebrovascular health in women. To achieve this, we analyze the data obtained from the CARDIA study described above. We use a composite score known as Life’s Essential 8 (LE8) to quantify cardiovascular health. To quantify cerebrovascular health, we utilize a measure derived from the MRI scans, known as cerebrovascular reactivity (CVR), which assesses how effectively blood vessels in the brain respond to changes in blood flow.
Using generalized mixed effects models (a minimal modeling sketch follows the findings below), we find that:
1) Menopause significantly weakens the positive effects LE8 may have on CVR (p=0.02),
2) There is a significant positive correlation between education and CVR (p=0.03),
3) Among post-menopausal women, there is a significant negative correlation between LE8 and CVR (p=0.03), and
4) The time since a woman experienced menopause does not affect the relationship between LE8 and CVR (p=0.90).
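As a rough illustration of how such a model can be set up in code (a minimal sketch with statsmodels; the column names, file path, and exact formula are our own assumptions, not the study's actual specification):

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical long-format data: one row per participant visit, with LE8, CVR,
    # menopause status, education, and a participant identifier.
    data = pd.read_csv("cardia_subset.csv")            # placeholder path

    # Mixed effects model: CVR as a function of LE8, menopause status, their interaction,
    # and education, with random intercepts per participant.
    model = smf.mixedlm("CVR ~ LE8 * menopause + education", data, groups=data["participant_id"])
    result = model.fit()
    print(result.summary())                            # the LE8:menopause term captures the moderation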
Analyzing the Performance of the Linux Load Balancer through Counterexample Synthesis
Honors Theses , 05/07/2025 , HR-25-16 ,
Linux uses a load balancer to distribute running tasks across multiple cores in a system. Ideally, the load balancer is work-conserving, which means that it does not leave any core idle. Additionally, the load balancer should evenly distribute work between the cores to maximize the progress made on each task. Currently, the best approach to testing if a particular implementation satisfies performance requirements is using fuzzing, which can miss certain worst-cases. On the other hand, formal verification of the scheduler is infeasible due to the large size of the code base, and the rapid rate of development would require constant updates to the model. This work aims to reach a middle ground that allows us to generate cases where the load balancer makes an incorrect decision without modeling the entire scheduler. We implement a tracing module in the kernel that collects the visible state, i.e. the minimum set of information the load balancer uses to make its decision. There also exists hidden state that the heuristic does not directly measure, which may affect the performance of the system but does not affect the load balancer's decision. This allows us to manipulate the hidden state to "hallucinate" examples where performance suffers even if such a case is not observed. To do this, we encode consistency checks that ensure the generated hidden state agrees with the visible state and invariants that formally encode desired performance guarantees. Using SMT solvers, we can synthesize hidden states that satisfy the consistency checks but violate an invariant.
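A highly simplified sketch of this synthesis step, in Python with the z3 solver (the visible/hidden variables, consistency check, and invariant below are invented for illustration and are far simpler than the scheduler state used in the thesis):

    from z3 import Ints, Solver, And, Not, sat

    # Visible state collected by the tracer (illustrative): per-core load reported to the balancer.
    load0, load1 = Ints("load0 load1")
    # Hidden state the heuristic does not measure directly (illustrative): runnable tasks per core.
    tasks0, tasks1 = Ints("tasks0 tasks1")

    # Consistency check: the hidden state must be able to produce the observed visible state.
    consistency = And(tasks0 >= 0, tasks1 >= 0, load0 == 10 * tasks0, load1 == 10 * tasks1)
    # Invariant (toy work conservation): no core sits idle while another has surplus tasks.
    invariant = Not(And(tasks0 >= 2, tasks1 == 0))
    # Observed visible state for a run where the balancer chose not to migrate anything.
    observed = And(load0 == 20, load1 == 0)

    s = Solver()
    s.add(consistency, observed, Not(invariant))
    if s.check() == sat:
        print("hidden-state counterexample:", s.model())

The solver returns a hidden state that is consistent with everything the tracer saw yet violates the performance invariant, i.e. a hallucinated case where the balancer's decision leaves work stranded.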
Tracing Structured Data with Rust in the Linux Kernel
Honors Theses , 05/07/2025 , HR-25-15 ,
Performance issues in the Linux kernel have huge costs for practically everyone, but finding them can be difficult. The logic for system-critical algorithms has mushroomed over the years to account for today's multicore and NUMA systems, and manually reading or stepping through those thousands of lines of code to find a bug is prohibitively difficult. An alternative is tracing several runs of a code section under investigation and automatically checking those traces for violations of correctness or performance conditions. In this work, we present novel tracing infrastructure for the Linux kernel that allows for easy and unobtrusive tracing of complex, optional, or ad hoc data structures from kernel C code to userspace using Rust's type system. We also use this tracing system to instrument the Linux load balancer and find potential bugs by applying an SMT solver to the traces.
Predicting Brain Age Using a Deep Learning Approach with Functional Neuroimaging and Physical Fitness Features
Honors Theses , 05/06/2025 , HR-25-28 ,
Aging and physical activity are associated with structural and functional changes in the brain and are prominent risk factors for predicting the onset of neurodegenerative diseases. Previous literature shows that the brain undergoes physical changes in response to aging, such as reductions in brain volume, decline in white matter integrity, and abnormalities in brain activity. Engaging in physical activity is known to increase neurogenesis and promote synaptic plasticity, slowing neuronal damage and degeneration and leading to increases in perfusion and in gray and white matter volumes. Traditional approaches to studying brain aging rely on structural measures, with limited investigation of the brain's hemodynamics, which may diverge from normal aging prior to structural changes. We hypothesize that measures of brain function and hemodynamics can be used to calculate a cerebrovascular brain age, and that the relation between cerebrovascular brain age and chronological age will be affected by physical activity and exercise. The presented work characterizes the effects of physical activity and exercise on brain function and cerebrovascular resilience using a 3D convolutional neural network based on MRI measures of cerebrovascular reactivity and the amplitude of low-level fluctuations to estimate brain age. Two models will be developed to determine how physical fitness metrics affect the model's performance.
Walk on Spheres for Linear Elasticity
Honors Theses , 05/06/2025 , HR-25-24 ,
This thesis investigates the application of the walk on spheres algorithm for solving boundary value problems in linear elasticity. Walk on spheres (WoS) is a grid-free Monte Carlo method for approximating solutions for certain types of boundary value problems for partial differential equations. Traditional numerical methods for linear elasticity, such as finite element methods, often require intricate mesh generation, which can be computationally intensive and challenging for domains with complex geometries or topologies. In contrast, WoS can avoid mesh generation entirely as long as the geometry can be described in terms of closest-point queries, thus avoiding the discretization error that affects mesh-based approaches, making WoS particularly suitable for problems where meshing is problematic or infeasible. We develop a recursive integral formulation of linear elasticity with displacement boundary conditions, design a Monte Carlo estimator for this integral, and present implementation results when solving for simple displacement boundary conditions on various geometry.
Code can be found at https://github.com/enpandi/wos4le-thesis
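For readers unfamiliar with the method, a minimal walk-on-spheres estimator for the scalar Laplace equation, the building block that the thesis generalizes to linear elasticity, looks roughly like this (our own sketch in Python, not the repository code linked above):

    import numpy as np

    def wos_laplace(x, dist_to_boundary, boundary_value, eps=1e-3, max_steps=1000):
        """One walk-on-spheres sample of u(x) for Laplace's equation with Dirichlet
        data, using only distance (closest-point) queries to describe the geometry."""
        p = np.array(x, dtype=float)
        for _ in range(max_steps):
            r = dist_to_boundary(p)          # radius of the largest empty sphere around p
            if r < eps:                      # close enough to the boundary: stop the walk
                return boundary_value(p)
            theta = np.random.uniform(0.0, 2.0 * np.pi)
            p = p + r * np.array([np.cos(theta), np.sin(theta)])  # jump to the sphere (2D)
        return boundary_value(p)

    # Example: unit disk with boundary data g(x, y) = x, whose harmonic extension is u(x, y) = x.
    dist = lambda p: 1.0 - np.linalg.norm(p)
    g = lambda p: p[0]
    print(np.mean([wos_laplace([0.3, 0.2], dist, g) for _ in range(2000)]))  # close to 0.3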
Formalizing Probabilistic Independence as a Model of Separation
Honors Theses , 05/06/2025 , HR-25-14 ,
Probabilistic programs have a variety of applications in randomized algorithms, cryptography, and distributed systems. However, reasoning about and verifying such programs, especially in a modular way, is difficult. Recent work has used probabilistic independence as the basis for a new model of separation that allows separation-logic-like reasoning for probabilistic programs. However, as these probabilistic program logics have developed, so has the complexity of their underlying measure-theoretic models. This paper uses the Rocq proof assistant to formalize the definitions and properties of probability spaces as resources, with the independent product operation as composition. We mechanically prove that probability spaces satisfy the requirements of a Kripke resource monoid, and in doing so take the first steps towards mechanizing a flexible, trusted program logic for probabilistic programs.
Graph-based Approaches for Protein Network Analysis and Function Prediction
Honors Theses , 05/06/2025 , HR-25-13 ,
Automated protein function prediction (AFP) remains a fundamental challenge in the post-genomic era. Since a protein's function is often reflected in its interactions and correlations with other proteins, biological networks offer a rich source of functional information, and graph-based models provide a natural framework for extracting it. In this study, we construct two novel regulatory networks of human proteins that capture information at the transcription and translation levels. Using graph-theoretic techniques, we analyze these networks at both global and local scales to uncover biological insights and inform downstream AFP tasks. We then integrate each network independently into a graph neural network (GNN) framework and perform ablation studies to assess the contribution of different biological signals. Additionally, we incorporate representations from a protein language model to extract sequence-level information. Our results highlight the distinct sources of information contained in these two networks and their respective roles in enhancing protein function prediction.
Learning Composable Diffusion Policies for Instruction-Following Navigation
Honors Theses , 05/06/2025 , HR-25-12 ,
Dynamic environments present significant challenges for instruction-following navigation algorithms, particularly when replanning is required to adapt to the changing environment. Existing solutions often focus only on one aspect of the problem, either addressing instruction following in static settings or using social navigation in dynamic environments without considering explicitly given instructions. Current instruction-following solutions often rely on vision-language models (VLMs) to generate cost maps as direct inputs for conventional optimization-based planning methods. While these methods work well when planning time is not critical, the computational requirements of current VLMs often make them impractical for more variable dynamic environments. This work is produced as part of a larger project that proposes a two-step approach to this problem. First, train a potential-based diffusion model to generate collision-free paths to a specified goal. Then, fine-tune separate versions of this baseline model, each tailored to generate paths aligned with a corresponding atomic instruction. In this work specifically, we present three main contributions to this project. First, we outline a method to generate random viable paths as effective training data for the diffusion model. We also create simple functions that evaluate task completion and show that fine-tuning using these functions can modify the same initial model to adhere to different instructions. Finally, we show that composing multiple fine-tuned models can produce paths that follow more complex instructions unseen at training time. Some background about the initial diffusion model and fine-tuning approach is provided for context, but it was designed and implemented in the larger project, not this work. Our approach as a whole enables robots to follow instruction-grounded plans in dynamic scenarios without relying on VLM inputs. Furthermore, we demonstrate that fine-tuned models for individual instructions can be composed to produce plans that replicate more complex behaviors.
Metapilot: A Meta-Controller for Coordinated Microservice Resource and Admission Control
Honors Theses , 05/06/2025 , HR-25-11 ,
As many large-scale cloud applications have made the switch from monolithic architectures, microservices have become integral to modern software infrastructure. These microservice systems often rely on lower-level controllers—including both admission and resource controllers—to respond to changing environments and dynamic workloads to maintain performance. However, current state-of-the-art research has tested these controllers in isolation. Operating these controllers agnostically of one another can lead to redundant or even conflicting decisions, ultimately harming overall system efficiency and Service Level Objective (SLO) compliance. We observe from preliminary experimentation that different workloads benefit from different control strategies—it is oftentimes sufficient to run some workloads with just an autoscaler alone, while others require aggressive admission control, and some even benefit from a combination of both. This heterogeneity in performance motivates a higher-level meta-controller to coordinate decisions.
We introduce Metapilot—a lightweight meta-controller that observes shared state between the controllers (e.g., CPU usage, goodput, SLO violations) and dynamically assigns active controllers at each time step. Metapilot operates over a simple, discrete action space (e.g., “Autoscaler only”, “Admission only”, or “Both”) and learns through a reward that seeks to maximize goodput while penalizing overprovisioning and SLO violations. Using reinforcement learning to guide our decisions, Metapilot learns when to prioritize, suppress, or combine the different actions from the controllers to tailor control to the workload's needs. Our experiments on microservice benchmarks demonstrate that Metapilot-guided coordination leads to robust, efficient system behavior compared to baseline approaches of agnostically running multiple controllers.
Tensor Core Accelerated Kernels for Hadamard and Fourier Masked Attention
Honors Theses , 05/06/2025 , HR-25-10 ,
Softmax attention incurs quadratic time complexity in sequence length, making it the primary bottleneck in transformer training and inference for long sequences. Structured Masked Attention (SMA) offers a sub-quadratic alternative by removing softmax normalization and applying a structured matrix as the attention mask. Dropping softmax enables reformulating elementwise masking as a matrix multiplication, allowing the use of fast transform algorithms to compute the operation in sub-quadratic time when the mask is structured.
Implementing SMA for a given structured mask is straightforward if a standalone fast transform implementation is available. However, standard implementations of these transforms often achieve low hardware utilization on GPUs because they do not leverage high-throughput matrix-multiply units like Tensor Cores. Furthermore, using a separate kernel to perform the transform prevents operator fusion, missing critical opportunities to reuse data in on-chip SRAM.
In this work, we address these challenges for SMA when using Walsh-Hadamard and discrete Fourier transform matrices, both of which admit O(n log n) algorithms for computing matrix-vector products. Specifically, we design fused, Tensor Core accelerated kernels to efficiently perform inference and backpropagation for Hadamard and Fourier masked attention while retaining sub-quadratic time complexity.
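To make the reformulation in the first paragraph concrete, here is a rough NumPy sketch (ours, not the thesis kernels) of masked attention with a Walsh-Hadamard mask, computed once with the dense mask and once by applying the mask through the O(n log n) fast transform, which is sub-quadratic in sequence length for a fixed head dimension:

    import numpy as np

    def fwht(x):
        """Fast Walsh-Hadamard transform along the last axis, O(n log n) per vector."""
        x = x.copy()
        n = x.shape[-1]
        h = 1
        while h < n:
            for i in range(0, n, 2 * h):
                a = x[..., i:i + h].copy()
                b = x[..., i + h:i + 2 * h].copy()
                x[..., i:i + h] = a + b
                x[..., i + h:i + 2 * h] = a - b
            h *= 2
        return x

    def hadamard(n):
        """Dense Sylvester-ordered Walsh-Hadamard matrix (n a power of two), for checking."""
        H = np.array([[1.0]])
        while H.shape[0] < n:
            H = np.block([[H, H], [H, -H]])
        return H

    n, d = 8, 4
    Q, K, V = (np.random.randn(n, d) for _ in range(3))
    M = hadamard(n)

    # SMA, dense reference: elementwise mask on the score matrix, no softmax.
    out_dense = (M * (Q @ K.T)) @ V

    # Reformulation: out[i] = q_i^T * sum_j M[i, j] * (k_j v_j^T), so the n x n mask is
    # applied once to a stack of per-position outer products via the fast transform.
    A = np.einsum("jd,je->jde", K, V).reshape(n, d * d)   # n x (d*d)
    B = fwht(A.T).T                                       # equals M @ A, in O(n log n) per column
    out_fast = np.einsum("id,ide->ie", Q, B.reshape(n, d, d))

    print(np.allclose(out_dense, out_fast))               # True

The thesis implements the same idea as fused, Tensor Core accelerated kernels covering both inference and backpropagation, rather than as separate unfused calls like the sketch above.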
Efficient Deployment of Learned Models for Linux Process Scheduling: A Design Space Exploration
Honors Theses , 05/06/2025 , HR-25-09 ,
In Linux 6.12, the extensible scheduler class (sched_ext) has been merged into the source tree, enabling custom scheduling via eBPF programs in place of the Earliest Eligible Virtual Deadline First (EEVDF) scheduler. This new scheduling mechanism allows algorithms to be implemented in userspace while maintaining system integrity by dynamically loading and unloading at runtime, enabling widespread support for algorithmic decisions via access to a variety of userspace languages and libraries. Running scheduling logic in userspace is particularly appealing for integrating machine learning models, which can leverage a superset of task data to generate scheduling heuristics. However, to retain the accuracy of these models, efficient communication methods between the kernel and userspace are critical. In this work, we investigate and benchmark the performance tradeoffs of various data collection and model deployment methods for inference. To optimize scheduling performance, we use an eBPF hashmap for task data tracking, implement an atomic array indexing algorithm for faster prediction updates, create a separate eBPF program for CPU performance metrics, and deploy a learned model using PyTorch AOTInductor to eliminate the overhead of a separate inference process. We introduce a model-agnostic framework that integrates learned models into process scheduling decisions with minimal latency overhead. This research lays the foundation for developers to practically deploy custom policies for self-adaptive, data-driven CPU scheduling.
SHARD: Global Shape-Aware Reassembly of 3D Fractured Objects
Honors Theses , 05/05/2025 , HR-25-22 ,
Learning to automate the geometric reassembly of broken 3D objects is a complex task recently tackled by the scientific community, with various practical applications such as bone and artifact restoration. Recent works have shown that learning-based methods can reassemble fractured objects using local geometric information without requiring knowledge of the original global shape prior. These approaches struggle with complex fractures involving many or tiny fragments that lack sufficiently descriptive local geometric information, and they often assume that all fragments are present, which does not reflect real-world scenarios where fragments may be missing. This paper introduces SHARD, a template-based reassembly framework that aims to reconstruct broken objects by registering fragments to an inferred global shape prior. Our local-global alignment framework consists of four main components: 1) a shared global-fragment geometric point cloud superpoint and feature extractor, 2) an efficient high-order convolution layer to refine and compare features of pairs to be matched, 3) superpoint matching to identify reliable coarse-level correspondences between refined global and fragment superpoint features, and 4) a fine-level registration step for improved fragment registration alignment. We evaluate SHARD on the Breaking Bad dataset, achieve promising reconstruction results through this template-guided framework, and explore potential improvements and tradeoffs in reassembly accuracy and generalizability.
OASIS: Decision Transformer-based Low-Interruption Proactive Job Scheduling for Batch GPU Clusters
Honors Theses , 05/05/2025 , HR-25-08 ,
Long-running deep-learning jobs pose a challenge to current scheduling systems on GPU compute clusters, which use batch-based scheduling. Modern ML model training and inference often take a long time due to the large number of parameters, requiring researchers to schedule a long sequence of sub-jobs. However, on clusters experiencing high load, sub-jobs can often have long gaps between them, leading to disruptions and delays in research as well as production deployment. To address this, we propose a learned scheduler which leverages a decision transformer to efficiently schedule jobs on a GPU cluster. We compare the performance of our approach to Mirage, a Slurm-compatible scheduler that leverages multiple reinforcement learning approaches, to identify improvements in prediction quality. We saw nearly a 4x improvement in interruptions over the reactive baseline, along with equal performance in ideal decisions and significant improvements in interruptions compared to the RL-based Mixture-of-Experts model developed in Mirage. We were also able to show the DT's ability to handle more complex prediction goals by having it reliably predict a scheduled job's execution time and wait time.
Constraint-Based Global Mesh Encodings
Honors Theses , 05/05/2025 , HR-25-07 ,
Modern constraint solvers such as Z3, Gurobi, and CP-SAT have been effective in solving many different classes of problems, including program verification, resource scheduling, and pathfinding. Given their flexibility and robustness, we seek to utilize constraint solvers for encoding and generating meshes. To our knowledge, previous research in this area typically focuses on using linear and quadratic solvers for optimizing local geometric criteria or parameterizations. Our work deviates from the established literature by exploring the possibility of generating entire meshes defined solely by constraints. In order to accomplish this, we characterize several different techniques for defining global mesh topologies and geometries via combinations of logical, linear, and nonlinear constraints. Using satisfiability modulo theory (SMT) and boolean satisfiability (SAT) solvers on these constraint systems, we generate meshes that comply with given properties, such as being manifold and non-self-intersecting. Finally, we contrast the empirical performance of different implementations and explore their feasibility. In doing so, we identify why certain mesh constraint formulations can significantly impact solver performance and complexity. These results could provide an avenue to further explore new methods for mesh generation and simplification.
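As a toy example of the kind of constraint system involved (our own simplified encoding, not one of the thesis's formulations): select a nonempty subset of candidate triangles such that every edge of a selected triangle is shared by exactly two selected triangles, a basic closed-manifold condition.

    from z3 import Bool, Solver, PbEq, Implies, Or, is_true, sat

    # Candidate triangles over four vertices (the faces of a tetrahedron), as vertex triples.
    tris = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
    use = [Bool(f"t{i}") for i in range(len(tris))]

    # Group candidate triangles by the edges they contain.
    edges = {}
    for i, (a, b, c) in enumerate(tris):
        for e in [(a, b), (a, c), (b, c)]:
            edges.setdefault(tuple(sorted(e)), []).append(use[i])

    s = Solver()
    s.add(Or(use))                                            # at least one triangle is selected
    for e, ts in edges.items():
        for t in ts:                                          # every edge of a selected triangle...
            s.add(Implies(t, PbEq([(u, 1) for u in ts], 2)))  # ...is shared by exactly two

    if s.check() == sat:
        m = s.model()
        print([tris[i] for i in range(len(tris)) if is_true(m.evaluate(use[i]))])

The thesis's encodings also constrain geometry, such as non-self-intersection, which requires the mixed logical, linear, and nonlinear constraints described above.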
CASPER Hazards: Accelerating MLLM Compute and Empowering Real-Time Teleoperation
Honors Theses , 05/05/2025 , HR-25-06 ,
Shared-autonomy systems empower humans to effectively complete complex tasks by selectively employing autonomous support when human teleoperation is difficult, hazardous, or failure-prone. However, these systems struggle to arbitrate control and infer human intent due to needing to reason about dynamic environments and unpredictable human actions. Visual language models (VLMs) combine large language models and visual models to perform complex, context-based visual reasoning. Recent work in the field has leveraged VLMs to infer human intent and autonomously take over to complete tasks for shared autonomy. Despite this potential, VLM inference speed is a significant bottleneck—often causing humans not to wait for an autonomous takeover. Furthermore, current work relies on a direct cable connection to high-powered computers due to the resource-intensive processing, restricting robot mobility and range. This work addresses these limitations by employing several optimization techniques—including on-device GPU processing with tensor parallelism, batching, and input reduction—to run inference over 20.9% faster. Additionally, this work introduces a high-throughput network topology which provides the robot access to high-power GPU processing with millisecond latency. Combined, these advancements significantly improve the feasibility of deploying highly responsive and intuitive VLM-based shared autonomy in real-world scenarios. Furthermore, the VLM optimization techniques and networking architecture developed have broad applicability across various other domains requiring efficient deployment of VLMs and low-latency wireless control from high-powered servers.
Domain-Based Evaluation of Generative Protein Models
Honors Theses , 05/04/2025 , HR-25-04 ,
Machine learning-based generative protein models offer a way to design novel proteins with useful properties, with applications in medicine, plastic waste decomposition, and material design. In just the last few years, diffusion models which generate protein backbones have shown immense promise. Model development relies on metrics to evaluate architectures and hyperparameter choices, but current in silico metrics such as designability are both inconsistently implemented and fundamentally lacking. I propose a new set of metrics based on analyzing protein substructures known as domains, and I evaluate Genie2, Proteus, RFDiffusion, and Chroma with them. My metrics show that modern generative models do not produce realistic domains, and as a result offer new tools to guide model development.
Applying PoWER to Verify Crash Consistency with Proof Assistants
Honors Theses , 05/02/2025 , HR-25-18 ,
Storage systems are an essential piece in managing data we want to read and write within our programs. Thus, it is critical that they are reliable in all situations, even when crashes may occur. While experimentally stress-testing our systems may improve our confidence in their reliability, it cannot make any guarantees about how the system would behave in all such situations. Formal verification provides an avenue for making these guarantees and reasoning about situations without the need to replicate them in testing. However, prior formal verification techniques often resulted in greater development complexity or required stronger verification tools. The PoWER technique introduced in prior work was designed to use Hoare logic preconditions in its API methods in order to verify crash consistency. The technique has been shown to work when constructing the proofs using SMT (Satisfiability Modulo Theories) solvers. We implement a log-based storage system in Go and verify its correctness and crash consistency in Rocq using the PoWER technique. The system shows that the technique is applicable with a toolset that differs greatly from that of the originally implemented systems.
Modeling the Brain: Investigating the Role of sPMv in Language
Honors Theses , 05/02/2025 , HR-25-03 ,
Computational methods provide a powerful framework for understanding complex and high-dimensional feature spaces, such as the human brain. Prior studies have shown that neural networks, specifically self-supervised learning (SSL) models such as WavLM-large, can be reliably used to predict cortical responses and selectivity patterns in well-characterized regions involved in the speech processing hierarchy. These models provide a framework for understanding the relationship between language and neural responses, enabling further insights into the roles of different brain regions in the speech processing hierarchy.
Building upon this success, we apply these methods to investigate a less understood cortical area: the superior part of the ventral premotor cortex (sPMv). Located anterior to the central sulcus and superior to the primary mouth motor region (M1M), sPMv has been implicated in all levels of the speech processing hierarchy and in both speech production and perception. However, its precise role and structural organization remain unclear.
In this work, we model sPMv's responses to natural speech and, using methods such as principal component analysis (PCA), K-means clustering, and probing, analyze feature spaces associated with different levels of the speech processing hierarchy. We identify an anterior-posterior gradient of speech feature selectivity extending from sPMv to the superior part of M1M. This gradient mirrors patterns observed in well-understood speech processing areas such as the auditory cortex and reveals distinct subregions within sPMv specialized for different levels of speech processing.
These findings demonstrate how computational tools such as SSL models can be used to gain insight into less understood regions of the human cortex and help explain why sPMv is involved in all levels of the speech processing hierarchy.
Deconvolution and Cell-Type Gene Expression Analysis of Dorsolateral Prefrontal Cortex Brain Tissue
Honors Theses , 05/02/2025 , HR-25-02 ,
Bulk RNA-sequencing (bulk RNA-seq) is a widely utilized technology for measuring the aggregate gene expression within a tissue sample. While this approach provides an overall snapshot, its inherent averaging effect can obscure the contributions of individual cell types, which is particularly problematic when examining tissues as complex as the brain. Deconvolution attempts to extract cell type level granularity from existing bulk RNA-seq data. In this study, we applied six established deconvolution methods—BayesPrism, CIBERSORTx, NNLS, ssKL, deconf, and DSA—to bulk RNA-seq data derived from the dorsolateral prefrontal cortex (dlPFC) of individuals with Alcohol Use Disorder (AUD) and matched controls. Comparison of the estimated cell-type proportions to anticipated values yielded insights into the relative accuracies and limitations of the techniques when applied to brain tissue. Furthermore, we investigated BayesPrism's capacity to predict cell-type-specific gene expression by conducting differential expression analyses within each cell type and benchmarking these results against single-nucleus RNA-seq outcomes. To comprehensively evaluate overall trends, Rank–Rank Hypergeometric Overlap (RRHO2) was employed to generate heat maps reflecting similarities and discrepancies between the predicted and single-nucleus gene expression profiles. While concordant results were observed in all cell types, the failure to capture known differentially expressed genes underscores the need for further investigation into the biological and technical factors influencing deconvolution performance. This work highlights the promise and challenges of current deconvolution methodologies for understanding cell-type-specific differences in brain tissue.
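To illustrate the simplest of the listed approaches, here is a hedged sketch of reference-based deconvolution via non-negative least squares (the signature matrix, cell types, and data below are synthetic placeholders, not the study's dlPFC data):

    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(0)
    n_genes, n_types = 500, 4                  # e.g., neurons, astrocytes, microglia, oligodendrocytes
    # Signature matrix S: mean marker-gene expression per cell type (genes x cell types).
    S = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_types))

    # Simulate a bulk sample as a noisy mixture of the cell types.
    true_props = np.array([0.6, 0.2, 0.15, 0.05])
    bulk = S @ true_props + rng.normal(0.0, 0.05, n_genes)

    # NNLS deconvolution: minimize ||S w - bulk|| subject to w >= 0, then normalize to proportions.
    w, _ = nnls(S, bulk)
    print(np.round(w / w.sum(), 3))            # should be close to true_props

Reference-based methods like this depend on the quality of the signature matrix; the study benchmarks several such methods against single-nucleus results.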
Physical Simulation of Relativistic Rigid Rods
Honors Theses , 04/30/2025 , HR-25-05 ,
We present a computational model for simulating rigid rods under relativistic physics. Although rigid bodies are often seen as incompatible with Special Relativity, we develop a formalism that enables a physically grounded simulation. First, we propose a continuous-time Lorentz covariant formulation of Hamilton's Principle in a lab frame and produce a relativistic Lagrangian. The Lagrangian contains spring potentials precisely defined in a COM frame, which is the special instantaneous inertial frame in which the spring endpoints are mass-weighted equidistant. In order to both keep track of this COM frame and determine the lab time events corresponding with this frame, we introduce additional constraint terms to our Lagrangian. Particularly, due to the relativity of simultaneity, endpoint events at a given snapshot of lab time are not usually simultaneous within the COM frame. To address this mismatch, we introduce time shift variables and a corresponding constraint. From there, we discretize the action, yielding a spring-particle system governed by Euler–Lagrange equations at snapshots of lab time, using Lagrange multipliers within a constrained Hamilton's Principle. Finally, we take this system to the infinite stiffness limit to model rigidity, and examine the results.
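For context, the free-particle kinetic piece that a lab-frame relativistic Lagrangian of point masses builds on is the standard term below (given only as background; the spring potentials, COM-frame constraints, and time-shift variables described above are specific to the thesis and are not reproduced here):

    L_{\text{free}} = -\sum_i m_i c^2 \sqrt{1 - \frac{|\dot{\mathbf{x}}_i|^2}{c^2}}

In the low-velocity limit this reduces to the Newtonian kinetic energy up to an additive constant.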
Fine-tuning Speech Representations with Brain Responses to Language
Honors Theses , 04/29/2025 , HR-25-29 ,
Speech encoding models, which predict fMRI responses from audio stimuli, extract speech representations from pre-trained neural networks and create linear mappings from these representations to predicted responses. Currently, these models are limited in expressivity due to the linear models used to map auditory representations to responses. In this work, we expand on the current setup of these encoding models by fine-tuning the neural network on fMRI data using low-rank adaptation (LoRA) to capture more functions of the stimulus in the learned features. We found that this nonlinear fine-tuning approach improved the average encoding performance of our models by over 12%, with significant improvements in semantic regions of the brain outweighing a slight sacrifice in performance in low-level regions like auditory cortex (AC). Still, when fine-tuning models on specific subsets of cortex like AC, we found corresponding improvements within cortical regions while the models still generalized to new cortical regions within a subject. Additionally, models fine-tuned on one subject generalized to new subjects, with cross-subject encoding performance improving at similar levels when averaged across cortex and within different cortical regions. Overall, fine-tuning speech representations on fMRI data led to significant improvements in encoding performance that generalized across cortical regions and participants. Further, we found stability in our fine-tuning approach with LoRA that allows room to continue improving speech encoding models, without expensive hardware constraints, to more accurately represent biological fMRI responses.
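For readers unfamiliar with LoRA, a minimal sketch of the idea in PyTorch (ours; the layer size, rank, and scaling are placeholders, and the thesis applies the adapters inside a pre-trained speech model rather than to a standalone layer):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen linear layer plus a trainable low-rank update: W x + scale * B A x."""
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False        # only the low-rank factors are fine-tuned
            self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scale = alpha / rank

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

    layer = LoRALinear(nn.Linear(1024, 1024), rank=8)
    out = layer(torch.randn(4, 1024))          # gradients flow only into A and B

Because B starts at zero, fine-tuning begins exactly at the pre-trained model, and only the small low-rank factors are trained, which keeps the hardware cost of this kind of fine-tuning modest.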
Peaked Boson Sampling: towards efficiently verifiable and NISQ-able quantum advantage
Honors Theses , 04/28/2025 , HR-25-01 ,
The use of randomness in quantum circuits is an intrinsically interesting property due to the model's convergence to the hard-to-simulate Haar measure. While random circuit sampling is both promising for quantum advantage and realistically implementable on a NISQ device, it is not yet efficiently verifiable. Previous work in studying peaked random circuit sampling models has shown optimism for a potentially viable model. In this paper, we extend those observations by studying a simpler, alternative model to quantum computation involving beamsplitter networks. We present numerical and theoretical findings on the structure of peaked beamsplitter networks and evaluate their potential as a candidate for quantum advantage experiments.
Compositional Features For Audio and Speech
Honors Theses , 12/10/2024 , HR-24-18 ,
The world around us can largely be abstracted into a hierarchical taxonomy of features. Conversations are composed of utterances, composed of phrases, then words, then syllables, then phones. Visual scenes are composed of objects, then parts, then textures and shapes. In this paper, we explore how we can discover this taxonomy and then use these learned representations for flexible, powerful, and efficient machine learning across audio and speech. First, we introduce a method to train powerful self-supervised audio models significantly faster than prior work by adapting masked autoencoders (MAEs) to the audio and speech domains. Following this, we evaluate the features of several different audio models to disentangle features such as accent, speaker voice, and semantic content for the task of zero-shot textless voice conversion. Finally, we establish a method to extend previous self-supervised speech models to capture representations at abstract granularities such as syllables or words, resulting in state-of-the-art performance on unsupervised syllable boundary detection and the lowest-ever bitrate tokenization for speech that maintains word content. Using our new tokenization scheme, we make generative spoken language models that operate 5x faster at inference and train 100x faster than prior work.
Towards General Purpose Robots at Scale: Lifelong Learning and Learning to Use Memory
Honors Theses , 12/06/2024 , HR-24-17 ,
The widespread success of artificial intelligence in fields like natural language processing and computer vision has not yet fully transferred to robotics, where progress is hindered by the lack of large-scale training data and the complexity of real-world tasks. To address this, many robot learning researchers are pushing to get robots deployed at scale in everyday unstructured environments like our homes to initiate a data flywheel. While current robot learning systems are effective for certain short-horizon tasks, they are not designed to autonomously operate over long time horizons in unstructured environments. This thesis focuses on addressing two key challenges for robots operating over long time horizons: memory and lifelong learning.
We propose two novel methods to advance these capabilities. First, we introduce t-DGR, a trajectory-based deep generative replay method that achieves state-of-the-art performance on Continual World benchmarks, advancing lifelong learning. Second, we develop a framework that leverages human demonstrations to teach agents effective memory utilization, improving learning efficiency and success rates on Memory Gym tasks. Finally, we discuss future directions for achieving the lifelong learning and memory capabilities necessary for robots to function at scale in real-world settings.
Booting a Linux Kernel Modified to Minimize Peripheral Requirements on the ACL2 Model of the X86 ISA
Honors Theses , 08/20/2024 , HR-24-16 ,
We extended the ACL2 x86 ISA model with support for a minimal complement of peripherals and exception handling and then booted (by emulation) a version of the Linux kernel modified to support these peripherals. Our effort confirms that it is possible to model the x86 ISA formally with sufficient fidelity to model the execution of an off-the-shelf operating system with minimal modifications, and we believe this is the first time an executable formal model has been used to boot Linux, enabling the specification and analysis of both the Linux kernel and user programs that depend on the Linux kernel at the binary level.
Previously, no publicly-available formal model, including the aforementioned model of the x86 ISA written in ACL2, has been complete enough to model the execution of an operating system, which is necessary if one wishes to prove theorems about programs that depend on an operating system. We also developed tools to find bugs in the x86 model by comparing the execution of the model to a trusted implementation of an x86 machine, which found bugs in the specification that were then fixed. Finally, we added a TLB to the x86 model, which provided a significant speed up in emulation of Linux on the ACL2 x86 ISA model.
Evaluating Random I/O Device Redirection Policies in ML-Assisted Kernels
Honors Theses , 05/16/2024 , HR-24-13 ,
Machine learning at the OS layer offers benefits such as improved decision-making in kernel subsystems, adaptability to workloads and hardware, and enhanced performance through efficient resource management. However, the integration of ML models can introduce increased complexity and additional performance overhead, potentially becoming a bottleneck. One approach to address this challenge is LAKE, a system that utilizes ML and specialized hardware like GPUs to handle the computational load associated with ML decision-making within the kernel space. LAKE builds upon LinnOS, an OS that employs a lightweight neural network for inferring SSD performance at a fine, per-I/O granularity. In this paper, we consider additional random I/O device redirection policies that extend both the baseline Linux I/O scheduler and the LAKE latency predictor. We use these random redirection policies to investigate the effectiveness of the Linux and LAKE I/O schedulers, evaluate the performance of all these policies on traces, and describe the benefits of random redirection algorithms.
Augmenting Cache Replacement Policies to Optimize Data Placement in Multi-Level Cache
Honors Theses , 05/14/2024 , HR-24-12 ,
Current state-of-the-art cache replacement policies largely focus on improving the hit rate of a single cache level. However, none of these policies (including the optimal policy) considers the implications of a typical multi-level cache hierarchy present in modern microprocessors. This thesis considers extending cache-replacement policies to account for the multi-level cache hierarchy by augmenting online cache replacement policies to make them hierarchy-aware. We observe an average of 2.77% theoretical reduction in cycle latency for multi-level MIN and a 2.5% practical Instructions Per Cycle (IPC) improvement across the SPEC2006 benchmark suite for a single-core setting. We also observe performance benefits for practical multi-level cache replacement policies.
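As background, here is a toy single-level simulator of the optimal MIN policy that the multi-level results above build on: on a miss with a full cache, evict the resident line whose next use lies farthest in the future (our own illustration, not the thesis's hierarchy-aware formulation):

    def min_policy_hits(trace, capacity):
        """Count hits for Belady's MIN on a single cache level of the given capacity."""
        hits, cache = 0, set()
        for i, line in enumerate(trace):
            if line in cache:
                hits += 1
                continue
            if len(cache) >= capacity:
                rest = trace[i + 1:]
                # Evict the resident line reused farthest in the future (or never reused).
                next_use = lambda l: rest.index(l) if l in rest else float("inf")
                cache.remove(max(cache, key=next_use))
            cache.add(line)
        return hits

    print(min_policy_hits(["A", "B", "C", "A", "B", "D", "A", "C"], capacity=2))

The thesis extends this kind of policy, along with practical online policies, to account for the full multi-level cache hierarchy rather than a single level.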
Crossing Paths: Incentivizing Exploration with Distributional Reinforcement Learning
Honors Theses , 05/10/2024 , HR-24-11 ,
In reinforcement learning (RL), an agent attempts to learn an optimal policy through repeated interactions with the environment. Unlike traditional reinforcement learning algorithms, which learn state-value estimates to arrive at an optimal policy, distributional RL (distRL) aims to model the entire state-value distribution. The primary advantage of distributional methods stems from the regularizing effect of learning a distribution. Recent distRL models attempt to take advantage of the secondary benefits of modeling a distribution, typically by using properties of the distribution to guide exploration or by using distorted expectations over the distribution to create risk-averse or risk-seeking policies. Analysis of prior work suggests that exploration-based methods achieve impressive performance but often suffer from biased data collection, unstable convergence, and overestimation. On the other hand, methods that distort the return distribution to train risk-averse policies have better convergence properties but achieve sub-optimal returns on long-horizon problems due to insufficient exploration. Building off recent advances in exploration techniques within expected and distributional RL, we develop three methods that utilize the state-value distribution to guide exploration. Our work demonstrates how diverse concepts from traditional exploration techniques and risk-aware RL can be combined with knowledge of the value distribution to develop novel algorithms that fully leverage the capabilities of distRL. We evaluate our three proposed methods on a set of MDP and sparse classic control tasks. Based on benchmark performance, we highlight the promising potential of the Separated QR-DQN (SQR) method and suggest ways it can be adapted to solve complex sparse reward tasks in future work.
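As a concrete anchor for the distributional machinery discussed above, here is a minimal quantile-regression loss of the kind used in QR-DQN, in PyTorch (a generic sketch of the standard formulation, not an implementation of the three proposed methods):

    import torch

    def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
        """pred_quantiles: (batch, N) predicted return quantiles; target_samples: (batch, M)
        samples of the target return distribution. Returns the mean quantile Huber loss."""
        n = pred_quantiles.shape[1]
        taus = (torch.arange(n, dtype=torch.float32) + 0.5) / n        # quantile midpoints
        u = target_samples.unsqueeze(2) - pred_quantiles.unsqueeze(1)  # (batch, M, N) errors
        huber = torch.where(u.abs() <= kappa, 0.5 * u ** 2, kappa * (u.abs() - 0.5 * kappa))
        return ((taus - (u < 0).float()).abs() * huber / kappa).mean()

    pred = torch.randn(32, 51, requires_grad=True)   # 51 quantiles per state-action pair
    target = torch.randn(32, 51)                     # e.g., reward plus discounted next-state quantiles
    quantile_huber_loss(pred, target).backward()

The learned quantiles give the agent access to the full return distribution, which is the quantity the exploration methods above condition on.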
Deep Learning Approach to Brain Age Prediction
Honors Theses , 05/10/2024 , HR-24-10 ,
Brain aging is associated with several morphological and physiological changes, which helps anticipate the onset of neurodegenerative diseases. The longitudinal gap between brain age and chronological age can indicate cerebral abnormalities, with a higher gap suggesting accelerated brain aging. Structural features have been used in deep learning models to estimate brain age, but cerebrovascular features have not been fully explored. In this study, we developed a convolutional neural network that used cerebrovascular reactivity and fluctuations in the blood oxygen level dependent (BOLD) signal to estimate brain age. A squeeze-and-excitation network and a convolutional block attention module were incorporated in the deep learning network to improve extraction of channel and spatial interdependencies present in feature maps. Our model was able to predict brain age better than traditional regression approaches and was comparable to deep learning methods that used anatomical features, suggesting that functional hemodynamic features can be useful in early prognosis of cerebrovascular and neurodegenerative diseases.
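For reference, the squeeze-and-excitation mechanism mentioned above, sketched in PyTorch for 3D feature maps (the channel count and reduction ratio are placeholders, not the thesis's architecture; the CBAM module adds an analogous spatial attention step not shown here):

    import torch
    import torch.nn as nn

    class SEBlock3D(nn.Module):
        """Squeeze-and-excitation: globally average-pool each channel, pass the result
        through a small bottleneck MLP, and rescale the feature map channel-wise."""
        def __init__(self, channels: int, reduction: int = 8):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )

        def forward(self, x):                          # x: (batch, channels, D, H, W)
            w = x.mean(dim=(2, 3, 4))                  # squeeze: per-channel global average
            w = self.fc(w).view(x.size(0), x.size(1), 1, 1, 1)
            return x * w                               # excitation: channel-wise reweighting

    out = SEBlock3D(32)(torch.randn(2, 32, 8, 8, 8))   # output has the same shape as the input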
Checking the Linux Load Balancer
Honors Theses , 05/10/2024 , HR-24-09 ,
As machines go from single-core to multicore, an important aspect of the kernel is load balancing. Load balancing is the process by which the Linux scheduler distributes work across multiple cores fairly, ensuring that no core has too little or too much work. Ensuring effective load-balancing logic is critical to taking advantage of the many cores a processor might have. Ideally, when there are more tasks than cores, no core should be idle; each core should have work to do. However, that is sometimes not the case: one CPU needs to “steal” work from another CPU but, for various reasons, does not do so. It is important to verify that the load-balancing mechanism is working properly. In this paper, we explore the load balancing algorithm in the Linux kernel. We look at the logic that goes into load balancing, what metrics it examines to determine which core to steal work from, and a possible method to determine whether or not there is a bug in the load balancer. It turns out that while there are many formal methods to discover a bug in the Linux kernel, there are other, less formal ways to get a sense of whether there is an efficiency issue in the scheduler. The load balancer for the Linux kernel v6.8-rc1 seems to be working properly for certain configurations of a complex workload like the kernel compilation process.
Formal Verification of aarch64 Memory Invariants
Honors Theses , 05/10/2024 , HR-24-08 ,
While weakly ordered memory systems’ relaxed constraints improve performance, they can be difficult to reason about. We use the CompCert Coq implementation of the syntax and semantics of the aarch64 ISA to provide a formally-verified framework for testing memory invariants of aarch64 multiprocessor systems across both normal and device memory. To integrate this functionality into CompCert’s aarch64 model, we extend the compiler’s assembly output to explicitly represent multistage memory ordering semantics. We prove that this representation is semantically identical to the original assembly, and that one can reorder operations in this new representation so long as key invariants are preserved.
RPC: Replacing the PC for Learning-Based Cache Replacement
Honors Theses , 05/10/2024 , HR-24-07 ,
Recent cache replacement policies take one of two approaches to assigning eviction priorities for cache lines. The first is a fine-grained, learning-based approach, exemplified by policies such as Hawkeye and Mockingjay, that inserts cache lines with a priority based on their past behavior. This learning associates behaviors with Program Counters (PCs), that is, the address of the load instruction. Unfortunately, propagating the PC through the memory hierarchy can be a prohibitive cost for chip designers in terms of both physical hardware overhead and the extensive time required to verify and validate these changes to the memory system. The second is a coarse-grained approach, exemplified by LRU, DRRIP [11], KPC (Kill the PC) [14] and RLR (Reinforcement Learning Replacement) [18], where lines are inserted either all with the same priority or with one of several static priority values. While these approaches do not use the PC, they are much less accurate than learning-based approaches because they do not actually learn the reuse patterns of different lines. The reason that the program counter is so useful is that it has high separability, meaning that it is exceptional at grouping similar cache lines together, which allows for effective prediction of future behavior. This paper introduces a new cache replacement policy called RPC (Replace the PC) as the first fine-grained, learning-based replacement policy that does not use the program counter. To create RPC, we modify the Mockingjay policy to use a new signature that achieves similar line separability as the PC. RPC outperforms prior PC-free policies: for a set of SPEC 2006 [7] and GAP [3] benchmarks using the IPCP prefetcher [16], it achieves a speedup over LRU of 10.18%, as opposed to KPC which achieves 8.80% and RLR which gets just 1.66%. We also propose a modification to learning-based replacement policies, WARP (Writeback Access Reuse Prediction), that takes writeback accesses into consideration when learning. WARP can be applied to RPC to create WARP-RPC, which achieves a speedup of 10.48% over LRU. Lastly, we introduce WALK-MIN (Writeback Access LooKahead MIN), which builds on Bélády's MIN [5] and Demand-MIN [9] to modify the MIN policy to be writeback-aware in a multilevel writeback cache hierarchy.
Self-supporting Masonry Bridges
Honors Theses , 05/09/2024 , HR-24-05 ,
With the assistance of computer graphics and computer-aided design, the paradigm for architectural design has shifted; aesthetics and structural feasibility can work hand-in-hand to generate novel structures. This thesis highlights the design of self-supporting masonry structures, specifically in the context of bridge construction, and proposes an interactive design framework using thrust network analysis methods to construct new shapes and structures. Given a closed, freeform shape, we want structures that can stand under their own weight along with any imposed load while minimizing the forces within the structure.
Improvements in Transfer Planning Algorithms for Knitting Machines
Honors Theses , 05/06/2024 , HR-24-04 ,
V-bed knitting machines are machines which can produce a variety of soft objects by knitting and maneuvering loops of yarn using hundreds of needles arranged in two beds, or rows. These machines are programmed using a set of low-level instructions, and many programs can have the same overall result while taking vastly different amounts of time to execute. Thus, it is natural to attempt to optimize knitting machine programs.
One part of knitting a complex object is transfer planning. Transfer plans are sequences of instructions which move loops between the needles, arranging them for subsequent operations.
Prior work on transfer planning includes polynomial-time algorithms that solve special cases and a general A*-based search algorithm. We improve this A* approach with search-tree pruning based on observations about stacking loops on a single needle, chiefly that loops on the same needle move in tandem and cannot be separated. This pruned search space also allows for a slightly simpler state representation.
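As a rough illustration of the pruned search, the sketch below runs A* over a deliberately simplified machine model in which a move transfers the entire stack of loops on one needle to another, so stacked loops are never separated; the move model and heuristic are illustrative assumptions, not the thesis's formulation.

    # Simplified A* transfer planning with stack-preserving moves.
    import heapq

    def plan_transfers(start, goal, num_needles):
        """start, goal: tuples mapping loop index -> needle index."""
        def heuristic(state):
            # Needles whose stacks still hold a misplaced loop; each move
            # relocates one whole stack, so this never overestimates.
            return len({state[i] for i in range(len(state)) if state[i] != goal[i]})

        frontier = [(heuristic(start), 0, start, [])]
        best_cost = {start: 0}
        while frontier:
            _, cost, state, plan = heapq.heappop(frontier)
            if state == goal:
                return plan
            for src in set(state):
                for dst in range(num_needles):
                    if dst == src:
                        continue
                    # Pruning: loops on the same needle move in tandem and cannot be
                    # separated, so the only legal move transfers the whole stack.
                    nxt = tuple(dst if n == src else n for n in state)
                    ncost = cost + 1
                    if best_cost.get(nxt, float("inf")) > ncost:
                        best_cost[nxt] = ncost
                        heapq.heappush(frontier, (ncost + heuristic(nxt), ncost, nxt, plan + [(src, dst)]))
        return None

For example, plan_transfers((0, 0, 1), (2, 2, 1), num_needles=3) returns a single move [(0, 2)], since both loops stacked on needle 0 travel together.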
Latent Diffusion Holodeck: Creating Novel Views in 3D Scenes with Shadow Maps and Inpainting
Honors Theses , 05/05/2024 , HR-24-03 ,
Creating a three-dimensional space using generative AI is an exciting idea, but current methods for doing so are not able to create large, extensible areas that users can explore continuously. Contemporary experiments have utilized a two-dimensional, equirectangular panorama at an initial viewpoint, along with its generated depth map, to texture and deform a spherical mesh, creating a 3D effect. When viewed from the center of the sphere, this approach is somewhat convincing, but moving the camera away from the origin reveals distorted regions of uncertainty, such as areas that should be behind objects visible in the original viewpoint. This project explores the idea of using shadow maps to discover these uncertain areas in novel views and leverages inpainting techniques to repair the missing scenery, allowing users to explore a continuously generated, more convincing virtual environment.
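A rough sketch of the shadow-map test for flagging uncertain pixels appears below. The camera and resampling details are heavily simplified, and project_to_original is an assumed helper; the idea is only that a novel-view pixel whose 3D point lies behind (or outside) what the original viewpoint saw was never observed and must be inpainted.

    import numpy as np

    def uncertain_mask(novel_points_world, original_depth_map, project_to_original, eps=1e-2):
        """
        novel_points_world: (H, W, 3) world-space positions seen by the novel camera.
        original_depth_map: depth rendered from the original (panorama) viewpoint.
        project_to_original: maps a world point to (row, col, depth) in the original view.
        Returns a boolean (H, W) mask of pixels to inpaint.
        """
        H, W, _ = novel_points_world.shape
        mask = np.zeros((H, W), dtype=bool)
        for y in range(H):
            for x in range(W):
                r, c, d = project_to_original(novel_points_world[y, x])
                r, c = int(round(r)), int(round(c))
                if not (0 <= r < original_depth_map.shape[0] and 0 <= c < original_depth_map.shape[1]):
                    mask[y, x] = True          # outside the original field of view
                elif d > original_depth_map[r, c] + eps:
                    mask[y, x] = True          # occluded from the original viewpoint
        return mask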
A Study of Triangle Counting on an Emerging Massively Parallel Architecture
Honors Theses , 05/03/2024 , HR-24-14 ,
With the exponential growth in unstructured data over the past few years, Graph Pattern Mining (GPM) serves as an important tool for yielding new insights in a variety of domains, from biochemical compounds to social networks. GPM workloads generate massive amounts of intermediate state and require irregular data access patterns that are challenging to scale on typical architectures. PANDOHammer, an extension of the distributed, manycore Hammerblade architecture, introduces a new supercomputing system that can help serve large-scale graph processing workloads. With thousands of cores that provide high throughput and latency tolerance, PANDOHammer exposes a rich, unexplored design space for graph processing workloads.
We explore Triangle Counting, a simple yet fundamental GPM workload, on the PANDOHammer system. We (1) create and compare new in-memory distributed representations of large graphs, (2) quantify the performance benefit of moving compute to data rather than moving data to compute on PANDOHammer, and (3) investigate the optimal task decomposition for Triangle Counting on PANDOHammer. Ultimately, we contribute new insights about extracting performance from the PANDOHammer supercomputing system. We find that running Triangle Counting by moving compute to data with extremely fine-grained task decomposition on load-balanced distributed in-memory graphs outperforms Triangle Counting by chunking tasks, using complete vertex caching, and moving data to compute. Not only do we contribute the first end-to-end graph pattern mining application on PANDOHammer, but we also provide a richer understanding of PANDOHammer that empowers us to develop more complex applications for this new supercomputing system.
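The per-edge kernel at the heart of Triangle Counting is a sorted-adjacency-list intersection, sketched below in plain sequential form; the distributed, compute-to-data mapping onto PANDOHammer cores and the graph partitioning are not shown.

    def count_triangles(adj):
        """adj: dict mapping vertex -> sorted list of neighbors (undirected graph)."""
        total = 0
        for u, neighbors in adj.items():
            for v in neighbors:
                if v <= u:
                    continue                   # visit each edge (u, v) once
                # Intersect adj[u] and adj[v], keeping only common neighbors w > v
                # so each triangle is counted exactly once.
                a, b = adj[u], adj[v]
                i = j = 0
                while i < len(a) and j < len(b):
                    if a[i] == b[j]:
                        if a[i] > v:
                            total += 1
                        i += 1
                        j += 1
                    elif a[i] < b[j]:
                        i += 1
                    else:
                        j += 1
        return total

Under a compute-to-data decomposition, each core would run this intersection for the edges whose adjacency lists reside in its local memory, rather than shipping those lists to a central worker.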
Explanation-Driven Preference Elicitation with Language Models
Honors Theses , 05/01/2024 , HR-24-02 ,
Modeling user preferences is crucial for developing AI systems that align with individual values; however, it remains a challenge due to the limitations of traditional preference elicitation methods in capturing the richness of individual preferences. To explore this issue, we create a novel dataset, ExplanationQA, of {question, answer, explanation} triplets on topics in emerging technologies, and explore whether user explanations can improve preference elicitation and modeling. We use this data to construct user models representing each user's preferences and measure the accuracy of LLM-generated preference predictions. We find that incorporating explanations into user model generation increases the accuracy of the LLM-generated survey predictions.
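As a hedged sketch of how explanations might be folded into such a user model, the helpers below concatenate each user's triplets into a prompt and ask an LLM to predict new answers; the prompt wording, field names, and the ask_llm wrapper are placeholders, not the ExplanationQA pipeline itself.

    def build_user_model(triplets, include_explanations=True):
        """triplets: list of dicts with 'question', 'answer', and 'explanation' keys."""
        entries = []
        for t in triplets:
            entry = f"Q: {t['question']}\nA: {t['answer']}"
            if include_explanations:
                entry += f"\nWhy: {t['explanation']}"
            entries.append(entry)
        return "This user previously answered:\n\n" + "\n\n".join(entries)

    def predict_answer(ask_llm, user_model, new_question):
        # ask_llm is an assumed wrapper around whatever LLM API is in use.
        prompt = (user_model
                  + "\n\nPredict how this user would answer the following question:\n"
                  + new_question)
        return ask_llm(prompt)

Comparing predictions from models built with include_explanations=True versus False is the kind of ablation the accuracy comparison above describes.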
Psychologically-Grounded Refinement for Large Language Models
Honors Theses , 05/01/2024 , HR-24-01 ,
Recent approaches assessing the capabilities of Large Language Models (LLMs) to generate support responses for negative emotions often focus on broad classifications of negative linguistic tone or style rather than addressing the specific causal factors underlying these emotions. While the former strategy may provide temporary relief in the short term, only the latter addresses the root causes and fosters long-term solutions. The psychology literature suggests that a precursor to emotion is the way a person appraises the surrounding situation, signaling new dimensions along which emotional support systems can reappraise toward sustainable, healthy emotions. To this end, we build RESORT, a framework that automatically elicits these cognitive reappraisals by leveraging language to instill reappraisal guidelines across multiple dimensions in LLM instructions. This thesis focuses on one strategy within the framework, RESORT-ITER, which iteratively incorporates reappraisal strategies along multiple relevant dimensions to build a cumulative emotional support response. We evaluate the reappraisal ability of RESORT-ITER with expert psychologists, demonstrating that LLMs even at the 7B scale are capable of effective long-term reappraisals. Our evaluation of the cumulative message shows that this proficiency does not entail a significant reduction in short-term emotional support as perceived by non-specialists.
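A loose sketch of the iterative structure is shown below: reappraisal guidance for each relevant appraisal dimension is folded into the support response one dimension at a time. The dimension names, prompt wording, and ask_llm wrapper are placeholders, not the actual RESORT instructions.

    def iterative_reappraisal(ask_llm, situation, dimensions):
        """dimensions: appraisal dimensions judged relevant to the situation."""
        response = ""
        for dim in dimensions:
            prompt = (
                f"Situation: {situation}\n"
                f"Support response so far: {response or '(none yet)'}\n"
                f"Extend the response with a cognitive reappraisal along the "
                f"'{dim}' appraisal dimension, keeping earlier content consistent."
            )
            response = ask_llm(prompt)
        return response   # the cumulative emotional-support message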
MTP Austin: Congestion Control for In-Network Compute based Datacenters
Honors Theses , 04/20/2024 , HR-24-15 ,
We study the problem of controlling congestion in a data center network with in-network compute (INC) elements that can resize, delay, and reorder messages using the MTP transport protocol. Handling congestion from resized, delayed, or reordered messages offers benefits such as effective network utilization and the prevention of message losses in the network and expensive retransmissions from the sender; however, preventing congestion requires early congestion detection and a careful response in the transmission rate. MTP Swift, MTP's current congestion control protocol, utilizes the INC queues' load feedback as a congestion signal and packet-based additive-increase/multiplicative-decrease (AIMD) to control the transmission rate. Still, it faces problems handling queue space during message mutation in INCs and sending large messages fairly when the congestion window is small. As a result, we utilize feedback signals such as the INC's queue-threshold detection, INC drop detection, and INC processing resize information as signals for determining congestion. A key innovation in this congestion control protocol is applying the AIMD algorithm at the message level to control the transmission rate in response to feedback. Our experiments show that the new congestion control protocol, MTP Austin, leads to fewer drops and retransmissions than MTP Swift by addressing message resizing and fairness in message sending.
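For concreteness, the sketch below applies additive-increase/multiplicative-decrease at message granularity in response to INC feedback signals. The signal names, constants, and window unit are illustrative assumptions, not the MTP Austin implementation.

    class MessageAIMD:
        def __init__(self, initial_window=4.0, add_step=1.0, decrease_factor=0.5, min_window=1.0):
            self.window = initial_window       # messages allowed in flight
            self.add_step = add_step
            self.decrease_factor = decrease_factor
            self.min_window = min_window

        def on_message_acked(self):
            # Additive increase, spread over a window's worth of acknowledged messages.
            self.window += self.add_step / self.window

        def on_congestion_signal(self, signal):
            # Multiplicative decrease on INC feedback indicating congestion,
            # e.g. a queue-threshold crossing, an INC drop, or resize pressure.
            if signal in {"queue_threshold", "inc_drop", "resize_pressure"}:
                self.window = max(self.min_window, self.window * self.decrease_factor)

        def can_send(self, in_flight_messages):
            return in_flight_messages < int(self.window)

Because the window is counted in messages rather than packets, a large message is not starved when the window is small, which is one way to think about the fairness issue described above.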
LUCky SparRow: Logging Updates to Compressed Sparse Row Graphs
Honors Theses , 04/15/2024 , HR-24-06 ,
Sparse graphs are prevalent in many scientific and real-world applications. Compressed sparse row (CSR), a sparse matrix representation, is commonly used to represent sparse graphs in memory for analytics workflows because it increases data locality and is more amenable to prefetching than other representations. However, a major limitation of CSR is its lack of support for efficient, in-place updates. In particular, due to its strict compactness, a CSR graph must be rebuilt entirely to add a new edge. This immutability is a critical limitation for graph analytics pipelines seeking real-time insights about evolving graphs. This paper proposes a novel data structure, the log-structured CSR (LSCSR), which maintains the performance benefits of CSR while enabling batched updates to the graph. We evaluate LSCSR on synthetic workloads and explore the trade-offs it makes between compactness, locality, and mutability.
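An illustrative sketch of the log-structured idea appears below: reads consult the compact CSR arrays plus a small edge log, and the log is periodically folded back into rebuilt CSR arrays. The threshold and layout are hypothetical, not the LSCSR design evaluated in the thesis.

    class LogStructuredCSR:
        def __init__(self, num_vertices, row_ptr, col_idx, rebuild_threshold=1024):
            self.num_vertices = num_vertices
            self.row_ptr = list(row_ptr)      # CSR offsets, length num_vertices + 1
            self.col_idx = list(col_idx)      # CSR destination vertices
            self.log = []                     # batched (src, dst) insertions
            self.rebuild_threshold = rebuild_threshold

        def add_edge(self, src, dst):
            self.log.append((src, dst))
            if len(self.log) >= self.rebuild_threshold:
                self.compact()

        def neighbors(self, v):
            base = self.col_idx[self.row_ptr[v]:self.row_ptr[v + 1]]
            return base + [d for s, d in self.log if s == v]

        def compact(self):
            # Fold the log into fresh CSR arrays: a full rebuild, but amortized
            # over rebuild_threshold insertions instead of paid on every edge.
            adj = [self.col_idx[self.row_ptr[v]:self.row_ptr[v + 1]] for v in range(self.num_vertices)]
            for s, d in self.log:
                adj[s].append(d)
            self.row_ptr, self.col_idx, offset = [0], [], 0
            for v in range(self.num_vertices):
                offset += len(adj[v])
                self.col_idx.extend(adj[v])
                self.row_ptr.append(offset)
            self.log.clear()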
Attribute-based Oblivious Message Retrieval
Honors Theses , 12/16/2023 , HR-23-28 ,
Recent developments in cryptography enable a system of Oblivious Message Retrieval using homomorphic encryption schemes. Oblivious Message Retrieval allows untrusted servers to detect and retrieve encrypted messages efficiently on behalf of users without learning anything about message content or recipients, but it retrieves messages based solely on user identities. We extend the functionality of Oblivious Message Retrieval with additional homomorphic encryption operations to support filtering retrieved messages with user-supplied conjunction policies while maintaining the privacy of user queries and filter parameters. Compared to the baseline implementation, our extension adds a computation and communication cost factor that scales approximately quasilinearly with the size of the conjunction.
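The plaintext logic behind such a conjunction filter can be sketched as a product of per-clause equality indicators, which is how a conjunction is typically expressed arithmetically so it can be evaluated homomorphically; the encrypted encoding and the homomorphic operations themselves are omitted here, and the attribute names are illustrative.

    def conjunction_indicator(message_attrs, policy):
        """
        message_attrs: dict of attribute -> value attached to a message.
        policy: dict of attribute -> required value (a conjunction of equalities).
        Returns 1 if every clause matches, else 0.
        """
        result = 1
        for attr, required in policy.items():
            clause = 1 if message_attrs.get(attr) == required else 0
            result *= clause      # AND as multiplication of 0/1 indicators
        return result

    # Example: retrieve only messages tagged topic == "payments" AND region == "eu".
    assert conjunction_indicator({"topic": "payments", "region": "eu"},
                                 {"topic": "payments", "region": "eu"}) == 1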
Memory-efficient in-network OLTP
Honors Theses , 12/12/2023 , HR-23-27 ,
Online transaction processing (OLTP) databases often suffer from reduced throughput due to high contention on a small set of frequently accessed ("hot") records. Recent in-network acceleration approaches propose housing hot records in the memory of the top-of-rack P4-programmable switch and expressing transaction logic in lock-free, line-rate P4 code, thereby avoiding contention. Since switch memory is scarce, we want to reduce the database's switch memory footprint at the smallest possible cost to throughput. Our solution is MINT, which consolidates the database's interactions with records on-switch into a fraction of overall execution time (the hot period), saving switch memory during the rest of execution (the cold period). This reduces the total switch memory footprint by up to 10.7× on our YCSB workload, with only a modest 16% throughput drop.
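A very rough sketch of the hot/cold idea: the switch only holds record state during a short hot period in each epoch, and transactions fall back to the servers during the cold period. The epoch structure and routing interface below are hypothetical, not MINT's actual mechanism.

    class HotColdScheduler:
        def __init__(self, hot_fraction=0.1, epoch_len=1000):
            self.hot_fraction = hot_fraction   # share of each epoch spent hot
            self.epoch_len = epoch_len         # epoch length in ticks

        def in_hot_period(self, tick):
            return (tick % self.epoch_len) < self.hot_fraction * self.epoch_len

        def route(self, touches_hot_record, tick):
            # During the hot period, transactions on hot records execute on-switch;
            # the rest of the time the switch holds no record state.
            if touches_hot_record and self.in_hot_period(tick):
                return "switch"
            return "server"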
TreeTracker Join: Killing Two Birds With One Stone
Regular Tech Report , 11/01/2023 , TR-23-02 ,
Many important query processing methods introduce semijoin operators or semijoin-like filters to delete dangling tuples, i.e., tuples that do not appear in or impact the final query result. These methods often substantially increase the speed of query execution. However, the application of these methods must be done with care or the cost of the additional operations will outweigh the benefit. TreeTracker Join (TTJ) is an extension of a hash join operator, implemented in the iterator interface, that detects dangling tuples as it computes join results and integrates their deletion with the execution of a query. The result is a single physical operator that simultaneously computes two logical operations: the join of two relations and a semijoin reduction. For both semijoin and filtering methods, a trade-off exists between the cost of the additional operation and the potential speed improvement. Thus, many existing methods resort to heuristics specific to each query. TTJ eliminates the use of heuristics and additional operations to attain good performance both formally and empirically. Formally, we prove that computing a k-way acyclic conjunctive query by composing k-1 instances of TTJ in a left-deep plan is optimal in data complexity. Empirical results on the TPC-H and JOB benchmarks show that TTJ improves query performance on a majority of queries, by up to 53%, when compared to hash join; for the remainder, the decline in performance is almost always minimal.
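A loose sketch of folding dangling-tuple deletion into a hash-join iterator is shown below: probe tuples that find no match are removed from their source so they are never processed again. TTJ's actual mechanism, which also handles deletions across multi-join, left-deep plans, is considerably richer than this.

    def hash_join_with_deletion(probe_rel, build_rel, probe_key, build_key):
        """probe_rel: mutable list of dicts; build_rel: list of dicts."""
        table = {}
        for t in build_rel:
            table.setdefault(t[build_key], []).append(t)

        dangling = []
        for t in probe_rel:
            matches = table.get(t[probe_key])
            if matches:
                for m in matches:
                    yield {**t, **m}           # join result
            else:
                dangling.append(t)             # contributes nothing to the result

        # The semijoin reduction, integrated with execution: drop dangling tuples
        # in place so any later scan of probe_rel never revisits them.
        for t in dangling:
            probe_rel.remove(t)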