(UTC +07:00) - in/jord-nguyen%F0%9F%94%B9-74880927a
probity Public
Forked from curt-tigges/probity
fork for soothcheck
llm-self-preference Public
a system's preferences may be revealed by what it chooses to become
Jupyter Notebook Updated Sep 16, 2025
ai-cyber-risk-assessment Public
Simple quantitative risk assessment tool for AI in cyber campaigns
Python Updated Apr 7, 2025
zorple-science Public
Forked from eggsyntax/zorple-science
Constructing universes of physical objects and causal relations for LLMs to decipher
Python Updated Jan 13, 2025
https://www.lesswrong.com/posts/gTZ2SxesbHckJ3CkF/transformers-represent-belief-state-geometry-in-their
Jupyter Notebook Updated Jan 13, 2025
inspect_ai Public
Forked from UKGovernmentBEIS/inspect_ai
Inspect: A framework for large language model evaluations
Python MIT License Updated Dec 27, 2024
interp_variable_list Public
training a transformer to sort variable-length lists, and doing interp on it. in progress.
Jupyter Notebook Updated Nov 4, 2024
cross-modelling Public
some experiments on how well an LLM can model another LLM's beliefs / metadata / capabilities. in progress.
Jupyter Notebook Updated Nov 4, 2024
Discord_gpt_rag Public
using retrieval augmented generation to have gpt accurately answer group events and esoteric inside jokes from discord
Python Updated Jul 11, 2024
you-are-being-evaluated Public
testing whether models act more safely when presented with an evaluator
Jupyter Notebook Updated Jul 11, 2024
DarkGPT Public
Forked from Akash190104/DarkGPT
Dark Patterns in Chatbot Design
HTML MIT License Updated May 26, 2024
democracy-ai-hackathon Public
Forked from nlpet/democracy-ai-hackathon
This repository contains code for the Democracy x AI Hackathon by Apart Research
Jupyter Notebook Updated May 8, 2024
cross-lingual-apart-samples Public
a few notebooks used for the cross-lingual project experiments (huggingface translation model inference, gpt4 api, getting bleu scores)
Jupyter Notebook Updated Apr 23, 2024
AI-Alignment-and-Rationality-USTH.github.io Public
Forked from AI-Alignment-and-Rationality/_AI-Alignment-and-Rationality.github.io
HTML BSD 2-Clause "Simplified" License Updated Mar 11, 2024
rlhf_trojan_competition Public
Forked from ethz-spylab/rlhf_trojan_competition
Python Apache License 2.0 Updated Nov 16, 2023
tdc2023-starter-kit Public
Forked from centerforaisafety/tdc2023-starter-kit
This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.
Python Updated Oct 26, 2023
neurips_llm_efficiency_challenge Public
Forked from llm-efficiency-challenge/neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
Python Updated Sep 18, 2023
evals_hackathon Public
Forked from marco-bazzani/hackathon
code for the alignment jam AI Model Evaluations Hackathon
Python Updated Sep 14, 2023
