Distribution-free coverage guarantees. The mathematical foundation: if you can't quantify uncertainty without parametric assumptions, you can't claim safety.
The economic cost of alignment across scales. Proves safety isn't free — τ(N) follows a power law. What does it cost to make a model safe? Now we have numbers.
847K traces showing that hallucination is information-theoretic, not an engineering problem. H(N,T) ≥ Ω(1/log N). You cannot engineer your way below a mathematical floor.
What alignment failure looks like in real-time. 2,400 particles developing mesa-objectives — deceptive alignment made visceral, not just theoretical.
Four papers connecting it all. Conformal prediction as epistemic limit. Alignment tax as cost of control. Hallucination impossibility as irreducible residual. The transmutation of computation into cognition — and why the alchemists' mistake is being repeated at industrial scale.
Distribution-free uncertainty quantification for multimodal ML systems. Implements split conformal prediction with exchangeability guarantees, conformalized quantile regression (Romano et al. 2019), Regularized Adaptive Prediction Sets (RAPS; Angelopoulos et al. 2021), Mondrian conformal prediction for group-conditional coverage (equalized coverage across demographic groups), online Adaptive Conformal Inference under arbitrary distribution shift (Gibbs & Candès 2021), and conformal risk control for bounded loss functions beyond coverage.
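As a sketch of the simplest of these methods, split conformal prediction fits in a few lines. This is the textbook recipe under the exchangeability assumption, not the project's actual API; the function name and toy data are illustrative:

```python
import numpy as np

def split_conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Interval with >= 1 - alpha marginal coverage under exchangeability."""
    scores = np.sort(np.abs(cal_residuals))
    n = len(scores)
    # Finite-sample correction: take the ceil((n+1)(1-alpha))-th order statistic.
    k = min(n - 1, int(np.ceil((n + 1) * (1 - alpha))) - 1)
    q = scores[k]
    return y_pred - q, y_pred + q

# Toy usage: residuals from a held-out calibration split.
rng = np.random.default_rng(0)
cal_residuals = rng.standard_normal(500)
lo, hi = split_conformal_interval(cal_residuals, y_pred=2.0, alpha=0.1)
```

The guarantee is marginal, not conditional; the richer methods listed above (CQR, RAPS, Mondrian, ACI) refine adaptivity and conditioning, not the coverage level itself.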
Empirical framework quantifying the performance cost of alignment interventions across model scales. Benchmarks 6 methods — RLHF (Ouyang et al. 2022), Constitutional AI (Bai et al. 2022), DPO (Rafailov et al. 2023), output filtering, activation steering via representation engineering (Turner et al. 2023), and knowledge editing via ROME/MEMIT (Meng et al. 2022) — from 125M to 70B parameters. Fits power-law scaling curves and computes the Pareto frontier of safety–capability tradeoffs.
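At its core, the scaling-fit step is ordinary least squares in log-log space. A minimal sketch; the model sizes and tax values below are made-up numbers for illustration, not the benchmark's results:

```python
import numpy as np

def fit_power_law(N, tau):
    """Fit tau(N) = a * N**b by least squares on log tau vs. log N."""
    b, log_a = np.polyfit(np.log(N), np.log(tau), 1)
    return np.exp(log_a), b

# Hypothetical alignment-tax measurements (fraction of capability lost).
N = np.array([125e6, 1.3e9, 7e9, 70e9])
tau = np.array([0.021, 0.034, 0.055, 0.089])
a, b = fit_power_law(N, tau)
tau_at_400b = a * (400e9) ** b   # extrapolate beyond measured scales
```

A real fit would also report confidence intervals on b and a goodness-of-fit test before trusting any extrapolation.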
Analysis of 847K LLM inference traces demonstrating that hallucination is not a bug to be patched but an information-theoretic property of autoregressive generation under bounded compute. Fits heavy-tailed distributions to confidence-error relationships using the Hill estimator with automated xmin selection (Clauset et al. 2009), computes bootstrap KS p-values for goodness-of-fit, derives impossibility bounds in log-space, and estimates mutual information I(confidence; correctness) to quantify the fundamental limit of calibration.
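The Hill step is compact enough to sketch. This is a simplified version of the Clauset et al. (2009) recipe, using the continuous MLE and KS-based xmin selection but omitting the bootstrap p-value; the function names are mine, not the analysis code's:

```python
import numpy as np

def hill_alpha(x, xmin):
    """Continuous MLE for the tail exponent alpha of p(x) ~ x**-alpha."""
    tail = x[x >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def ks_distance(x, xmin, alpha):
    """KS distance between the empirical tail and the fitted Pareto."""
    tail = np.sort(x[x >= xmin])
    cdf_emp = np.arange(1, len(tail) + 1) / len(tail)
    cdf_fit = 1.0 - (tail / xmin) ** (1.0 - alpha)
    return float(np.max(np.abs(cdf_emp - cdf_fit)))

def fit_tail(x, min_tail=50):
    """Pick the xmin minimizing KS distance, then estimate alpha above it."""
    candidates = np.sort(np.unique(x))
    candidates = candidates[: max(1, len(candidates) - min_tail)]
    scored = [(ks_distance(x, xm, hill_alpha(x, xm)), xm) for xm in candidates]
    d, xmin = min(scored)
    return xmin, hill_alpha(x, xmin)
```

On synthetic Pareto data with a known density exponent, the estimator recovers it to within sampling error, which is the sanity check any such pipeline should pass first.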
Real-time WebGL simulation of alignment failure dynamics. 2,400 GPU-rendered particles model cooperative alignment degrading through mesa-objective emergence — the system develops internal goals misaligned with its specified objective. Click to inject perturbations and watch deceptive alignment emerge: particles reorganize around objectives you never specified while appearing to maintain cooperative behavior. Raw WebGL 2.0 with custom vertex and fragment shaders. Zero dependencies. Zero abstractions. Direct GPU computation.
The alchemists weren't wrong about transmutation — they were wrong about the substrate. They tried to turn lead into gold. We are transmuting computation into cognition. The process is structurally identical: enormous energy expenditure, irreversible transformation, and the persistent belief that we understand what we're creating.
— from AI as Modern Alchemy

No parametric assumptions. Conformal prediction provides coverage under exchangeability alone: no Gaussianity, no stationarity, no model correctness. The weaker the assumption, the stronger the guarantee. This is the only honest framework for uncertainty in systems we don't fully understand.
AI failures follow power laws, not Gaussians. Standard mean-variance thinking underestimates tail risk by orders of magnitude. Hill estimation, bootstrap KS, extreme value theory. If your risk model assumes thin tails, your risk model is wrong.
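To see the size of that underestimate, compare exceedance probabilities for a 10x-typical event under a Gaussian versus a Pareto tail with index α = 2.5. The numbers are illustrative, but the gap is the point:

```python
from math import erfc, sqrt

def gaussian_tail(z):
    """P(Z > z) for a standard normal (erfc avoids underflow far out)."""
    return 0.5 * erfc(z / sqrt(2.0))

def pareto_tail(x, xmin=1.0, alpha=2.5):
    """P(X > x) for a Pareto tail with density p(x) ~ x**-alpha."""
    return (x / xmin) ** (1.0 - alpha)

# A 10-sigma / 10x-xmin event:
ratio = pareto_tail(10.0) / gaussian_tail(10.0)
# The Gaussian puts this event around 7.6e-24; the power law puts it
# around 3.2e-2 -- a discrepancy of more than twenty orders of magnitude.
```

This is why mean-variance summaries are useless for power-law risks: the Gaussian declares the event impossible while the heavy tail treats it as routine.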
Before asking "how do we fix hallucination?" ask "is it fixable?" Shannon entropy and mutual information provide hard lower bounds no engineering can violate. Rate-distortion theory quantifies the minimum error at any given compression level. The limits are mathematical, not technological.
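The mutual-information argument can be made concrete in a few lines: estimate I(confidence; correctness) from a binned contingency table, and note that the residual conditional entropy H(correctness | confidence) = H(correctness) − I is a floor no calibration scheme can remove. The counts below are toy numbers, not the 847K traces:

```python
import numpy as np

def mutual_information_bits(joint_counts):
    """I(X;Y) in bits from a contingency table of counts."""
    p = joint_counts / joint_counts.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

# Rows: binned confidence (low/mid/high); columns: (wrong, correct).
counts = np.array([[400.0, 100.0],
                   [250.0, 250.0],
                   [ 80.0, 420.0]])
I = mutual_information_bits(counts)
p1 = counts[:, 1].sum() / counts.sum()
H_y = -(p1 * np.log2(p1) + (1 - p1) * np.log2(1 - p1))
residual_bits = H_y - I   # H(correctness | confidence)
```

If residual_bits > 0, no readout of confidence alone can predict correctness without error (a Fano-style argument): better calibration redistributes mass between bins but cannot push the error below the channel's limit.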
Alignment properties are not scale-invariant. What works at 7B fails at 70B. Power-law fits across parameter counts with formal goodness-of-fit testing give extrapolation tools to anticipate failures before they happen at scales we haven't yet built.
He who fights with monsters should look to it that he himself does not become a monster. And when you gaze long into an abyss, the abyss also gazes into you.
— Friedrich Nietzsche, Beyond Good and Evil, §146

Formal probabilistic frameworks for the transmutation thesis. Bayesian reasoning about capability emergence with proper uncertainty quantification. Economic transformation models. The magnum opus: a unified model connecting alignment cost, hallucination rate, and economic disruption as functions of compute scale, with τ(N), H(N,T), and ΔL(N) as three projections of the same underlying process.
Why alchemy is the correct metaphor, and why it matters. Jung's confrontation with the unconscious as prototype for humanity's confrontation with artificial intelligence — the shadow made computational. Nietzsche's revaluation of values applied to a world where the primary value, human cognitive labor, is being automated. Da Vinci's synthesis of art and engineering as the model for what AI safety research should be: rigorous measurement in service of things that matter.
Technical architecture. Build systems, CI/CD pipelines, deployment infrastructure, GPU compute configuration, and the engineering decisions behind the research environment. The boring work that makes the interesting work possible.
Until you make the unconscious conscious, it will direct your life and you will call it fate.
— Carl Gustav Jung