Adrià Garriga-Alonso
Research Scientist, FAR AI
Verified email at far.ai
Title
Cited by
Year
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ...
arXiv preprint arXiv:2206.04615, 2022
Cited by: 1075; Year: 2022
Deep Convolutional Networks as shallow Gaussian Processes
A Garriga-Alonso, L Aitchison, CE Rasmussen
International Conference on Learning Representations, 2019
Cited by: 296; Year: 2019
Bayesian neural network priors revisited
V Fortuin, A Garriga-Alonso, SW Ober, F Wenzel, G Rätsch, RE Turner, ...
arXiv preprint arXiv:2102.06571, 2021
Cited by: 151; Year: 2021
Towards automated circuit discovery for mechanistic interpretability
A Conmy, A Mavor-Parker, A Lynch, S Heimersheim, A Garriga-Alonso
Advances in Neural Information Processing Systems 36, 16318-16352, 2023
Cited by: 133; Year: 2023
Causal scrubbing: A method for rigorously testing interpretability hypotheses
L Chan, A Garriga-Alonso, N Goldowsky-Dill, R Greenblatt, ...
AI Alignment Forum, 10, 2022
Cited by: 53; Year: 2022
Understanding variational inference in function-space
DR Burt, SW Ober, A Garriga-Alonso, M van der Wilk
arXiv preprint arXiv:2011.09421, 2020
Cited by: 48; Year: 2020
Exact Langevin dynamics with stochastic gradients
A Garriga-Alonso, V Fortuin
arXiv preprint arXiv:2102.01691, 2021
Cited by: 38; Year: 2021
Data augmentation in Bayesian neural networks and the cold posterior effect
S Nabarro, S Ganev, A Garriga-Alonso, V Fortuin, M van der Wilk, ...
Uncertainty in Artificial Intelligence, 1434-1444, 2022
Cited by: 31; Year: 2022
BNNpriors: A library for Bayesian neural network inference with different prior distributions
V Fortuin, A Garriga-Alonso, M van der Wilk, L Aitchison
Software Impacts 9, 100079, 2021
Cited by: 25; Year: 2021
Correlated weights in infinite limits of deep convolutional neural networks
A Garriga-Alonso, M van der Wilk
Uncertainty in Artificial Intelligence, 1998-2007, 2021
Cited by: 7; Year: 2021
Analyzing the Generalization and Reliability of Steering Vectors
D Tan, D Chanin, A Lynch, D Kanoulas, B Paige, A Garriga-Alonso, R Kirk
arXiv preprint arXiv:2407.12404, 2024
Cited by: 1; Year: 2024
Hypothesis Testing the Circuit Hypothesis in LLMs
C Shi, N Beltran-Velez, A Nazaret, C Zheng, A Garriga-Alonso, A Jesson, ...
ICML 2024 Workshop on Mechanistic Interpretability, 2024
Cited by: 1; Year: 2024
Probability Density Imputation of Missing Data with Gaussian Mixture Models
A Garriga-Alonso
University of Oxford, 2017
Cited by: 1; Year: 2017
Solving Montezuma's Revenge with Planning and Reinforcement Learning
A Garriga-Alonso
Universitat Pompeu Fabra, 2016
Cited by: 1; Year: 2016
Planning behavior in a recurrent neural network that plays Sokoban
A Garriga-Alonso, M Taufeeque, A Gleave
arXiv preprint arXiv:2407.15421, 2024
Year: 2024
Adversarial Circuit Evaluation
A Garriga-Alonso
arXiv preprint arXiv:2407.15166, 2024
Year: 2024
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification
T Kwa, D Thomas, A Garriga-Alonso
arXiv preprint arXiv:2407.14503, 2024
Year: 2024
Investigating the Indirect Object Identification circuit in Mamba
D Ensign, A Garriga-Alonso
arXiv preprint arXiv:2407.14008, 2024
Year: 2024
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques
R Gupta, I Arcuschin, T Kwa, A Garriga-Alonso
arXiv preprint arXiv:2407.14494, 2024
Year: 2024
Priors in finite and infinite Bayesian convolutional neural networks
A Garriga-Alonso
Year: 2023