In "Logical Induction", Garrabrant et al. "present a computable algorithm that assigns probabilities to every logical statement in a given formal language, and refines those probabilities over time." The algorithm has a number of nice properties, such as handling Gödelian uncertainty and learning statistical patterns in logical formulas. However, its applicability remains unclear. To my knowledge, only two papers have built on it significantly: 'Rational inductive agents' and 'Forecasting using incomplete models'.
Will 3 or more technical papers that build on Garrabrant et al.'s work appear on arXiv after 2022 and before 2025? The count includes empirical work that uses a more tractable approximation of Garrabrant et al.'s method.
I haven't looked in detail, but it seems plausible that this work qualifies: https://link.springer.com/article/10.1007/s10701-024-00755-9#Abs1
@JacobPfau I didn't find any others after quickly looking through the citation trail. I'll leave this unresolved for another 24 hours in case anyone brings something to my attention.
What would it look like to build a neural network system that used insights from logical inductors to make AI safer? Would it connect to any of the key points in my "ai safety subcomponents as I see it today" list? https://manifold.markets/L/will-this-overview-of-safety-resear
@L It would probably relate to your bullet: "it seems like most of safety boils down to a trustable uncertainty representation. the thing I want to formally verify is that the net knows when to stop and ask the detected agent whether an outcome it expects is appropriate."