RSS

Tomek Korbak

Karma: 745

Senior Research Scientist at UK AISI working on AI control

https://​​tomekkorbak.com/​​

How to eval­u­ate con­trol mea­sures for LLM agents? A tra­jec­tory from to­day to superintelligence

14 Apr 2025 16:45 UTC
29 points
1 comment2 min readLW link

A sketch of an AI con­trol safety case

30 Jan 2025 17:28 UTC
57 points
0 comments5 min readLW link

Elic­it­ing bad contexts

24 Jan 2025 10:39 UTC
31 points
8 comments3 min readLW link

Au­toma­tion collapse

21 Oct 2024 14:50 UTC
72 points
9 comments7 min readLW link

Com­po­si­tional prefer­ence mod­els for al­ign­ing LMs

Tomek Korbak25 Oct 2023 12:17 UTC
18 points
2 comments5 min readLW link

Towards Un­der­stand­ing Sy­co­phancy in Lan­guage Models

24 Oct 2023 0:30 UTC
66 points
0 comments2 min readLW link
(arxiv.org)

Paper: LLMs trained on “A is B” fail to learn “B is A”

23 Sep 2023 19:55 UTC
121 points
74 comments4 min readLW link
(arxiv.org)

Paper: On mea­sur­ing situ­a­tional aware­ness in LLMs

4 Sep 2023 12:54 UTC
109 points
16 comments5 min readLW link
(arxiv.org)

Imi­ta­tion Learn­ing from Lan­guage Feedback

30 Mar 2023 14:11 UTC
71 points
3 comments10 min readLW link

Pre­train­ing Lan­guage Models with Hu­man Preferences

21 Feb 2023 17:57 UTC
135 points
20 comments11 min readLW link2 reviews
OSZAR »