HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
Request
1.
Gödel, Escher, Bach: an in-depth explainer
alignmentforum.org
248 comments
4 years ago
behnamoh
430 points
2.
A Mechanistic Interpretability Analysis of Grokking
alignmentforum.org
54 comments
3 years ago
famouswaffles
202 points
3.
When do "brains beat brawn" in chess? An experiment
alignmentforum.org
79 comments
3 years ago
andrewljohnson
124 points
4.
Understanding “Deep Double Descent”
alignmentforum.org
16 comments
7 years ago
alexcnwy
108 points
5.
Without specific countermeasures, the easiest path likely leads to AI takeover
alignmentforum.org
27 comments
4 years ago
kvee
16 points
6.
Catastrophic sabotage as a major threat model for human-level AI systems
alignmentforum.org
discuss
2 years ago
speckx
5 points
7.
The Unsolved Technical Alignment Problem in LeCun's A Path Towards AGI
alignmentforum.org
discuss
3 years ago
sandinmyjoints
4 points
8.
Data, not size, is the current active constraint on language model performance
alignmentforum.org
discuss
3 years ago
satvikpendem
4 points
9.
AI Will Not Want to Self-Improve
alignmentforum.org
3 comments
3 years ago
behnamoh
3 points
10.
Looking for Backdoors in Jane Street LLMs
alignmentforum.org
discuss
15 days ago
allenleee
3 points
11.
Transformers Represent Belief State Geometry in Their Residual Stream
alignmentforum.org
discuss
2 years ago
HR01
3 points
12.
Paths to High-Level Machine Intelligence
alignmentforum.org
discuss
5 years ago
Daniel_Eth
3 points
13.
A Brief Intro to Domain Theory
alignmentforum.org
discuss
7 years ago
yarapavan
3 points
14.
Exploration Hacking: Can LLMs Learn to Resist RL Training?
alignmentforum.org
discuss
a month ago
Prof_Sigmund
2 points
15.
Test your interpretability techniques by de-censoring Chinese models
alignmentforum.org
discuss
3 months ago
allenleee
2 points
16.
Will reward-seekers respond to distant incentives?
alignmentforum.org
discuss
4 months ago
gmays
2 points
17.
How Can Interpretability Researchers Help AGI Go Well?
alignmentforum.org
discuss
7 months ago
gmays
2 points
18.
How to Become a Mechanistic Interpretability Researcher
alignmentforum.org
discuss
10 months ago
speckx
2 points
19.
Highly Opinionated Advice on How to Write ML Papers
alignmentforum.org
discuss
a year ago
jxmorris12
2 points
20.
Would catching AIs trying to escape convince AI devs to slow down or undeploy?
alignmentforum.org
discuss
2 years ago
rntn
2 points
21.
Opinionated Annotated List of Favourite Mechanistic Interpretability Papers v2
alignmentforum.org
discuss
2 years ago
thunderbong
2 points
22.
Modern Transformers Are AGI, and Human-Level
alignmentforum.org
discuss
2 years ago
rntn
2 points
23.
Larger language models may disappoint you [or, an eternally unfinished draft]
alignmentforum.org
discuss
2 years ago
behnamoh
2 points
24.
AGI safety from first principles: Superintelligence
alignmentforum.org
discuss
3 years ago
warkanlock
2 points
25.
Anthropic Fall 2023 Debate Progress Update
alignmentforum.org
discuss
3 years ago
EvgeniyZh
2 points
26.
Critique of some recent philosophy of LLMs' minds
alignmentforum.org
discuss
3 years ago
behnamoh
2 points
27.
GPTs are Predictors, not Imitators or Simulators
alignmentforum.org
discuss
3 years ago
famouswaffles
2 points
28.
Imitation Learning from Language Feedback
alignmentforum.org
discuss
3 years ago
tim_sw
2 points
29.
Othello-GPT Has a Linear Emergent World Representation
alignmentforum.org
discuss
3 years ago
todsacerdoti
2 points
30.
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 Small
alignmentforum.org
discuss
4 years ago
gbrown_
2 points