HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
31.
▲
How truthful is GPT-3?
alignmentforum.org
discuss
5 years ago
mathattack
2 points
32.
▲
The Codex Skeptic FAQ
alignmentforum.org
discuss
5 years ago
mtrazzi
2 points
33.
▲
Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers
alignmentforum.org
discuss
5 years ago
owenshen24
2 points
34.
▲
Interpreting GPT: The Logit Lens
alignmentforum.org
discuss
6 years ago
RiversHaveWings
2 points
35.
▲
On the purposes of decision theory research
alignmentforum.org
discuss
7 years ago
hnryjmes
2 points
36.
▲
Reframing Superintelligence: Comprehensive AI Services as General Intelligence
alignmentforum.org
discuss
7 years ago
T-A
2 points
37.
▲
OpenAI: Common Misconceptions
alignmentforum.org
2 comments
4 years ago
O__________O
1 point
38.
▲
A Mechanistic Interpretability Analysis of Grokking
alignmentforum.org
1 comment
4 years ago
poga
1 point
39.
▲
Embedding Spaces – Transformer Token Vectors Are Not Points in Space
alignmentforum.org
discuss
9 months ago
ofou
1 point
40.
▲
LLMs Are Simulators
alignmentforum.org
discuss
a year ago
msvana
1 point
41.
▲
AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
alignmentforum.org
discuss
2 years ago
sebg
1 point
42.
▲
Mysteries of mode collapse (2022)
alignmentforum.org
discuss
2 years ago
Bluestein
1 point
43.
▲
LLMs for Alignment Research: a safety priority?
alignmentforum.org
discuss
2 years ago
rntn
1 point
44.
▲
Mesa-Optimization
alignmentforum.org
discuss
3 years ago
reqo
1 point
45.
▲
Glitch Tokens
alignmentforum.org
discuss
3 years ago
peter_d_sherman
1 point
46.
▲
Concrete Open Problems in Mechanistic Interpretability
alignmentforum.org
discuss
3 years ago
raviparikh
1 point
47.
▲
Glitch Tokens in GPT (SolidGoldMagikarp)
alignmentforum.org
discuss
3 years ago
gwd
1 point
48.
▲
Central AI alignment problem: capabilities generalization and sharp left turn
alignmentforum.org
discuss
4 years ago
kvee
1 point
49.
▲
A Mechanistic Interpretability Analysis of Grokking
alignmentforum.org
discuss
4 years ago
apetresc
1 point
50.
▲
A Mechanistic Interpretability Analysis of Grokking
alignmentforum.org
discuss
4 years ago
caprock
1 point
51.
▲
A Mechanistic Interpretability Analysis of Grokking
alignmentforum.org
discuss
4 years ago
jordn
1 point
52.
▲
Specificity: Brain's Superpower (2019)
alignmentforum.org
discuss
4 years ago
Tomte
1 point
53.
▲
Fixing the Good Regulator Theorem (2019)
alignmentforum.org
discuss
4 years ago
Schiphol
1 point
54.
▲
Embedded Agency and AI Alignment
alignmentforum.org
discuss
4 years ago
isaacimagine
1 point
55.
▲
Demons In Imperfect Search (2020)
alignmentforum.org
discuss
5 years ago
optimalsolver
1 point
56.
▲
Another (Outer) AI Alignment Failure Story
alignmentforum.org
discuss
5 years ago
Rescis
1 point
57.
▲
Birds, Brains, Planes, and AI
alignmentforum.org
discuss
5 years ago
ignoranceprior
1 point
58.
▲
Birds, Planes, Brains, and AI
alignmentforum.org
discuss
5 years ago
ignoranceprior
1 point
59.
▲
Multi-dimensional rewards for AGI interpretability and control
alignmentforum.org
discuss
5 years ago
CapitalistCartr
1 point
60.
▲
Alignment as a Bottleneck to Usefulness of GPT-3
alignmentforum.org
discuss
6 years ago
edouard-harris
1 point