Search: alignmentforum.org | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

1.

Gödel, Escher, Bach: an in-depth explainer

alignmentforum.org

4 years ago

430 points

2.

A Mechanistic Interpretability Analysis of Grokking

alignmentforum.org

3 years ago

202 points

3.

When do "brains beat brawn" in chess? An experiment

alignmentforum.org

3 years ago

124 points

4.

Understanding “Deep Double Descent”

alignmentforum.org

7 years ago

108 points

5.

Without specific countermeasures, the easiest path likely leads to AI takeover

alignmentforum.org

4 years ago

16 points

6.

Catastrophic sabotage as a major threat model for human-level AI systems

alignmentforum.org

2 years ago

5 points

7.

The Unsolved Technical Alignment Problem in LeCun's A Path Towards AGI

alignmentforum.org

3 years ago

4 points

8.

Data, not size, is the current active constraint on language model performance

alignmentforum.org

3 years ago

4 points

9.

AI Will Not Want to Self-Improve

alignmentforum.org

3 years ago

3 points

10.

Looking for Backdoors in Jane Street LLMs

alignmentforum.org

15 days ago

3 points

11.

Transformers Represent Belief State Geometry in Their Residual Stream

alignmentforum.org

2 years ago

3 points

12.

Paths to High-Level Machine Intelligence

alignmentforum.org

5 years ago

3 points

13.

A Brief Intro to Domain Theory

alignmentforum.org

7 years ago

3 points

14.

Exploration Hacking: Can LLMs Learn to Resist RL Training?

alignmentforum.org

a month ago

2 points

15.

Test your interpretability techniques by de-censoring Chinese models

alignmentforum.org

3 months ago

2 points

16.

Will reward-seekers respond to distant incentives?

alignmentforum.org

4 months ago

2 points

17.

How Can Interpretability Researchers Help AGI Go Well?

alignmentforum.org

7 months ago

2 points

18.

How to Become a Mechanistic Interpretability Researcher

alignmentforum.org

10 months ago

2 points

19.

Highly Opinionated Advice on How to Write ML Papers

alignmentforum.org

a year ago

2 points

20.

Would catching AIs trying to escape convince AI devs to slow down or undeploy?

alignmentforum.org

2 years ago

2 points

21.

Opinionated Annotated List of Favourite Mechanistic Interpretability Papers v2

alignmentforum.org

2 years ago

2 points

22.

Modern Transformers Are AGI, and Human-Level

alignmentforum.org

2 years ago

2 points

23.

Larger language models may disappoint you [or, an eternally unfinished draft]

alignmentforum.org

2 years ago

2 points

24.

AGI safety from first principles: Superintelligence

alignmentforum.org

3 years ago

2 points

25.

Anthropic Fall 2023 Debate Progress Update

alignmentforum.org

3 years ago

2 points

26.

Critique of some recent philosophy of LLMs' minds

alignmentforum.org

3 years ago

2 points

27.

GPTs are Predictors, not Imitators or Simulators

alignmentforum.org

3 years ago

2 points

28.

Imitation Learning from Language Feedback

alignmentforum.org

3 years ago

2 points

29.

Othello-GPT Has a Linear Emergent World Representation

alignmentforum.org

3 years ago

2 points

30.

Some Lessons Learned from Studying Indirect Object Identification in GPT-2 Small

alignmentforum.org

4 years ago

2 points