Show HN: Benchmarking AI Chatbots with Game Prompts

Heykuki News

1 point

3 years ago

I’ve been using these prompts to compare how different LLMs perform, and the differences have been surprisingly stark.

The toughest one is Wheel of Fortune, which works consistently only on GPT-4.

GPT-3.5 Turbo rarely works, and when it does, it only plays at a surface level and misunderstands the gameplay.

Bard never works.

Bing Chat kinda works, but it sometimes gets sassy and ends the chat.

No comments
