HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
skidrow
Born on July 02, 2024 • 389 Karma
Submitted
31.
Implementing a Fast Tensor Core Matmul on the Ada Architecture
spatters.ca
11 months ago
skidrow
2 points
32.
Creating custom kernels for the AMD MI300
huggingface.co
11 months ago
skidrow
1 point
33.
Implementing a Fast Tensor Core Matmul on the Ada Architecture
spatters.ca
11 months ago
skidrow
4 points
34.
Implementing a Fast Tensor Core Matmul on the Ada Architecture
spatters.ca
1 comment
11 months ago
skidrow
2 points
35.
Compiler Explorer: An Essential Kernel Playground for CUDA Developers
nvidia.com
11 months ago
skidrow
2 points
36.
Creating custom kernels for the AMD MI300
huggingface.co
11 months ago
skidrow
1 point
37.
DeepSeek-R1 and FP8 Mixed-Precision Training
colfax-intl.com
on April 19, 2025
skidrow
2 points
38.
How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores (2024)
alexarmbr.github.io
17 comments
on April 19, 2025
skidrow
147 points
39.
DeepSeek-R1 and FP8 Mixed-Precision Training
colfax-intl.com
on April 18, 2025
skidrow
2 points
40.
Implementing a Fast Tensor Core Matmul on the Ada Architecture
spatters.ca
on April 18, 2025
skidrow
1 point
41.
How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores
alexarmbr.github.io
on April 18, 2025
skidrow
2 points
42.
Understanding Peak, Max-Achievable and Delivered FLOPs
amd.com
on April 1, 2025
skidrow
1 point
43.
DeepSeek-R1 and FP8 Mixed-Precision Training
colfax-intl.com
on April 1, 2025
skidrow
1 point
44.
Outperforming cuBLAS on H100: A Worklog
cudaforfun.substack.com
on April 1, 2025
skidrow
3 points
45.
Optimizing Matrix Multiplication on RDNA3
seb-v.github.io
26 comments
on March 25, 2025
skidrow
118 points
46.
Outperforming cuBLAS on H100: A Worklog
cudaforfun.substack.com
on March 25, 2025
skidrow
1 point
47.
Mastering LLM Techniques: Inference Optimization
nvidia.com
on March 24, 2025
skidrow
2 points
48.
Optimizing Matrix Multiplication on RDNA3
seb-v.github.io
on March 24, 2025
skidrow
2 points
49.
Outperforming cuBLAS on H100: A Worklog
cudaforfun.substack.com
on March 24, 2025
skidrow
4 points
50.
Understanding Latency Hiding on GPUs [pdf]
eecs.berkeley.edu
on March 17, 2025
skidrow
2 points
51.
AMD Radeon RX 9070 Series Linux GPU Compute Performance
phoronix.com
on March 17, 2025
skidrow
2 points
52.
Outperforming cuBLAS on H100: A Worklog
cudaforfun.substack.com
on March 17, 2025
skidrow
3 points
53.
GPU Gems
nvidia.com
1 comment
on March 16, 2025
skidrow
2 points
54.
Understanding Latency Hiding on GPUs [pdf]
eecs.berkeley.edu
on March 16, 2025
skidrow
2 points
55.
A guide to LLM inference and performance
baseten.co
on Feb 16, 2025
skidrow
1 point
56.
Mastering LLM Techniques: Inference Optimization
nvidia.com
on Feb 16, 2025
skidrow
2 points
57.
GPT from Scratch with MLX
rayfernando.ai
on Feb 16, 2025
skidrow
1 point
58.
Mastering LLM Techniques: Inference Optimization
nvidia.com
on Feb 4, 2025
skidrow
3 points
59.
Beating OpenBLAS in FP32 Matrix Multiplication
salykova.github.io
1 comment
on Feb 4, 2025
skidrow
4 points
60.
Beating OpenBLAS in FP32 Matrix Multiplication
salykova.github.io
on Jan 28, 2025
skidrow
1 point