Posts by Collection

ai

A Practitioner’s Guide to Triton

Published:

For the CUDA MODE community, I gave a lecture titled “A Practitioner’s Guide to Triton”. My goal was to give the best possible intro to Triton, which to me means:

  • tell you when you’d use it, and when to use something else
  • clearly state what knowledge I assume and introduce everything else
  • teach by practically working through examples, only adding a manageable amount of complexity each time
  • guide you to places where you can learn more

We’re collecting world-class Triton kernels

Published:

You’re an AI researcher. You try different things, so you need different GPU kernels to make those things fast and keep your iteration cycles short. But maybe other researchers have already written kernels for some parts of your ideas? And you’re no GPU expert yet, so where can you find examples of really good kernels to learn from?

Making OpenAI Triton easier 🔱 😊

Published:

I find that writing Triton kernels involves many repetitive tasks that can be cleanly abstracted away. Doing so lets me write Triton code much more in line with how I actually think. It’s way more fun, and less mentally draining.

essays

How to organize a Street Festival

Published:

To be as specific as possible, I’ll describe exactly what we did for the Street Festival in Bonn, Germany. The rough process should be applicable to any city.

So organisierst Du ein Straßenfest

Published:

To be as concrete as possible, I describe the process we followed for our street festival in Bonn. It should work similarly in other cities.

smarties

Installing Triton 3.0.0

Published:

As of June 13, 2024, to get Triton 3.0 you have to install it from source, like so:

Getting PTX from Triton

Published:

You can get the PTX of a Triton kernel like so: my_kernel.cache[DEVICE_KEY][INPUTS_KEY].asm['ptx'], where DEVICE_KEY and INPUTS_KEY are determined as shown below.