To run this implementation, the nightly versions of triton and torch will be installed. This version can run on a single 80GB GPU for gpt-oss-120b. Before running inference, you first need to convert the SafeTensor weights from Hugging Face into the required format.
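
Before converting, it can help to confirm what the downloaded checkpoint actually contains. The sketch below is not part of this project's tooling; it is a minimal standard-library example that relies only on the published SafeTensors file layout (an 8-byte little-endian header length followed by a JSON header). The function names `read_safetensors_header` and `summarize_checkpoint` are hypothetical, and the sketch assumes the shards sit directly in one directory.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a single .safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The header itself is UTF-8 encoded JSON of exactly that length.
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize_checkpoint(model_dir):
    """Map each tensor name to (dtype, shape) across all shards in model_dir.

    Assumption: shards are *.safetensors files directly inside model_dir.
    """
    summary = {}
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        for name, info in read_safetensors_header(shard).items():
            if name == "__metadata__":  # optional free-form metadata entry
                continue
            summary[name] = (info["dtype"], tuple(info["shape"]))
    return summary
```

Calling `summarize_checkpoint` on the directory holding the Hugging Face download returns a dictionary that can be printed or compared against what the conversion step expects. Only the headers are read, so this stays cheap even for very large checkpoints.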