- Triton Kernel Code Dataset
Triton kernel bodies and CUDA→Triton translation pairs for fine-tuning code models.
- Aesthetic Images
7.9k+ images filtered for high aesthetic quality, for diffusion and image-text alignment models.
- Brazilian Portuguese TTS
≈150 hours of multi-speaker Brazilian Portuguese speech for text-to-speech synthesis.
- Obama Voice Sample Dataset
25+ minutes of clean 24 kHz speech from public addresses, optimized for RVC training.