[{"data":1,"prerenderedAt":6120},["ShallowReactive",2],{"all-posts":3},[4,685,5138,5655,5827],{"id":5,"title":6,"author":7,"body":8,"categories":670,"date":675,"description":676,"extension":677,"hidden":678,"meta":679,"navigation":130,"path":680,"seo":681,"stem":682,"thumbnail":683,"__hash__":684},"blog\u002Fblog\u002Ftpu-silicon-engine-ai-training.md","Tensor Processing Units (TPUs): The Silicon Engine Behind Modern AI Training","Anurag Kanade",{"type":9,"value":10,"toc":654},"minimark",[11,15,18,23,26,30,38,41,58,61,65,78,90,101,104,183,186,190,196,209,216,226,229,233,240,284,288,314,317,321,331,358,392,395,399,406,426,429,433,438,441,447,451,458,467,471,598,602,650],[12,13,14],"p",{},"Every neural network training run burns through billions of matrix multiplications. Doing that on standard processors was never going to scale.",[12,16,17],{},"Now imagine if a chip was purpose-built for the exact math that powers deep learning. That idea, realized in Tensor Processing Units, changed how the world trains large models.",[19,20,22],"h2",{"id":21},"the-challenge-behind-training-at-scale","The Challenge Behind Training at Scale",[12,24,25],{},"Modern AI systems face a fundamental bottleneck: training a large language model like BERT or GPT requires processing trillions of floating-point operations. Multiply that by the need for faster experimentation cycles, and compute quickly becomes the limiting factor. Earlier approaches like CPU clusters or even GPU farms helped, but each involved trade-offs — throwing more hardware at the problem at the cost of power consumption, cost, or programming complexity.",[19,27,29],{"id":28},"understanding-the-tpu-advantage","Understanding the TPU Advantage",[12,31,32,33,37],{},"Before we jump into how TPUs accelerate training, it helps to understand what makes them different from GPUs. 
In machine learning, the dominant operations are ",[34,35,36],"strong",{},"matrix multiplications"," — massive batches of multiply-accumulate operations that transform high-dimensional tensors.",[12,39,40],{},"GPUs were originally designed for graphics: rendering pixels, shading triangles, parallel texture mapping. They happen to be good at deep learning because graphics and neural networks both benefit from parallel computation. But GPUs are general-purpose parallel processors.",[42,43,44],"blockquote",{},[12,45,46,49,50,57],{},[34,47,48],{},"NOTE:"," TPUs are not just \"faster GPUs.\" They are ",[51,52,56],"a",{"href":53,"rel":54},"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FApplication-specific_integrated_circuit",[55],"nofollow","application-specific integrated circuits (ASICs)"," designed exclusively for the tensor operations that dominate neural network training — trading generality for raw efficiency in a specific domain.",[12,59,60],{},"This makes them incredibly efficient for model training, but it's a trade-off: TPUs excel at tensor math but are less flexible for arbitrary computations.",[19,62,64],{"id":63},"matrix-multiply-units-mxu","Matrix Multiply Units (MXU)",[12,66,67,68,71,72,77],{},"Inside the TPU resides the ",[34,69,70],{},"Matrix Multiply Unit (MXU)",", a massive ",[51,73,76],{"href":74,"rel":75},"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FSystolic_array",[55],"systolic array"," that performs tens of thousands of operations per clock cycle.",[12,79,80,85,86,89],{},[51,81,84],{"href":82,"rel":83},"https:\u002F\u002Fdocs.cloud.google.com\u002Ftpu\u002Fdocs\u002Fv4",[55],"TPU v4"," can perform ",[34,87,88],{},"275 trillion"," matrix multiply-accumulate operations per second. 
To understand why this matters, let's look at a typical BERT-large layer:",[91,92,97],"pre",{"className":93,"code":95,"language":96},[94],"language-text","Operations = 32 × 1024 × 4096 = 134 million MACs per layer forward pass\n","text",[98,99,95],"code",{"__ignoreMap":100},"",[12,102,103],{},"And that's just one layer. BERT-large has 24 of them, plus backward passes, plus multiple training steps.",[91,105,109],{"className":106,"code":107,"language":108,"meta":100,"style":100},"language-python shiki shiki-themes github-light github-dark","# Input: batch of 32 sequences, each with 1024-dimensional embeddings\nX = np.random.randn(32, 1024).astype(np.float32)\n\n# Weights: projecting from 1024 to 4096 dimensions\nW = np.random.randn(1024, 4096).astype(np.float32)\n\nb = np.zeros(4096, dtype=np.float32)\n\n# Standard matrix multiplication: 134M operations\nY = np.matmul(X, W) + b\n\nprint(f\"Operations: {32 * 1024 * 4096:,} multiply-accumulates\")\n","python",[98,110,111,119,125,132,138,144,149,155,160,166,172,177],{"__ignoreMap":100},[112,113,116],"span",{"class":114,"line":115},"line",1,[112,117,118],{},"# Input: batch of 32 sequences, each with 1024-dimensional embeddings\n",[112,120,122],{"class":114,"line":121},2,[112,123,124],{},"X = np.random.randn(32, 1024).astype(np.float32)\n",[112,126,128],{"class":114,"line":127},3,[112,129,131],{"emptyLinePlaceholder":130},true,"\n",[112,133,135],{"class":114,"line":134},4,[112,136,137],{},"# Weights: projecting from 1024 to 4096 dimensions\n",[112,139,141],{"class":114,"line":140},5,[112,142,143],{},"W = np.random.randn(1024, 4096).astype(np.float32)\n",[112,145,147],{"class":114,"line":146},6,[112,148,131],{"emptyLinePlaceholder":130},[112,150,152],{"class":114,"line":151},7,[112,153,154],{},"b = np.zeros(4096, dtype=np.float32)\n",[112,156,158],{"class":114,"line":157},8,[112,159,131],{"emptyLinePlaceholder":130},[112,161,163],{"class":114,"line":162},9,[112,164,165],{},"# Standard matrix multiplication: 134M 
operations\n",[112,167,169],{"class":114,"line":168},10,[112,170,171],{},"Y = np.matmul(X, W) + b\n",[112,173,175],{"class":114,"line":174},11,[112,176,131],{"emptyLinePlaceholder":130},[112,178,180],{"class":114,"line":179},12,[112,181,182],{},"print(f\"Operations: {32 * 1024 * 4096:,} multiply-accumulates\")\n",[12,184,185],{},"On a CPU, this executes sequentially. On a GPU, it runs in parallel across CUDA cores. On a TPU, the entire operation flows through a systolic array in a pipelined fashion.",[19,187,189],{"id":188},"systolic-arrays","Systolic Arrays",[12,191,192,193,195],{},"A ",[34,194,76],{}," is a 2D grid of processing elements where each element:",[197,198,199,203,206],"ul",{},[200,201,202],"li",{},"Receives input from its neighbors",[200,204,205],{},"Performs a multiply-accumulate operation (MAC)",[200,207,208],{},"Passes results to the next element",[12,210,211],{},[212,213],"img",{"alt":214,"src":215},"Systolic Array for Matrix Multiplication","https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002F1\u002F13\u002FOutput_Stationary_Systolic_Array_Example.png\u002F960px-Output_Stationary_Systolic_Array_Example.png",[12,217,218],{},[219,220,221,222],"em",{},"Figure: Systolic Array for Matrix Multiplication. Source: ",[51,223,225],{"href":74,"rel":224},[55],"Wikipedia",[12,227,228],{},"This pipelined approach means once the array is filled, it produces one result element per clock cycle with minimal memory access.",[19,230,232],{"id":231},"tpu-memory-high-bandwidth-on-chip","TPU Memory: High Bandwidth, On-Chip",[12,234,235,236,239],{},"Traditional architectures suffer from the ",[34,237,238],{},"von Neumann bottleneck",": moving data between memory and compute units takes time and energy. 
TPUs address this with:",[241,242,243,256],"table",{},[244,245,246],"thead",{},[247,248,249,253],"tr",{},[250,251,252],"th",{},"Component",[250,254,255],{},"Specification",[257,258,259,268,276],"tbody",{},[247,260,261,265],{},[262,263,264],"td",{},"HBM (High Bandwidth Memory)",[262,266,267],{},"Up to 32 GB with 1200 GB\u002Fs bandwidth per TPU v4 chip",[247,269,270,273],{},[262,271,272],{},"On-chip SRAM",[262,274,275],{},"144 MB of ultra-fast scratchpad memory",[247,277,278,281],{},[262,279,280],{},"Communication links",[262,282,283],{},"Dedicated high-speed links for all-reduce operations",[19,285,287],{"id":286},"limitations-of-traditional-accelerators","Limitations of Traditional Accelerators",[197,289,290,296,302,308],{},[200,291,292,295],{},[34,293,294],{},"Memory bandwidth bound"," — When fetching data takes longer than computing, the processor sits idle. GPUs often achieve only 30–50% of peak compute utilization.",[200,297,298,301],{},[34,299,300],{},"Kernel launch overhead"," — Each CUDA kernel launch has ~5–10 microseconds of overhead.",[200,303,304,307],{},[34,305,306],{},"Limited on-chip memory"," — GPU shared memory (~100KB per SM) requires frequent off-chip accesses for large models.",[200,309,310,313],{},[34,311,312],{},"Communication bottlenecks"," — All-reduce operations for gradient synchronization become the bottleneck as cluster size grows.",[12,315,316],{},"TPUs address these by co-designing hardware and software — XLA fuses operations, minimizes memory transfers, and optimizes the entire compute graph for the MXU architecture.",[19,318,320],{"id":319},"xla-the-compiler-that-unlocks-tpu-performance","XLA: The Compiler That Unlocks TPU Performance",[12,322,323,330],{},[34,324,325],{},[51,326,329],{"href":327,"rel":328},"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FXLA_(software)",[55],"XLA (Accelerated Linear Algebra)"," takes your TensorFlow\u002FPyTorch\u002FJAX code 
and:",[332,333,334,340,346,352],"ol",{},[200,335,336,339],{},[34,337,338],{},"Fuses operations"," — Combines multiple ops into single kernels",[200,341,342,345],{},[34,343,344],{},"Eliminates intermediate allocations"," — Reduces memory traffic",[200,347,348,351],{},[34,349,350],{},"Optimizes layouts"," — Arranges tensors for efficient MXU access",[200,353,354,357],{},[34,355,356],{},"Automatic parallelism"," — Distributes computation across TPU cores",[91,359,361],{"className":106,"code":360,"language":108,"meta":100,"style":100},"import tensorflow as tf\n\n@tf.function(jit_compile=True)\ndef dense_layer(x, w, b):\n    \"\"\"XLA will fuse matmul + add + activation into a single kernel\"\"\"\n    return tf.nn.gelu(tf.matmul(x, w) + b)\n",[98,362,363,368,372,377,382,387],{"__ignoreMap":100},[112,364,365],{"class":114,"line":115},[112,366,367],{},"import tensorflow as tf\n",[112,369,370],{"class":114,"line":121},[112,371,131],{"emptyLinePlaceholder":130},[112,373,374],{"class":114,"line":127},[112,375,376],{},"@tf.function(jit_compile=True)\n",[112,378,379],{"class":114,"line":134},[112,380,381],{},"def dense_layer(x, w, b):\n",[112,383,384],{"class":114,"line":140},[112,385,386],{},"    \"\"\"XLA will fuse matmul + add + activation into a single kernel\"\"\"\n",[112,388,389],{"class":114,"line":146},[112,390,391],{},"    return tf.nn.gelu(tf.matmul(x, w) + b)\n",[12,393,394],{},"On TPU, this runs as one optimized operation instead of three separate kernels.",[19,396,398],{"id":397},"tpu-pods-scaling-beyond-single-chips","TPU Pods: Scaling Beyond Single Chips",[12,400,401,402,405],{},"TPU v4 pods contain up to ",[34,403,404],{},"4,096 chips"," interconnected with high-speed optical switches, enabling:",[197,407,408,414,420],{},[200,409,410,413],{},[34,411,412],{},"Model parallelism"," — Splitting giant models across chips",[200,415,416,419],{},[34,417,418],{},"Data parallelism"," — Processing different batches on different 
chips",[200,421,422,425],{},[34,423,424],{},"Pipeline parallelism"," — Staggering computation across layers",[12,427,428],{},"The all-reduce bandwidth of 300 GB\u002Fs per chip means gradient synchronization stays efficient even with thousands of chips.",[19,430,432],{"id":431},"key-optimizations-for-tpu-training","Key Optimizations for TPU Training",[434,435,437],"h3",{"id":436},"batch-size","Batch Size",[12,439,440],{},"On TPU, batch size should be a multiple of 8 (TPU v2\u002Fv3) or 4 (TPU v4) times the number of TPU cores:",[91,442,445],{"className":443,"code":444,"language":96},[94],"Optimal Batch Size = k × num_cores × MXU_width\n",[98,446,444],{"__ignoreMap":100},[434,448,450],{"id":449},"use-bfloat16","Use bfloat16",[12,452,453,454,457],{},"TPUs natively support ",[34,455,456],{},"bfloat16",", maintaining the dynamic range of float32 while using half the memory:",[91,459,461],{"className":106,"code":460,"language":108,"meta":100,"style":100},"tf.keras.mixed_precision.set_global_policy('mixed_bfloat16')\n",[98,462,463],{"__ignoreMap":100},[112,464,465],{"class":114,"line":115},[112,466,460],{},[19,468,470],{"id":469},"tpu-generations","TPU Generations",[241,472,473,489],{},[244,474,475],{},[247,476,477,480,483,486],{},[250,478,479],{},"Generation",[250,481,482],{},"Year",[250,484,485],{},"Peak Performance (BF16)",[250,487,488],{},"Memory",[257,490,491,505,519,533,545,558,571,584],{},[247,492,493,496,499,502],{},[262,494,495],{},"TPU v1",[262,497,498],{},"2016",[262,500,501],{},"92 TOPS",[262,503,504],{},"8 GB HBM",[247,506,507,510,513,516],{},[262,508,509],{},"TPU v2",[262,511,512],{},"2017",[262,514,515],{},"180 TFLOPS",[262,517,518],{},"16 GB HBM",[247,520,521,524,527,530],{},[262,522,523],{},"TPU v3",[262,525,526],{},"2018",[262,528,529],{},"420 TFLOPS",[262,531,532],{},"32 GB HBM",[247,534,535,537,540,543],{},[262,536,84],{},[262,538,539],{},"2021",[262,541,542],{},"275 TFLOPS\u002Fchip",[262,544,532],{},[247,546,547,550,553,556],{},[262,548,549],{},"TPU 
v5e",[262,551,552],{},"2023",[262,554,555],{},"197 TFLOPS",[262,557,518],{},[247,559,560,563,565,568],{},[262,561,562],{},"TPU v5p",[262,564,552],{},[262,566,567],{},"459 TFLOPS",[262,569,570],{},"95 GB HBM",[247,572,573,576,579,582],{},[262,574,575],{},"TPU v6e (Trillium)",[262,577,578],{},"2024",[262,580,581],{},"918 TFLOPS\u002Fchip",[262,583,532],{},[247,585,586,589,592,595],{},[262,587,588],{},"TPU v7x (Ironwood)",[262,590,591],{},"2025",[262,593,594],{},"2,307 TFLOPS\u002Fchip",[262,596,597],{},"192 GB HBM",[19,599,601],{"id":600},"references","References",[197,603,604,619,630,640],{},[200,605,606,609,610,613,614],{},[34,607,608],{},"Jouppi, N. P., et al."," (2017). ",[219,611,612],{},"In-Datacenter Performance Analysis of a Tensor Processing Unit",". ",[51,615,618],{"href":616,"rel":617},"https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.04760",[55],"arXiv:1704.04760",[200,620,621,624,625],{},[34,622,623],{},"Google Cloud."," ",[51,626,629],{"href":627,"rel":628},"https:\u002F\u002Fcloud.google.com\u002Ftpu\u002Fdocs",[55],"TPU Documentation",[200,631,632,624,635],{},[34,633,634],{},"TensorFlow.",[51,636,639],{"href":637,"rel":638},"https:\u002F\u002Fwww.tensorflow.org\u002Fguide\u002Ftpu",[55],"TPU Strategy Guide",[200,641,642,624,645],{},[34,643,644],{},"JAX Documentation.",[51,646,649],{"href":647,"rel":648},"https:\u002F\u002Fjax.readthedocs.io\u002Fen\u002Flatest\u002Fjax-101\u002F08-pjit.html",[55],"Using JAX on TPU",[651,652,653],"style",{},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: 
var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}",{"title":100,"searchDepth":121,"depth":121,"links":655},[656,657,658,659,660,661,662,663,664,668,669],{"id":21,"depth":121,"text":22},{"id":28,"depth":121,"text":29},{"id":63,"depth":121,"text":64},{"id":188,"depth":121,"text":189},{"id":231,"depth":121,"text":232},{"id":286,"depth":121,"text":287},{"id":319,"depth":121,"text":320},{"id":397,"depth":121,"text":398},{"id":431,"depth":121,"text":432,"children":665},[666,667],{"id":436,"depth":127,"text":437},{"id":449,"depth":127,"text":450},{"id":469,"depth":121,"text":470},{"id":600,"depth":121,"text":601},[671,672,673,674],"low-level","tpu","machine-learning","performance","2026-01-28","Exploring TPUs and why they're so effective for large-scale AI training.","md",false,{},"\u002Fblog\u002Ftpu-silicon-engine-ai-training",{"title":6,"description":676},"blog\u002Ftpu-silicon-engine-ai-training",null,"Ca8_XstfJcVnLJSHdm-vDd9v26FIZfbO-mq1_raMDhU",{"id":686,"title":687,"author":7,"body":688,"categories":5128,"date":5131,"description":5132,"extension":677,"hidden":678,"meta":5133,"navigation":130,"path":5134,"seo":5135,"stem":5136,"thumbnail":683,"__hash__":5137},"blog\u002Fblog\u002Fneural-audio-codec-rvq.md","Vector Quantization: The Mathematical Art of Audio 
Compression",{"type":9,"value":689,"toc":5113},[690,695,698,702,705,708,714,719,723,732,738,743,750,753,755,759,762,765,768,1062,1065,1174,1177,1181,1184,1276,1279,1284,1288,1622,1962,1965,2196,2200,2556,2558,2563,2567,2573,2578,2581,2584,3108,3111,3662,3665,3954,3957,4030,4034,4039,4046,4051,4056,4061,4066,4071,4076,4081,4086,4090,4096,4101,4104,4107,4113,4118,4121,4227,4289,4293,4296,4658,4660,5004,5038,5040,5042,5111],[12,691,692],{},[219,693,694],{},"Every sound is data of millions of samples every second. Compressing all that without losing clarity has always been the challenge. Now imagine if a model could learn what truly matters in those waves and ignore the rest. That idea, called vector quantization, reshaped how modern AI handles voice and music.",[696,697],"hr",{},[19,699,701],{"id":700},"the-challenge-behind-modern-audio-compression","The Challenge Behind Modern Audio Compression",[12,703,704],{},"Modern voice AI systems face a big challenge: every second of CD-quality audio produces about 1.4 million data points. Multiply that by millions of users, and storage and transmission quickly become expensive. Earlier compression techniques such as MP3, AAC, and Opus helped, but each involved trade offs reducing bandwidth at the cost of quality or latency.",[12,706,707],{},"A simpler idea was treating sounds as a continuous stream of data points. But what if we could represent these sounds more efficiently?",[12,709,710],{},[212,711],{"alt":712,"src":713},"Continuous vs. Discrete Signal Representation","\u002Fcontinousvsdigital.png",[12,715,716],{},[219,717,718],{},"Figure 1: Continuous vs. Discrete Signal Representation",[19,720,722],{"id":721},"understanding-quantization","Understanding Quantization",[12,724,725,726,731],{},"Before we jump into how audio uses quantization, it helps to understand what quantization means in machine learning. 
In Machine learning, ",[51,727,730],{"href":728,"rel":729},"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FQuantization",[55],"quantization"," refers to reducing the precision of numbers used to represent model parameters or activations like converting 32-bit floating points to 8-bit integers.",[12,733,734],{},[212,735],{"alt":736,"src":737},"32-bit Float to 8-bit Integer Quantization","\u002F8bitint-quantization.png",[12,739,740],{},[219,741,742],{},"Figure 2: 32-bit Float to 8-bit Integer Quantization",[42,744,745],{},[12,746,747,749],{},[34,748,48],{}," Quantization is different from compression. Compression reduces the size of data by encoding it more efficiently, while quantization reduces the precision of data representation.",[12,751,752],{},"This makes models faster and lighter, but it is a win win game if we are constrained on a limited amount of compute by sacrificing a small amount of accuracy for big efficiency gains. While this approach works well for neural networks, audio data has unique properties that require a more sophisticated strategy—one that can capture the complex patterns hidden in sound waves.",[696,754],{},[19,756,758],{"id":757},"vector-quantization-through-speech","Vector Quantization Through Speech",[12,760,761],{},"Enter vector quantization, a technique that transforms high-dimensional audio data into compact representations without significant loss of quality. Vector quantization (VQ) exploits the fact that variation in natural data is redundant. When you hear someone say \"hello\", your brain doesn't process every microscopic detail of the sound wave. 
Instead, it extracts key features and matches them against learned patterns.",[12,763,764],{},"Let's break down how VQ works mathematically.",[12,766,767],{},"Given input vector x ∈ ℝⁿ, find codebook entry cᵢ that minimizes:",[12,769,770],{},[112,771,774,852],{"className":772},[773],"katex",[112,775,778],{"className":776},[777],"katex-mathml",[779,780,782],"math",{"xmlns":781},"http:\u002F\u002Fwww.w3.org\u002F1998\u002FMath\u002FMathML",[783,784,785,847],"semantics",{},[786,787,788,792,797,800,804,813,816,819,823,825,827,830,836,838],"mrow",{},[789,790,791],"mi",{},"d",[793,794,796],"mo",{"stretchy":795},"false","(",[789,798,799],{},"x",[793,801,803],{"separator":802},"true",",",[805,806,807,810],"msub",{},[789,808,809],{},"c",[789,811,812],{},"i",[793,814,815],{"stretchy":795},")",[793,817,818],{},"=",[789,820,822],{"mathvariant":821},"normal","∣",[789,824,822],{"mathvariant":821},[789,826,799],{},[793,828,829],{},"−",[805,831,832,834],{},[789,833,809],{},[789,835,812],{},[789,837,822],{"mathvariant":821},[839,840,841,843],"msup",{},[789,842,822],{"mathvariant":821},[844,845,846],"mn",{},"2",[848,849,851],"annotation",{"encoding":850},"application\u002Fx-tex","d(x, c_i) = ||x - 
c_i||^2",[112,853,856,957,981],{"className":854,"ariaHidden":802},[855],"katex-html",[112,857,860,865,870,874,877,881,886,942,946,950,954],{"className":858},[859],"base",[112,861],{"className":862,"style":864},[863],"strut","height:1em;vertical-align:-0.25em;",[112,866,791],{"className":867},[868,869],"mord","mathnormal",[112,871,796],{"className":872},[873],"mopen",[112,875,799],{"className":876},[868,869],[112,878,803],{"className":879},[880],"mpunct",[112,882],{"className":883,"style":885},[884],"mspace","margin-right:0.1667em;",[112,887,889,892],{"className":888},[868],[112,890,809],{"className":891},[868,869],[112,893,896],{"className":894},[895],"msupsub",[112,897,901,933],{"className":898},[899,900],"vlist-t","vlist-t2",[112,902,905,928],{"className":903},[904],"vlist-r",[112,906,910],{"className":907,"style":909},[908],"vlist","height:0.3117em;",[112,911,913,918],{"style":912},"top:-2.55em;margin-left:0em;margin-right:0.05em;",[112,914],{"className":915,"style":917},[916],"pstrut","height:2.7em;",[112,919,925],{"className":920},[921,922,923,924],"sizing","reset-size6","size3","mtight",[112,926,812],{"className":927},[868,869,924],[112,929,932],{"className":930},[931],"vlist-s","​",[112,934,936],{"className":935},[904],[112,937,940],{"className":938,"style":939},[908],"height:0.15em;",[112,941],{},[112,943,815],{"className":944},[945],"mclose",[112,947],{"className":948,"style":949},[884],"margin-right:0.2778em;",[112,951,818],{"className":952},[953],"mrel",[112,955],{"className":956,"style":949},[884],[112,958,960,963,967,970,974,978],{"className":959},[859],[112,961],{"className":962,"style":864},[863],[112,964,966],{"className":965},[868],"∣∣",[112,968,799],{"className":969},[868,869],[112,971],{"className":972,"style":973},[884],"margin-right:0.2222em;",[112,975,829],{"className":976},[977],"mbin",[112,979],{"className":980,"style":973},[884],[112,982,984,988,1028,1031],{"className":983},[859],[112,985],{"className":986,"style":987},[863],"height:1.0641em
;vertical-align:-0.25em;",[112,989,991,994],{"className":990},[868],[112,992,809],{"className":993},[868,869],[112,995,997],{"className":996},[895],[112,998,1000,1020],{"className":999},[899,900],[112,1001,1003,1017],{"className":1002},[904],[112,1004,1006],{"className":1005,"style":909},[908],[112,1007,1008,1011],{"style":912},[112,1009],{"className":1010,"style":917},[916],[112,1012,1014],{"className":1013},[921,922,923,924],[112,1015,812],{"className":1016},[868,869,924],[112,1018,932],{"className":1019},[931],[112,1021,1023],{"className":1022},[904],[112,1024,1026],{"className":1025,"style":939},[908],[112,1027],{},[112,1029,822],{"className":1030},[868],[112,1032,1034,1037],{"className":1033},[868],[112,1035,822],{"className":1036},[868],[112,1038,1040],{"className":1039},[895],[112,1041,1043],{"className":1042},[899],[112,1044,1046],{"className":1045},[904],[112,1047,1050],{"className":1048,"style":1049},[908],"height:0.8141em;",[112,1051,1053,1056],{"style":1052},"top:-3.063em;margin-right:0.05em;",[112,1054],{"className":1055,"style":917},[916],[112,1057,1059],{"className":1058},[921,922,923,924],[112,1060,846],{"className":1061},[868,924],[12,1063,1064],{},"Let's say we have a 256-dimensional vector representing a short audio segment. 
Instead of storing all 256 values, we can use VQ to find the closest match from a learned codebook of common speech patterns.",[91,1066,1068],{"className":106,"code":1067,"language":108,"meta":100,"style":100},"import numpy as np\n\n# Audio segment encoded as 256-dimensional vector\naudio_vector = np.array([0.23, -0.41, 0.67, -0.12, ...])  # 256 values\n\n# Learned codebook representing common speech patterns\ncodebook = np.array([\n    [0.25, -0.40, 0.65, -0.10, ...],  # maybe a fricative sound\n    [0.15, 0.32, -0.21, 0.45, ...],   # maybe a vowel sound\n    [0.67, -0.23, 0.12, 0.89, ...],   # maybe a plosive sound\n])\n\ndef quantize_vector(input_vec, codebook):\n    \"\"\"Find closest codebook match using L2 distance\"\"\"\n    distances = np.linalg.norm(codebook - input_vec, axis=1)\n    best_index = np.argmin(distances)\n    return codebook[best_index], best_index\n\nquantized_vec, index = quantize_vector(audio_vector, codebook)\n# Store index (small integer) instead of 256 floats\n",[98,1069,1070,1075,1079,1084,1089,1093,1098,1103,1108,1113,1118,1123,1127,1133,1139,1145,1151,1157,1162,1168],{"__ignoreMap":100},[112,1071,1072],{"class":114,"line":115},[112,1073,1074],{},"import numpy as np\n",[112,1076,1077],{"class":114,"line":121},[112,1078,131],{"emptyLinePlaceholder":130},[112,1080,1081],{"class":114,"line":127},[112,1082,1083],{},"# Audio segment encoded as 256-dimensional vector\n",[112,1085,1086],{"class":114,"line":134},[112,1087,1088],{},"audio_vector = np.array([0.23, -0.41, 0.67, -0.12, ...])  # 256 values\n",[112,1090,1091],{"class":114,"line":140},[112,1092,131],{"emptyLinePlaceholder":130},[112,1094,1095],{"class":114,"line":146},[112,1096,1097],{},"# Learned codebook representing common speech patterns\n",[112,1099,1100],{"class":114,"line":151},[112,1101,1102],{},"codebook = np.array([\n",[112,1104,1105],{"class":114,"line":157},[112,1106,1107],{},"    [0.25, -0.40, 0.65, -0.10, ...],  # maybe a fricative 
sound\n",[112,1109,1110],{"class":114,"line":162},[112,1111,1112],{},"    [0.15, 0.32, -0.21, 0.45, ...],   # maybe a vowel sound\n",[112,1114,1115],{"class":114,"line":168},[112,1116,1117],{},"    [0.67, -0.23, 0.12, 0.89, ...],   # maybe a plosive sound\n",[112,1119,1120],{"class":114,"line":174},[112,1121,1122],{},"])\n",[112,1124,1125],{"class":114,"line":179},[112,1126,131],{"emptyLinePlaceholder":130},[112,1128,1130],{"class":114,"line":1129},13,[112,1131,1132],{},"def quantize_vector(input_vec, codebook):\n",[112,1134,1136],{"class":114,"line":1135},14,[112,1137,1138],{},"    \"\"\"Find closest codebook match using L2 distance\"\"\"\n",[112,1140,1142],{"class":114,"line":1141},15,[112,1143,1144],{},"    distances = np.linalg.norm(codebook - input_vec, axis=1)\n",[112,1146,1148],{"class":114,"line":1147},16,[112,1149,1150],{},"    best_index = np.argmin(distances)\n",[112,1152,1154],{"class":114,"line":1153},17,[112,1155,1156],{},"    return codebook[best_index], best_index\n",[112,1158,1160],{"class":114,"line":1159},18,[112,1161,131],{"emptyLinePlaceholder":130},[112,1163,1165],{"class":114,"line":1164},19,[112,1166,1167],{},"quantized_vec, index = quantize_vector(audio_vector, codebook)\n",[112,1169,1171],{"class":114,"line":1170},20,[112,1172,1173],{},"# Store index (small integer) instead of 256 floats\n",[12,1175,1176],{},"By storing just the index of the closest codebook entry, we drastically reduce the amount of data needed to represent the audio segment.",[19,1178,1180],{"id":1179},"how-the-codebook-is-learned","How the Codebook is Learned",[12,1182,1183],{},"Traditional approaches used k-means clustering to discover representative patterns:",[91,1185,1187],{"className":106,"code":1186,"language":108,"meta":100,"style":100},"def learn_codebook_kmeans(training_data, k=1024):\n    # Initialize random centroids\n    centroids = np.random.randn(k, vector_dim)\n\n    for iteration in range(max_iters):\n        # Assign each vector to nearest centroid\n    
```python
import numpy as np

def learn_codebook_kmeans(training_data, k=1024, max_iters=100):
    # Initialize random centroids
    vector_dim = training_data.shape[1]
    centroids = np.random.randn(k, vector_dim)

    for iteration in range(max_iters):
        # Assign each vector to its nearest centroid
        assignments = []
        for vec in training_data:
            distances = np.linalg.norm(centroids - vec, axis=1)
            assignments.append(np.argmin(distances))

        # Update centroids as cluster means
        for i in range(k):
            cluster_vecs = training_data[np.array(assignments) == i]
            if len(cluster_vecs) > 0:
                centroids[i] = np.mean(cluster_vecs, axis=0)

    return centroids
```

Newer codecs use more sophisticated methods like VQ-VAE to jointly learn the codebook and the encoder-decoder architecture.

*This worked for offline processing but had serious limitations for neural network training. The discrete assignment steps and batch processing requirements made gradient-based optimization difficult.*

## Limitations of Traditional Vector Quantization

Let the input be a feature vector $\mathbf{x} \in \mathbb{R}^d$ and a finite codebook $\mathcal{C} = \lbrace \mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_K \rbrace$.
Vector quantization replaces each input with its nearest codeword:

$$Q(\mathbf{x}) = \arg\min_{\mathbf{c}_k \in \mathcal{C}} \|\mathbf{x} - \mathbf{c}_k\|_2^2$$
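The nearest-codeword rule translates directly into a few lines of NumPy. This is a minimal sketch; `quantize` is a hypothetical helper name, not part of any library:

```python
import numpy as np

def quantize(x, codebook):
    """Return the index and value of the codeword nearest to x (squared L2)."""
    # distances[k] = ||x - c_k||_2^2 for every codeword c_k in the codebook
    distances = np.sum((codebook - x) ** 2, axis=1)
    k = int(np.argmin(distances))
    return k, codebook[k]
```

Ties are broken toward the lowest index, following `np.argmin`.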
The expected distortion (error) is:

$$D = \mathbb{E}_{\mathbf{x}}\left[\|\mathbf{x} - Q(\mathbf{x})\|_2^2\right]$$
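In practice the expectation is estimated by averaging the nearest-codeword error over a batch of samples. A quick sketch of that estimate (the `empirical_distortion` name is illustrative, not from the post):

```python
import numpy as np

def empirical_distortion(samples, codebook):
    """Estimate D = E[||x - Q(x)||_2^2] by averaging over a batch of samples."""
    # (n, K) matrix of squared distances from every sample to every codeword
    d2 = np.sum((samples[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)
    # Each sample contributes its squared distance to its nearest codeword
    return float(np.mean(np.min(d2, axis=1)))
```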
### Core Limitations

1. **Limited expressiveness** — A single codeword per region can't capture complex or multimodal distributions in $p(\mathbf{x})$.
2. **Codebook growth problem** — To halve distortion, you often need to *square* the number of codewords: $D \propto K^{-2/d}$. Larger $K$ implies exponential memory and compute.
3. **High encoding cost** — Nearest-neighbor search costs $O(Kd)$ for each vector.
4. **No residual correction** — Once quantized, the residual $\mathbf{e} = \mathbf{x} - Q(\mathbf{x})$ is discarded, wasting useful fine-grained detail.
5. **Uniform distortion metric** — Standard L2 distance treats all dimensions equally.

> Classic VQ minimizes distortion but scales poorly with dimension and distribution complexity — this motivates Residual Vector Quantization (RVQ).

## Residual Vector Quantization

![Residual Vector Quantization (RVQ) Architecture](https://notesbylex.com/_media/rvq.png)

*Figure 3: Residual Vector Quantization (RVQ) Architecture*

Residual Vector Quantization fundamentally changed the game by stacking multiple quantizers, where each stage learns to encode the error left behind by the previous one.
The mathematical formulation of RVQ is:

$$\mathbf{x} \approx q_1(\mathbf{x}) + q_2(\mathbf{x} - q_1(\mathbf{x})) + q_3\big(\mathbf{x} - q_1(\mathbf{x}) - q_2(\mathbf{x} - q_1(\mathbf{x}))\big) + \ldots$$

where:

- $\mathbf{x} \in \mathbb{R}^d$ is the input vector
- $q_i(\cdot)$ is the quantizer at stage $i$
- $\mathbf{r}_i = \mathbf{x} - \sum_{j=1}^{i} q_j(\mathbf{r}_{j-1})$ is the residual after stage $i$, with $\mathbf{r}_0 = \mathbf{x}$
[921,922,923,924],[112,3498,3500,3504,3507],{"className":3499},[868,924],[112,3501,3350],{"className":3502,"style":3503},[868,869,924],"margin-right:0.0572em;",[112,3505,818],{"className":3506},[953,924],[112,3508,1408],{"className":3509},[868,924],[112,3511,3513,3516],{"style":3512},"top:-3.2029em;margin-right:0.05em;",[112,3514],{"className":3515,"style":917},[916],[112,3517,3519],{"className":3518},[921,922,923,924],[112,3520,812],{"className":3521},[868,869,924],[112,3523,932],{"className":3524},[931],[112,3526,3528],{"className":3527},[904],[112,3529,3532],{"className":3530,"style":3531},[908],"height:0.4358em;",[112,3533],{},[112,3535],{"className":3536,"style":885},[884],[112,3538,3540,3543],{"className":3539},[868],[112,3541,2607],{"className":3542,"style":2729},[868,869],[112,3544,3546],{"className":3545},[895],[112,3547,3549,3569],{"className":3548},[899,900],[112,3550,3552,3566],{"className":3551},[904],[112,3553,3555],{"className":3554,"style":909},[908],[112,3556,3557,3560],{"style":2744},[112,3558],{"className":3559,"style":917},[916],[112,3561,3563],{"className":3562},[921,922,923,924],[112,3564,3350],{"className":3565,"style":3503},[868,869,924],[112,3567,932],{"className":3568},[931],[112,3570,3572],{"className":3571},[904],[112,3573,3576],{"className":3574,"style":3575},[908],"height:0.2861em;",[112,3577],{},[112,3579,796],{"className":3580},[873],[112,3582,3584,3587],{"className":3583},[868],[112,3585,3332],{"className":3586},[868,1333],[112,3588,3590],{"className":3589},[895],[112,3591,3593,3622],{"className":3592},[899,900],[112,3594,3596,3619],{"className":3595},[904],[112,3597,3599],{"className":3598,"style":909},[908],[112,3600,3601,3604],{"style":912},[112,3602],{"className":3603,"style":917},[916],[112,3605,3607],{"className":3606},[921,922,923,924],[112,3608,3610,3613,3616],{"className":3609},[868,924],[112,3611,3350],{"className":3612,"style":3503},[868,869,924],[112,3614,829],{"className":3615},[977,924],[112,3617,1408],{"className":3618
},[868,924],[112,3620,932],{"className":3621},[931],[112,3623,3625],{"className":3624},[904],[112,3626,3628],{"className":3627,"style":3575},[908],[112,3629],{},[112,3631,815],{"className":3632},[945]," is the residual at stage ",[112,3635,3637,3650],{"className":3636},[773],[112,3638,3640],{"className":3639},[777],[779,3641,3642],{"xmlns":781},[783,3643,3644,3648],{},[786,3645,3646],{},[789,3647,812],{},[848,3649,812],{"encoding":850},[112,3651,3653],{"className":3652,"ariaHidden":802},[855],[112,3654,3656,3659],{"className":3655},[859],[112,3657],{"className":3658,"style":3310},[863],[112,3660,812],{"className":3661},[868,869],[12,3663,3664],{},"The final reconstruction is:",[12,3666,3667],{},[112,3668,3670,3729],{"className":3669},[773],[112,3671,3673],{"className":3672},[777],[779,3674,3675],{"xmlns":781},[783,3676,3677,3726],{},[786,3678,3679,3687,3689,3704,3710,3712,3724],{},[3680,3681,3682,3684],"mover",{"accent":802},[789,3683,799],{"mathvariant":1305},[793,3685,3686],{},"^",[793,3688,818],{},[1689,3690,3691,3693,3701],{},[793,3692,3345],{},[786,3694,3695,3697,3699],{},[789,3696,812],{},[793,3698,818],{},[844,3700,1408],{},[789,3702,3703],{},"N",[805,3705,3706,3708],{},[789,3707,2607],{},[789,3709,812],{},[793,3711,796],{"stretchy":795},[805,3713,3714,3716],{},[789,3715,3332],{"mathvariant":1305},[786,3717,3718,3720,3722],{},[789,3719,812],{},[793,3721,829],{},[844,3723,1408],{},[793,3725,815],{"stretchy":795},[848,3727,3728],{"encoding":850},"\\hat{\\mathbf{x}} = \\sum_{i=1}^N 
q_i(\\mathbf{r}_{i-1})",[112,3730,3732,3785],{"className":3731,"ariaHidden":802},[855],[112,3733,3735,3739,3776,3779,3782],{"className":3734},[859],[112,3736],{"className":3737,"style":3738},[863],"height:0.7079em;",[112,3740,3743],{"className":3741},[868,3742],"accent",[112,3744,3746],{"className":3745},[899],[112,3747,3749],{"className":3748},[904],[112,3750,3752,3762],{"className":3751,"style":3738},[908],[112,3753,3755,3759],{"style":3754},"top:-3em;",[112,3756],{"className":3757,"style":3758},[916],"height:3em;",[112,3760,799],{"className":3761},[868,1333],[112,3763,3765,3768],{"style":3764},"top:-3.0134em;",[112,3766],{"className":3767,"style":3758},[916],[112,3769,3773],{"className":3770,"style":3772},[3771],"accent-body","left:-0.25em;",[112,3774,3686],{"className":3775},[868],[112,3777],{"className":3778,"style":949},[884],[112,3780,818],{"className":3781},[953],[112,3783],{"className":3784,"style":949},[884],[112,3786,3788,3792,3855,3858,3898,3901,3951],{"className":3787},[859],[112,3789],{"className":3790,"style":3791},[863],"height:1.2809em;vertical-align:-0.2997em;",[112,3793,3795,3798],{"className":3794},[1740],[112,3796,3345],{"className":3797,"style":3474},[1740,3472,3473],[112,3799,3801],{"className":3800},[895],[112,3802,3804,3846],{"className":3803},[899,900],[112,3805,3807,3843],{"className":3806},[904],[112,3808,3811,3831],{"className":3809,"style":3810},[908],"height:0.9812em;",[112,3812,3813,3816],{"style":3490},[112,3814],{"className":3815,"style":917},[916],[112,3817,3819],{"className":3818},[921,922,923,924],[112,3820,3822,3825,3828],{"className":3821},[868,924],[112,3823,812],{"className":3824},[868,869,924],[112,3826,818],{"className":3827},[953,924],[112,3829,1408],{"className":3830},[868,924],[112,3832,3833,3836],{"style":3512},[112,3834],{"className":3835,"style":917},[916],[112,3837,3839],{"className":3838},[921,922,923,924],[112,3840,3703],{"className":3841,"style":3842},[868,869,924],"margin-right:0.109em;",[112,3844,932],{"classNam
e":3845},[931],[112,3847,3849],{"className":3848},[904],[112,3850,3853],{"className":3851,"style":3852},[908],"height:0.2997em;",[112,3854],{},[112,3856],{"className":3857,"style":885},[884],[112,3859,3861,3864],{"className":3860},[868],[112,3862,2607],{"className":3863,"style":2729},[868,869],[112,3865,3867],{"className":3866},[895],[112,3868,3870,3890],{"className":3869},[899,900],[112,3871,3873,3887],{"className":3872},[904],[112,3874,3876],{"className":3875,"style":909},[908],[112,3877,3878,3881],{"style":2744},[112,3879],{"className":3880,"style":917},[916],[112,3882,3884],{"className":3883},[921,922,923,924],[112,3885,812],{"className":3886},[868,869,924],[112,3888,932],{"className":3889},[931],[112,3891,3893],{"className":3892},[904],[112,3894,3896],{"className":3895,"style":939},[908],[112,3897],{},[112,3899,796],{"className":3900},[873],[112,3902,3904,3907],{"className":3903},[868],[112,3905,3332],{"className":3906},[868,1333],[112,3908,3910],{"className":3909},[895],[112,3911,3913,3942],{"className":3912},[899,900],[112,3914,3916,3939],{"className":3915},[904],[112,3917,3919],{"className":3918,"style":909},[908],[112,3920,3921,3924],{"style":912},[112,3922],{"className":3923,"style":917},[916],[112,3925,3927],{"className":3926},[921,922,923,924],[112,3928,3930,3933,3936],{"className":3929},[868,924],[112,3931,812],{"className":3932},[868,869,924],[112,3934,829],{"className":3935},[977,924],[112,3937,1408],{"className":3938},[868,924],[112,3940,932],{"className":3941},[931],[112,3943,3945],{"className":3944},[904],[112,3946,3949],{"className":3947,"style":3948},[908],"height:0.2083em;",[112,3950],{},[112,3952,815],{"className":3953},[945],[12,3955,3956],{},"RVQ builds the final approximation by adding up several small corrections instead of using one big codebook.",[91,3958,3960],{"className":106,"code":3959,"language":108,"meta":100,"style":100},"def residual_quantize(input_vec, codebooks):\n    \"\"\"Multi-stage quantization with progressive 
refinement\"\"\"\n    reconstruction = np.zeros_like(input_vec)\n    residual = input_vec.copy()\n    indices = []\n\n    for stage, codebook in enumerate(codebooks):\n        quant_vec, idx = quantize_vector(residual, codebook)\n        reconstruction += quant_vec\n        indices.append(idx)\n        residual = input_vec - reconstruction\n        print(f\"Stage {stage+1} residual norm: {np.linalg.norm(residual):.4f}\")\n\n    return reconstruction, indices\n",[98,3961,3962,3967,3972,3977,3982,3987,3991,3996,4001,4006,4011,4016,4021,4025],{"__ignoreMap":100},[112,3963,3964],{"class":114,"line":115},[112,3965,3966],{},"def residual_quantize(input_vec, codebooks):\n",[112,3968,3969],{"class":114,"line":121},[112,3970,3971],{},"    \"\"\"Multi-stage quantization with progressive refinement\"\"\"\n",[112,3973,3974],{"class":114,"line":127},[112,3975,3976],{},"    reconstruction = np.zeros_like(input_vec)\n",[112,3978,3979],{"class":114,"line":134},[112,3980,3981],{},"    residual = input_vec.copy()\n",[112,3983,3984],{"class":114,"line":140},[112,3985,3986],{},"    indices = []\n",[112,3988,3989],{"class":114,"line":146},[112,3990,131],{"emptyLinePlaceholder":130},[112,3992,3993],{"class":114,"line":151},[112,3994,3995],{},"    for stage, codebook in enumerate(codebooks):\n",[112,3997,3998],{"class":114,"line":157},[112,3999,4000],{},"        quant_vec, idx = quantize_vector(residual, codebook)\n",[112,4002,4003],{"class":114,"line":162},[112,4004,4005],{},"        reconstruction += quant_vec\n",[112,4007,4008],{"class":114,"line":168},[112,4009,4010],{},"        indices.append(idx)\n",[112,4012,4013],{"class":114,"line":174},[112,4014,4015],{},"        residual = input_vec - reconstruction\n",[112,4017,4018],{"class":114,"line":179},[112,4019,4020],{},"        print(f\"Stage {stage+1} residual norm: 
{np.linalg.norm(residual):.4f}\")\n",[112,4022,4023],{"class":114,"line":1129},[112,4024,131],{"emptyLinePlaceholder":130},[112,4026,4027],{"class":114,"line":1135},[112,4028,4029],{},"    return reconstruction, indices\n",[434,4031,4033],{"id":4032},"audio-demonstrations","Audio Demonstrations",[12,4035,4036],{},[34,4037,4038],{},"Original Audio",[12,4040,4041],{},[4042,4043],"audio",{"controls":130,"src":4044,"style":4045},"https:\u002F\u002Fassets.modelslab.com\u002Fgenerations\u002F296bb1a6-d6ad-43e3-bd7d-645dcba49b6d.wav","width: 100%; margin: 0.5rem 0;",[12,4047,4048],{},[34,4049,4050],{},"4 Codebooks Reconstruction",[12,4052,4053],{},[4042,4054],{"controls":130,"src":4055,"style":4045},"https:\u002F\u002Fassets.modelslab.com\u002Fgenerations\u002Fbea3a3ea-024e-435b-a978-41e6f3af9af4.wav",[12,4057,4058],{},[34,4059,4060],{},"8 Codebooks Reconstruction",[12,4062,4063],{},[4042,4064],{"controls":130,"src":4065,"style":4045},"https:\u002F\u002Fassets.modelslab.com\u002Fgenerations\u002F85bdaddd-0c28-4d5a-854f-633b7b042c2a.wav",[12,4067,4068],{},[34,4069,4070],{},"16 Codebooks Reconstruction",[12,4072,4073],{},[4042,4074],{"controls":130,"src":4075,"style":4045},"https:\u002F\u002Fassets.modelslab.com\u002Fgenerations\u002F72a29fcf-79ab-4dea-a4c6-53b7f3d21cd4.wav",[12,4077,4078],{},[34,4079,4080],{},"32 Codebooks Reconstruction",[12,4082,4083],{},[4042,4084],{"controls":130,"src":4085,"style":4045},"https:\u002F\u002Fassets.modelslab.com\u002Fgenerations\u002F3ec7e80d-1d18-449e-ba84-0b75034a36a6.wav",[19,4087,4089],{"id":4088},"bitrate-control-through-rvq","Bitrate Control Through RVQ",[12,4091,4092],{},[212,4093],{"alt":4094,"src":4095},"RVQ in EnCodec - Bitrate Control","\u002Frvq-in-encodec.png",[12,4097,4098],{},[219,4099,4100],{},"Figure 4: RVQ in EnCodec — Bitrate Control Through Multiple Quantization Stages",[12,4102,4103],{},"One of the biggest advantages of RVQ is fine-grained control over bitrate. 
By adjusting the number of quantization stages or the size of each codebook, we can trade off quality versus compression.",[12,4105,4106],{},"Meta's EnCodec paper demonstrated the practical power of this approach.",[12,4108,4109],{},[212,4110],{"alt":4111,"src":4112},"Meta's EnCodec Architecture","\u002Fmeta-encodec-arch.png",[12,4114,4115],{},[219,4116,4117],{},"Figure 5: Meta's EnCodec Architecture",[12,4119,4120],{},"The mathematical relationship shows exponential growth in representational capacity:",[12,4122,4123],{},[112,4124,4126,4158],{"className":4125},[773],[112,4127,4129],{"className":4128},[777],[779,4130,4131],{"xmlns":781},[783,4132,4133,4155],{},[786,4134,4135,4139,4141],{},[4136,4137,4138],"mtext",{},"Total patterns",[793,4140,818],{},[839,4142,4143,4145],{},[844,4144,846],{},[786,4146,4147,4150,4153],{},[789,4148,4149],{},"b",[793,4151,4152],{},"×",[789,4154,3703],{},[848,4156,4157],{"encoding":850},"\\text{Total patterns} = 2^{b \\times N}",[112,4159,4161,4183],{"className":4160,"ariaHidden":802},[855],[112,4162,4164,4168,4174,4177,4180],{"className":4163},[859],[112,4165],{"className":4166,"style":4167},[863],"height:0.8889em;vertical-align:-0.1944em;",[112,4169,4171],{"className":4170},[868,96],[112,4172,4138],{"className":4173},[868],[112,4175],{"className":4176,"style":949},[884],[112,4178,818],{"className":4179},[953],[112,4181],{"className":4182,"style":949},[884],[112,4184,4186,4189],{"className":4185},[859],[112,4187],{"className":4188,"style":1349},[863],[112,4190,4192,4195],{"className":4191},[868],[112,4193,846],{"className":4194},[868],[112,4196,4198],{"className":4197},[895],[112,4199,4201],{"className":4200},[899],[112,4202,4204],{"className":4203},[904],[112,4205,4207],{"className":4206,"style":1349},[908],[112,4208,4209,4212],{"style":1052},[112,4210],{"className":4211,"style":917},[916],[112,4213,4215],{"className":4214},[921,922,923,924],[112,4216,4218,4221,4224],{"className":4217},[868,924],[112,4219,4149],{"className":4220},[868
,869,924],[112,4222,4152],{"className":4223},[977,924],[112,4225,3703],{"className":4226,"style":3842},[868,869,924],[12,4228,4229,4230,4259,4260,4288],{},"where ",[112,4231,4233,4246],{"className":4232},[773],[112,4234,4236],{"className":4235},[777],[779,4237,4238],{"xmlns":781},[783,4239,4240,4244],{},[786,4241,4242],{},[789,4243,4149],{},[848,4245,4149],{"encoding":850},[112,4247,4249],{"className":4248,"ariaHidden":802},[855],[112,4250,4252,4256],{"className":4251},[859],[112,4253],{"className":4254,"style":4255},[863],"height:0.6944em;",[112,4257,4149],{"className":4258},[868,869]," is bits per stage and ",[112,4261,4263,4276],{"className":4262},[773],[112,4264,4266],{"className":4265},[777],[779,4267,4268],{"xmlns":781},[783,4269,4270,4274],{},[786,4271,4272],{},[789,4273,3703],{},[848,4275,3703],{"encoding":850},[112,4277,4279],{"className":4278,"ariaHidden":802},[855],[112,4280,4282,4285],{"className":4281},[859],[112,4283],{"className":4284,"style":1446},[863],[112,4286,3703],{"className":4287,"style":3842},[868,869]," is the number of stages.",[19,4290,4292],{"id":4291},"exponential-moving-average-ema-codebook-update","Exponential Moving Average (EMA) Codebook Update",[12,4294,4295],{},"To stabilize training, each codeword is updated using an exponential moving average:",[12,4297,4298],{},[112,4299,4301,4380],{"className":4300},[773],[112,4302,4304],{"className":4303},[777],[779,4305,4306],{"xmlns":781},[783,4307,4308,4377],{},[786,4309,4310,4329,4331,4334,4337,4351,4353,4355,4357,4359,4361,4363,4365],{},[1689,4311,4312,4314,4316],{},[789,4313,809],{"mathvariant":1305},[789,4315,812],{},[786,4317,4318,4320,4323,4325,4327],{},[793,4319,796],{"stretchy":795},[789,4321,4322],{},"t",[793,4324,2618],{},[844,4326,1408],{},[793,4328,815],{"stretchy":795},[793,4330,818],{},[789,4332,4333],{},"α",[4136,4335,4336],{}," 
",[1689,4338,4339,4341,4343],{},[789,4340,809],{"mathvariant":1305},[789,4342,812],{},[786,4344,4345,4347,4349],{},[793,4346,796],{"stretchy":795},[789,4348,4322],{},[793,4350,815],{"stretchy":795},[793,4352,2618],{},[793,4354,796],{"stretchy":795},[844,4356,1408],{},[793,4358,829],{},[789,4360,4333],{},[793,4362,815],{"stretchy":795},[4136,4364,4336],{},[805,4366,4367,4375],{},[3680,4368,4369,4372],{"accent":802},[789,4370,4371],{"mathvariant":1305},"v",[793,4373,4374],{},"ˉ",[789,4376,812],{},[848,4378,4379],{"encoding":850},"\\mathbf{c}_i^{(t+1)} = \\alpha \\, \\mathbf{c}_i^{(t)} + (1 - \\alpha) \\, \\bar{\\mathbf{v}}_i",[112,4381,4383,4469,4551,4572],{"className":4382,"ariaHidden":802},[855],[112,4384,4386,4390,4460,4463,4466],{"className":4385},[859],[112,4387],{"className":4388,"style":4389},[863],"height:1.3217em;vertical-align:-0.2769em;",[112,4391,4393,4396],{"className":4392},[868],[112,4394,809],{"className":4395},[868,1333],[112,4397,4399],{"className":4398},[895],[112,4400,4402,4451],{"className":4401},[899,900],[112,4403,4405,4448],{"className":4404},[904],[112,4406,4409,4421],{"className":4407,"style":4408},[908],"height:1.0448em;",[112,4410,4412,4415],{"style":4411},"top:-2.4231em;margin-left:0em;margin-right:0.05em;",[112,4413],{"className":4414,"style":917},[916],[112,4416,4418],{"className":4417},[921,922,923,924],[112,4419,812],{"className":4420},[868,869,924],[112,4422,4424,4427],{"style":4423},"top:-3.2198em;margin-right:0.05em;",[112,4425],{"className":4426,"style":917},[916],[112,4428,4430],{"className":4429},[921,922,923,924],[112,4431,4433,4436,4439,4442,4445],{"className":4432},[868,924],[112,4434,796],{"className":4435},[873,924],[112,4437,4322],{"className":4438},[868,869,924],[112,4440,2618],{"className":4441},[977,924],[112,4443,1408],{"className":4444},[868,924],[112,4446,815],{"className":4447},[945,924],[112,4449,932],{"className":4450},[931],[112,4452,4454],{"className":4453},[904],[112,4455,4458],{"className":4456,"style":4457},[9
08],"height:0.2769em;",[112,4459],{},[112,4461],{"className":4462,"style":949},[884],[112,4464,818],{"className":4465},[953],[112,4467],{"className":4468,"style":949},[884],[112,4470,4472,4475,4479,4482,4542,4545,4548],{"className":4471},[859],[112,4473],{"className":4474,"style":4389},[863],[112,4476,4333],{"className":4477,"style":4478},[868,869],"margin-right:0.0037em;",[112,4480],{"className":4481,"style":885},[884],[112,4483,4485,4488],{"className":4484},[868],[112,4486,809],{"className":4487},[868,1333],[112,4489,4491],{"className":4490},[895],[112,4492,4494,4534],{"className":4493},[899,900],[112,4495,4497,4531],{"className":4496},[904],[112,4498,4500,4511],{"className":4499,"style":4408},[908],[112,4501,4502,4505],{"style":4411},[112,4503],{"className":4504,"style":917},[916],[112,4506,4508],{"className":4507},[921,922,923,924],[112,4509,812],{"className":4510},[868,869,924],[112,4512,4513,4516],{"style":4423},[112,4514],{"className":4515,"style":917},[916],[112,4517,4519],{"className":4518},[921,922,923,924],[112,4520,4522,4525,4528],{"className":4521},[868,924],[112,4523,796],{"className":4524},[873,924],[112,4526,4322],{"className":4527},[868,869,924],[112,4529,815],{"className":4530},[945,924],[112,4532,932],{"className":4533},[931],[112,4535,4537],{"className":4536},[904],[112,4538,4540],{"className":4539,"style":4457},[908],[112,4541],{},[112,4543],{"className":4544,"style":973},[884],[112,4546,2618],{"className":4547},[977],[112,4549],{"className":4550,"style":973},[884],[112,4552,4554,4557,4560,4563,4566,4569],{"className":4553},[859],[112,4555],{"className":4556,"style":864},[863],[112,4558,796],{"className":4559},[873],[112,4561,1408],{"className":4562},[868],[112,4564],{"className":4565,"style":973},[884],[112,4567,829],{"className":4568},[977],[112,4570],{"className":4571,"style":973},[884],[112,4573,4575,4578,4581,4584,4587],{"className":4574},[859],[112,4576],{"className":4577,"style":864},[863],[112,4579,4333],{"className":4580,"style":4478},[
868,869],[112,4582,815],{"className":4583},[945],[112,4585],{"className":4586,"style":885},[884],[112,4588,4590,4623],{"className":4589},[868],[112,4591,4593],{"className":4592},[868,3742],[112,4594,4596],{"className":4595},[899],[112,4597,4599],{"className":4598},[904],[112,4600,4603,4612],{"className":4601,"style":4602},[908],"height:0.5812em;",[112,4604,4605,4608],{"style":3754},[112,4606],{"className":4607,"style":3758},[916],[112,4609,4371],{"className":4610,"style":4611},[868,1333],"margin-right:0.016em;",[112,4613,4614,4617],{"style":3764},[112,4615],{"className":4616,"style":3758},[916],[112,4618,4620],{"className":4619,"style":3772},[3771],[112,4621,4374],{"className":4622},[868],[112,4624,4626],{"className":4625},[895],[112,4627,4629,4650],{"className":4628},[899,900],[112,4630,4632,4647],{"className":4631},[904],[112,4633,4635],{"className":4634,"style":909},[908],[112,4636,4638,4641],{"style":4637},"top:-2.55em;margin-left:-0.016em;margin-right:0.05em;",[112,4639],{"className":4640,"style":917},[916],[112,4642,4644],{"className":4643},[921,922,923,924],[112,4645,812],{"className":4646},[868,869,924],[112,4648,932],{"className":4649},[931],[112,4651,4653],{"className":4652},[904],[112,4654,4656],{"className":4655,"style":939},[908],[112,4657],{},[12,4659,3110],{},[197,4661,4662,4792,4926],{},[200,4663,4664,4762,4763],{},[112,4665,4667,4693],{"className":4666},[773],[112,4668,4670],{"className":4669},[777],[779,4671,4672],{"xmlns":781},[783,4673,4674,4690],{},[786,4675,4676],{},[1689,4677,4678,4680,4682],{},[789,4679,809],{"mathvariant":1305},[789,4681,812],{},[786,4683,4684,4686,4688],{},[793,4685,796],{"stretchy":795},[789,4687,4322],{},[793,4689,815],{"stretchy":795},[848,4691,4692],{"encoding":850},"\\mathbf{c}_i^{(t)}",[112,4694,4696],{"className":4695,"ariaHidden":802},[855],[112,4697,4699,4702],{"className":4698},[859],[112,4700],{"className":4701,"style":4389},[863],[112,4703,4705,4708],{"className":4704},[868],[112,4706,809],{"className":4707},[86
8,1333],[112,4709,4711],{"className":4710},[895],[112,4712,4714,4754],{"className":4713},[899,900],[112,4715,4717,4751],{"className":4716},[904],[112,4718,4720,4731],{"className":4719,"style":4408},[908],[112,4721,4722,4725],{"style":4411},[112,4723],{"className":4724,"style":917},[916],[112,4726,4728],{"className":4727},[921,922,923,924],[112,4729,812],{"className":4730},[868,869,924],[112,4732,4733,4736],{"style":4423},[112,4734],{"className":4735,"style":917},[916],[112,4737,4739],{"className":4738},[921,922,923,924],[112,4740,4742,4745,4748],{"className":4741},[868,924],[112,4743,796],{"className":4744},[873,924],[112,4746,4322],{"className":4747},[868,869,924],[112,4749,815],{"className":4750},[945,924],[112,4752,932],{"className":4753},[931],[112,4755,4757],{"className":4756},[904],[112,4758,4760],{"className":4759,"style":4457},[908],[112,4761],{}," is the codeword at iteration ",[112,4764,4766,4779],{"className":4765},[773],[112,4767,4769],{"className":4768},[777],[779,4770,4771],{"xmlns":781},[783,4772,4773,4777],{},[786,4774,4775],{},[789,4776,4322],{},[848,4778,4322],{"encoding":850},[112,4780,4782],{"className":4781,"ariaHidden":802},[855],[112,4783,4785,4789],{"className":4784},[859],[112,4786],{"className":4787,"style":4788},[863],"height:0.6151em;",[112,4790,4322],{"className":4791},[868,869],[200,4793,4794,4897,4898],{},[112,4795,4797,4819],{"className":4796},[773],[112,4798,4800],{"className":4799},[777],[779,4801,4802],{"xmlns":781},[783,4803,4804,4816],{},[786,4805,4806],{},[805,4807,4808,4814],{},[3680,4809,4810,4812],{"accent":802},[789,4811,4371],{"mathvariant":1305},[793,4813,4374],{},[789,4815,812],{},[848,4817,4818],{"encoding":850},"\\bar{\\mathbf{v}}_i",[112,4820,4822],{"className":4821,"ariaHidden":802},[855],[112,4823,4825,4829],{"className":4824},[859],[112,4826],{"className":4827,"style":4828},[863],"height:0.7312em;vertical-align:-0.15em;",[112,4830,4832,4863],{"className":4831},[868],[112,4833,4835],{"className":4834},[868,3742],[112
,4836,4838],{"className":4837},[899],[112,4839,4841],{"className":4840},[904],[112,4842,4844,4852],{"className":4843,"style":4602},[908],[112,4845,4846,4849],{"style":3754},[112,4847],{"className":4848,"style":3758},[916],[112,4850,4371],{"className":4851,"style":4611},[868,1333],[112,4853,4854,4857],{"style":3764},[112,4855],{"className":4856,"style":3758},[916],[112,4858,4860],{"className":4859,"style":3772},[3771],[112,4861,4374],{"className":4862},[868],[112,4864,4866],{"className":4865},[895],[112,4867,4869,4889],{"className":4868},[899,900],[112,4870,4872,4886],{"className":4871},[904],[112,4873,4875],{"className":4874,"style":909},[908],[112,4876,4877,4880],{"style":4637},[112,4878],{"className":4879,"style":917},[916],[112,4881,4883],{"className":4882},[921,922,923,924],[112,4884,812],{"className":4885},[868,869,924],[112,4887,932],{"className":4888},[931],[112,4890,4892],{"className":4891},[904],[112,4893,4895],{"className":4894,"style":939},[908],[112,4896],{}," is the mean of all encoder outputs assigned to codeword ",[112,4899,4901,4914],{"className":4900},[773],[112,4902,4904],{"className":4903},[777],[779,4905,4906],{"xmlns":781},[783,4907,4908,4912],{},[786,4909,4910],{},[789,4911,812],{},[848,4913,812],{"encoding":850},[112,4915,4917],{"className":4916,"ariaHidden":802},[855],[112,4918,4920,4923],{"className":4919},[859],[112,4921],{"className":4922,"style":3310},[863],[112,4924,812],{"className":4925},[868,869],[200,4927,4928,5003],{},[112,4929,4931,4958],{"className":4930},[773],[112,4932,4934],{"className":4933},[777],[779,4935,4936],{"xmlns":781},[783,4937,4938,4955],{},[786,4939,4940,4942,4944,4946,4949,4951,4953],{},[789,4941,4333],{},[793,4943,1308],{},[793,4945,1995],{"stretchy":795},[844,4947,4948],{},"0",[793,4950,803],{"separator":802},[844,4952,1408],{},[793,4954,815],{"stretchy":795},[848,4956,4957],{"encoding":850},"\\alpha \\in [0, 
1)",[112,4959,4961,4979],{"className":4960,"ariaHidden":802},[855],[112,4962,4964,4967,4970,4973,4976],{"className":4963},[859],[112,4965],{"className":4966,"style":1329},[863],[112,4968,4333],{"className":4969,"style":4478},[868,869],[112,4971],{"className":4972,"style":949},[884],[112,4974,1308],{"className":4975},[953],[112,4977],{"className":4978,"style":949},[884],[112,4980,4982,4985,4988,4991,4994,4997,5000],{"className":4981},[859],[112,4983],{"className":4984,"style":864},[863],[112,4986,1995],{"className":4987},[873],[112,4989,4948],{"className":4990},[868],[112,4992,803],{"className":4993},[880],[112,4995],{"className":4996,"style":885},[884],[112,4998,1408],{"className":4999},[868],[112,5001,815],{"className":5002},[945]," is the momentum parameter (typically 0.99)",[12,5005,5006,5007,5037],{},"A higher ",[112,5008,5010,5024],{"className":5009},[773],[112,5011,5013],{"className":5012},[777],[779,5014,5015],{"xmlns":781},[783,5016,5017,5021],{},[786,5018,5019],{},[789,5020,4333],{},[848,5022,5023],{"encoding":850},"\\alpha",[112,5025,5027],{"className":5026,"ariaHidden":802},[855],[112,5028,5030,5034],{"className":5029},[859],[112,5031],{"className":5032,"style":5033},[863],"height:0.4306em;",[112,5035,4333],{"className":5036,"style":4478},[868,869]," means slower, smoother updates; lower values adapt faster but can be noisy. This EMA rule helps the codebook evolve continuously, reducing abrupt jumps and preventing codeword collapse.",[696,5039],{},[19,5041,601],{"id":600},[332,5043,5044,5058,5072,5085,5098],{},[200,5045,5046,5049,5050,613,5053],{},[34,5047,5048],{},"Défossez, A., Copet, J., Synnaeve, G., & Adi, Y."," (2022). ",[219,5051,5052],{},"High Fidelity Neural Audio Compression",[51,5054,5057],{"href":5055,"rel":5056},"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.13438",[55],"arXiv:2210.13438",[200,5059,5060,5063,5064,613,5067],{},[34,5061,5062],{},"Zeghidour, N., et al."," (2021). 
",[219,5065,5066],{},"SoundStream: An End-to-End Neural Audio Codec",[51,5068,5071],{"href":5069,"rel":5070},"https:\u002F\u002Fresearch.google\u002Fpubs\u002Fsoundstream-an-end-to-end-neural-audio-codec\u002F",[55],"Google Research",[200,5073,5074,624,5077,624,5080],{},[34,5075,5076],{},"AssemblyAI.",[219,5078,5079],{},"What is Residual Vector Quantization?",[51,5081,5084],{"href":5082,"rel":5083},"https:\u002F\u002Fwww.assemblyai.com\u002Fblog\u002Fwhat-is-residual-vector-quantization",[55],"assemblyai.com",[200,5086,5087,624,5090,613,5093],{},[34,5088,5089],{},"Notes by Lex.",[219,5091,5092],{},"Residual Vector Quantisation",[51,5094,5097],{"href":5095,"rel":5096},"https:\u002F\u002Fnotesbylex.com\u002Fresidual-vector-quantisation",[55],"notesbylex.com",[200,5099,5100,624,5103,613,5106],{},[34,5101,5102],{},"Yannic Kilcher.",[219,5104,5105],{},"High Fidelity Neural Audio Compression (EnCodec Explained)",[51,5107,5110],{"href":5108,"rel":5109},"https:\u002F\u002Fyoutu.be\u002FXt9S74BHsvc",[55],"YouTube",[651,5112,653],{},{"title":100,"searchDepth":121,"depth":121,"links":5114},[5115,5116,5117,5118,5119,5122,5125,5126,5127],{"id":700,"depth":121,"text":701},{"id":721,"depth":121,"text":722},{"id":757,"depth":121,"text":758},{"id":1179,"depth":121,"text":1180},{"id":1286,"depth":121,"text":1287,"children":5120},[5121],{"id":2198,"depth":127,"text":2199},{"id":2565,"depth":121,"text":2566,"children":5123},[5124],{"id":4032,"depth":127,"text":4033},{"id":4088,"depth":121,"text":4089},{"id":4291,"depth":121,"text":4292},{"id":600,"depth":121,"text":601},[5129,5130,730],"speech-synthesis","codecs","2025-11-08","Exploring the role of vector quantization in audio compression and its uses in neural audio 
codecs.",{},"\u002Fblog\u002Fneural-audio-codec-rvq",{"title":687,"description":5132},"blog\u002Fneural-audio-codec-rvq","IZ0owy_BfWVwsBEoOkXEB4dH-9F1gU4UPXg7OfxvROk",{"id":5139,"title":5140,"author":7,"body":5141,"categories":5643,"date":5648,"description":5649,"extension":677,"hidden":678,"meta":5650,"navigation":130,"path":5651,"seo":5652,"stem":5653,"thumbnail":683,"__hash__":5654},"blog\u002Fblog\u002Fteaching-models-to-write-kernels.md","Teaching models to write kernels; dataset, training, and honest results",{"type":9,"value":5142,"toc":5633},[5143,5146,5148,5152,5169,5173,5183,5193,5200,5218,5221,5227,5229,5233,5236,5239,5279,5286,5288,5291,5294,5466,5468,5471,5474,5508,5510,5514,5519,5548,5553,5593,5596,5598,5601,5612,5614,5618,5626,5628,5631],[12,5144,5145],{},"i wanted a model that actually understands low‑level GPU code: masked loads, boundary checks, shared memory... the stuff people sweat about and models usually mess up. this is a short, practical writeup: what i collected, how i cleaned it, how i trained, how i evaluated.",[696,5147],{},[19,5149,5151],{"id":5150},"tldr","tl;dr",[197,5153,5154,5157,5166],{},[200,5155,5156],{},"a focused dataset of triton kernel bodies and cuda→triton translation pairs (a few thousand examples).",[200,5158,5159,5160,5165],{},"cleaned, ",[51,5161,5164],{"href":5162,"rel":5163},"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FData_deduplication",[55],"deduped",", licensed... ready for adapter-style fine‑tuning.",[200,5167,5168],{},"i fine‑tuned adapters on a qwen 8b checkpoint using a single RTX 3090... 
results are useful as kernel drafts.",[19,5170,5172],{"id":5171},"whats-in-the-dataset","what's in the dataset",[12,5174,5175,624,5178],{},[34,5176,5177],{},"HuggingFace Dataset:",[51,5179,5182],{"href":5180,"rel":5181},"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fedwixx\u002Ftriton-code-dataset",[55],"edwixx\u002Ftriton-code-dataset",[12,5184,5185,624,5188],{},[34,5186,5187],{},"Fine-tuned Model:",[51,5189,5192],{"href":5190,"rel":5191},"https:\u002F\u002Fhuggingface.co\u002Fedwixx\u002Fqwen3-8b-triton-finetune",[55],"edwixx\u002Fqwen3-8b-triton-finetune",[12,5194,5195,5196,5199],{},"Two CSV splits, simple schema (",[98,5197,5198],{},"prompt,completion","):",[197,5201,5202,5210],{},[200,5203,5204,5209],{},[34,5205,5206],{},[98,5207,5208],{},"fim_sft.csv"," — fill‑in‑the‑middle for triton kernels (prompt: signature + context, completion: body).",[200,5211,5212,5217],{},[34,5213,5214],{},[98,5215,5216],{},"cu2triton_sft.csv"," — CUDA functions paired with hand‑crafted or curated Triton rewrites.",[12,5219,5220],{},"Repo layout:",[91,5222,5225],{"className":5223,"code":5224,"language":96},[94],"\u002Ffim_sft.csv\n\u002Fcu2triton_sft.csv\n\u002FANALYSIS.json\n\u002FANALYSIS.md\n\u002FPROVENANCE.md\n\u002Ffigs\u002F*.png\n",[98,5226,5224],{"__ignoreMap":100},[696,5228],{},[19,5230,5232],{"id":5231},"how-i-built-it","how i built it",[12,5234,5235],{},"i started by searching repositories, documentation pages, and blog posts for kernel examples that show real GPU concerns. 
i prioritized permissively licensed sources and files that were self contained or had clear argument shapes.",[12,5237,5238],{},"next i normalized the text — unify newlines, trim whitespace, collapse repeated spaces and tabs:",[91,5240,5242],{"className":106,"code":5241,"language":108,"meta":100,"style":100},"def normalize_code(s: str) -> str:\n    import re\n    s = s.replace('\\r\\n', '\\n')\n    s = s.strip()\n    s = re.sub(r'[ \\t]+', ' ', s)\n    s = re.sub(r'\\n{3,}', '\\n\\n', s)\n    return s\n",[98,5243,5244,5249,5254,5259,5264,5269,5274],{"__ignoreMap":100},[112,5245,5246],{"class":114,"line":115},[112,5247,5248],{},"def normalize_code(s: str) -> str:\n",[112,5250,5251],{"class":114,"line":121},[112,5252,5253],{},"    import re\n",[112,5255,5256],{"class":114,"line":127},[112,5257,5258],{},"    s = s.replace('\\r\\n', '\\n')\n",[112,5260,5261],{"class":114,"line":134},[112,5262,5263],{},"    s = s.strip()\n",[112,5265,5266],{"class":114,"line":140},[112,5267,5268],{},"    s = re.sub(r'[ \\t]+', ' ', s)\n",[112,5270,5271],{"class":114,"line":146},[112,5272,5273],{},"    s = re.sub(r'\\n{3,}', '\\n\\n', s)\n",[112,5275,5276],{"class":114,"line":151},[112,5277,5278],{},"    return s\n",[12,5280,5281,5282,5285],{},"after cleaning i deduplicated exactly. compute a SHA‑1 hash over ",[98,5283,5284],{},"normalize(prompt) ||| normalize(completion)"," and drop exact matches. then run a leakage filter using Jaccard word overlap and flag pairs with overlap ≥ 0.8 for manual review.",[696,5287],{},[19,5289,5290],{"id":5290},"training",[12,5292,5293],{},"i used qwen 8b and trained LoRA adapters on a single RTX 3090. 
loaded the base model with 4‑bit quantization using bitsandbytes.",[91,5295,5297],{"className":106,"code":5296,"language":108,"meta":100,"style":100},"from transformers import BitsAndBytesConfig\nfrom peft import LoraConfig\nfrom trl import SFTConfig\n\nbnb = BitsAndBytesConfig(\n    load_in_4bit=True,\n    bnb_4bit_quant_type=\"nf4\",\n    bnb_4bit_compute_dtype=\"bfloat16\",\n)\n\npeft_cfg = LoraConfig(\n    r=32,\n    lora_alpha=32,\n    lora_dropout=0.05,\n    bias=\"none\",\n    target_modules=[\"q_proj\",\"k_proj\",\"v_proj\",\"o_proj\",\"gate_proj\",\"up_proj\",\"down_proj\"],\n    task_type=\"CAUSAL_LM\",\n)\n\ncfg = SFTConfig(\n    output_dir=\"qwen3_8b_triton_fim_lora\",\n    num_train_epochs=1,\n    per_device_train_batch_size=1,\n    gradient_accumulation_steps=16,\n    learning_rate=1e-4,\n    lr_scheduler_type=\"cosine\",\n    warmup_ratio=0.02,\n    packing=True,\n    max_length=1024,\n    bf16=True,\n    gradient_checkpointing=True,\n)\n",[98,5298,5299,5304,5309,5314,5318,5323,5328,5333,5338,5343,5347,5352,5357,5362,5367,5372,5377,5382,5386,5390,5395,5401,5407,5413,5419,5425,5431,5437,5443,5449,5455,5461],{"__ignoreMap":100},[112,5300,5301],{"class":114,"line":115},[112,5302,5303],{},"from transformers import BitsAndBytesConfig\n",[112,5305,5306],{"class":114,"line":121},[112,5307,5308],{},"from peft import LoraConfig\n",[112,5310,5311],{"class":114,"line":127},[112,5312,5313],{},"from trl import SFTConfig\n",[112,5315,5316],{"class":114,"line":134},[112,5317,131],{"emptyLinePlaceholder":130},[112,5319,5320],{"class":114,"line":140},[112,5321,5322],{},"bnb = BitsAndBytesConfig(\n",[112,5324,5325],{"class":114,"line":146},[112,5326,5327],{},"    load_in_4bit=True,\n",[112,5329,5330],{"class":114,"line":151},[112,5331,5332],{},"    bnb_4bit_quant_type=\"nf4\",\n",[112,5334,5335],{"class":114,"line":157},[112,5336,5337],{},"    
bnb_4bit_compute_dtype=\"bfloat16\",\n",[112,5339,5340],{"class":114,"line":162},[112,5341,5342],{},")\n",[112,5344,5345],{"class":114,"line":168},[112,5346,131],{"emptyLinePlaceholder":130},[112,5348,5349],{"class":114,"line":174},[112,5350,5351],{},"peft_cfg = LoraConfig(\n",[112,5353,5354],{"class":114,"line":179},[112,5355,5356],{},"    r=32,\n",[112,5358,5359],{"class":114,"line":1129},[112,5360,5361],{},"    lora_alpha=32,\n",[112,5363,5364],{"class":114,"line":1135},[112,5365,5366],{},"    lora_dropout=0.05,\n",[112,5368,5369],{"class":114,"line":1141},[112,5370,5371],{},"    bias=\"none\",\n",[112,5373,5374],{"class":114,"line":1147},[112,5375,5376],{},"    target_modules=[\"q_proj\",\"k_proj\",\"v_proj\",\"o_proj\",\"gate_proj\",\"up_proj\",\"down_proj\"],\n",[112,5378,5379],{"class":114,"line":1153},[112,5380,5381],{},"    task_type=\"CAUSAL_LM\",\n",[112,5383,5384],{"class":114,"line":1159},[112,5385,5342],{},[112,5387,5388],{"class":114,"line":1164},[112,5389,131],{"emptyLinePlaceholder":130},[112,5391,5392],{"class":114,"line":1170},[112,5393,5394],{},"cfg = SFTConfig(\n",[112,5396,5398],{"class":114,"line":5397},21,[112,5399,5400],{},"    output_dir=\"qwen3_8b_triton_fim_lora\",\n",[112,5402,5404],{"class":114,"line":5403},22,[112,5405,5406],{},"    num_train_epochs=1,\n",[112,5408,5410],{"class":114,"line":5409},23,[112,5411,5412],{},"    per_device_train_batch_size=1,\n",[112,5414,5416],{"class":114,"line":5415},24,[112,5417,5418],{},"    gradient_accumulation_steps=16,\n",[112,5420,5422],{"class":114,"line":5421},25,[112,5423,5424],{},"    learning_rate=1e-4,\n",[112,5426,5428],{"class":114,"line":5427},26,[112,5429,5430],{},"    lr_scheduler_type=\"cosine\",\n",[112,5432,5434],{"class":114,"line":5433},27,[112,5435,5436],{},"    warmup_ratio=0.02,\n",[112,5438,5440],{"class":114,"line":5439},28,[112,5441,5442],{},"    packing=True,\n",[112,5444,5446],{"class":114,"line":5445},29,[112,5447,5448],{},"    
max_length=1024,\n",[112,5450,5452],{"class":114,"line":5451},30,[112,5453,5454],{},"    bf16=True,\n",[112,5456,5458],{"class":114,"line":5457},31,[112,5459,5460],{},"    gradient_checkpointing=True,\n",[112,5462,5464],{"class":114,"line":5463},32,[112,5465,5342],{},[696,5467],{},[19,5469,5470],{"id":5470},"evaluation",[12,5472,5473],{},"i kept eval simple — mix of automated and manual:",[197,5475,5476,5490,5496,5502],{},[200,5477,5478,5481,5482,5485,5486,5489],{},[34,5479,5480],{},"static checks:"," grep for ",[98,5483,5484],{},"@triton.jit",", ",[98,5487,5488],{},"tl.load",", masked stores. fast signal.",[200,5491,5492,5495],{},[34,5493,5494],{},"token overlap:"," for translation pairs, measure how much the output matches reference.",[200,5497,5498,5501],{},[34,5499,5500],{},"spot compile:"," try compiling a few examples. catches syntax errors.",[200,5503,5504,5507],{},[34,5505,5506],{},"human read:"," i read ~100 outputs. catches weird patterns machines miss.",[696,5509],{},[19,5511,5513],{"id":5512},"example-the-model-learned","example the model learned",[12,5515,5516],{},[34,5517,5518],{},"prompt",[91,5520,5522],{"className":106,"code":5521,"language":108,"meta":100,"style":100},"import triton, triton.language as tl\n\n@triton.jit\ndef add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK_SIZE: tl.constexpr):\n    # complete: load x,y (masked), add, store\n",[98,5523,5524,5529,5533,5538,5543],{"__ignoreMap":100},[112,5525,5526],{"class":114,"line":115},[112,5527,5528],{},"import triton, triton.language as tl\n",[112,5530,5531],{"class":114,"line":121},[112,5532,131],{"emptyLinePlaceholder":130},[112,5534,5535],{"class":114,"line":127},[112,5536,5537],{},"@triton.jit\n",[112,5539,5540],{"class":114,"line":134},[112,5541,5542],{},"def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK_SIZE: tl.constexpr):\n",[112,5544,5545],{"class":114,"line":140},[112,5546,5547],{},"    # complete: load x,y (masked), add, 
store\n",[12,5549,5550],{},[34,5551,5552],{},"completion",[91,5554,5556],{"className":106,"code":5555,"language":108,"meta":100,"style":100},"pid = tl.program_id(0)\noffsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)\nmask = offsets \u003C n\nx = tl.load(x_ptr + offsets, mask=mask)\ny = tl.load(y_ptr + offsets, mask=mask)\nout = x + y\ntl.store(out_ptr + offsets, out, mask=mask)\n",[98,5557,5558,5563,5568,5573,5578,5583,5588],{"__ignoreMap":100},[112,5559,5560],{"class":114,"line":115},[112,5561,5562],{},"pid = tl.program_id(0)\n",[112,5564,5565],{"class":114,"line":121},[112,5566,5567],{},"offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)\n",[112,5569,5570],{"class":114,"line":127},[112,5571,5572],{},"mask = offsets \u003C n\n",[112,5574,5575],{"class":114,"line":134},[112,5576,5577],{},"x = tl.load(x_ptr + offsets, mask=mask)\n",[112,5579,5580],{"class":114,"line":140},[112,5581,5582],{},"y = tl.load(y_ptr + offsets, mask=mask)\n",[112,5584,5585],{"class":114,"line":146},[112,5586,5587],{},"out = x + y\n",[112,5589,5590],{"class":114,"line":151},[112,5591,5592],{},"tl.store(out_ptr + offsets, out, mask=mask)\n",[12,5594,5595],{},"This pattern repeats across kernels; a few dozen examples are enough for the model to reproduce it reliably.",[696,5597],{},[19,5599,5600],{"id":5600},"limitations",[197,5602,5603,5606,5609],{},[200,5604,5605],{},"small dataset: teaches idioms, not everything. expect hallucinations.",[200,5607,5608],{},"translations may not compile in edge cases. 
treat outputs as drafts.",[200,5610,5611],{},"no automatic proof of correctness across all outputs.",[696,5613],{},[19,5615,5617],{"id":5616},"next-steps","next steps",[197,5619,5620,5623],{},[200,5621,5622],{},"add fuzzy near‑duplicate detection (minhash) to cut semantic leakage.",[200,5624,5625],{},"run compile tests for a subset and label examples as \"compilable.\"",[696,5627],{},[12,5629,5630],{},"PS: i did all this just cause i wanted to — this was a fun project, nothing more.",[651,5632,653],{},{"title":100,"searchDepth":121,"depth":121,"links":5634},[5635,5636,5637,5638,5639,5640,5641,5642],{"id":5150,"depth":121,"text":5151},{"id":5171,"depth":121,"text":5172},{"id":5231,"depth":121,"text":5232},{"id":5290,"depth":121,"text":5290},{"id":5470,"depth":121,"text":5470},{"id":5512,"depth":121,"text":5513},{"id":5600,"depth":121,"text":5600},{"id":5616,"depth":121,"text":5617},[5644,5645,5646,5647],"Machine Learning","GPU","Triton","CUDA","2025-10-29","A practical guide to building a focused dataset and fine-tuning models to understand low-level GPU code - masked loads, boundary checks, shared memory, and CUDA→Triton translations.",{},"\u002Fblog\u002Fteaching-models-to-write-kernels",{"title":5140,"description":5649},"blog\u002Fteaching-models-to-write-kernels","MAIkJ9W_QOPCoWdg4ukOtLjeT92rpozVzYRtpOe09Tw",{"id":5656,"title":5657,"author":7,"body":5658,"categories":5816,"date":5820,"description":5821,"extension":677,"hidden":678,"meta":5822,"navigation":130,"path":5823,"seo":5824,"stem":5825,"thumbnail":683,"__hash__":5826},"blog\u002Fblog\u002Ftts-datasets.md","Building TTS Datasets That Actually Work",{"type":9,"value":5659,"toc":5801},[5660,5663,5666,5670,5673,5677,5703,5707,5710,5736,5743,5747,5751,5754,5758,5761,5765,5768,5772,5775,5779,5782,5786,5789,5793],[12,5661,5662],{},"Let's get one thing straight: your TTS model will only be as good as the data you train it on. 
You can use the best pre-trained SOTA models out there, but without a high-quality dataset, you won't get the model to speak naturally and sound lifelike.",[12,5664,5665],{},"TTS (Text-to-Speech) models require a dataset of audio files and their respective transcriptions. The main advantage of modern TTS architectures is that you don't need to manually align the text transcriptions to the audio, because the model learns the alignment on its own.",[19,5667,5669],{"id":5668},"popular-datasets-for-tts","Popular Datasets for TTS",[12,5671,5672],{},"If you're just starting out or want to see how others have done it, here are some datasets worth looking at:",[434,5674,5676],{"id":5675},"public-datasets-everyone-uses","Public datasets everyone uses",[197,5678,5679,5687,5695],{},[200,5680,5681,5686],{},[51,5682,5685],{"href":5683,"rel":5684},"https:\u002F\u002Fkeithito.com\u002FLJ-Speech-Dataset\u002F",[55],"LJSpeech"," — single speaker, clean audio, very popular",[200,5688,5689,5694],{},[51,5690,5693],{"href":5691,"rel":5692},"https:\u002F\u002Fwww.openslr.org\u002F60\u002F",[55],"LibriTTS"," — multi-speaker dataset derived from LibriVox audiobooks",[200,5696,5697,5702],{},[51,5698,5701],{"href":5699,"rel":5700},"https:\u002F\u002Fwww.kaggle.com\u002Fdatasets\u002Fbryanpark\u002Fthe-world-english-bible-speech-dataset",[55],"TWEB"," — The World English Bible speech dataset",[434,5704,5706],{"id":5705},"my-own-datasets","My own datasets",[12,5708,5709],{},"I've also built some TTS datasets that you might find useful, especially for non-English languages or specific use cases:",[197,5711,5712,5720,5728],{},[200,5713,5714,5719],{},[51,5715,5718],{"href":5716,"rel":5717},"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fedwixx\u002FGujrati_Female_SPeech",[55],"Gujarati Female Speech"," — 8 hours of clean single-speaker Gujarati audio. 
Recorded in controlled conditions with aligned transcripts.",[200,5721,5722,5727],{},[51,5723,5726],{"href":5724,"rel":5725},"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fedwixx\u002Fbrazilian-portuguese-TTS",[55],"Brazilian Portuguese TTS"," — ~150 hours of multi-speaker Brazilian Portuguese, covering different accents and speaking styles, normalized and ready to train.",[200,5729,5730,5735],{},[51,5731,5734],{"href":5732,"rel":5733},"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FModelsLab\u002FObama-Sample-Dataset",[55],"Obama Voice Sample Dataset"," — 25+ minutes of Barack Obama's voice from public speeches, optimized for RVC training. Clean 24 kHz WAV files.",[12,5737,5738,5739,2252],{},"Check out more on the ",[51,5740,5742],{"href":5741},"\u002Fdatasets","datasets page",[19,5744,5746],{"id":5745},"things-to-consider-while-building-the-dataset","Things to Consider While Building the Dataset",[434,5748,5750],{"id":5749},"noise-free","Noise-free",[12,5752,5753],{},"Make sure the audio samples are noise-free. Background noise can keep your model from learning well, and it may ultimately fail to learn good alignment. Even if it does learn alignment, the final output will be much worse than anticipated.",[434,5755,5757],{"id":5756},"consistency","Consistency",[12,5759,5760],{},"Audio samples within your dataset should have the same format (mp3, flac, opus, wav) and sampling rate — ideally between 16 kHz and 22 kHz. If you have high-quality audio with a higher sampling rate, resample everything to a consistent rate.",[434,5762,5764],{"id":5763},"naturalness","Naturalness",[12,5766,5767],{},"Your model will learn what samples you feed into it. If you expect a natural-sounding voice with speed, pitch, and intonation differences, the dataset should contain that same variation.",[434,5769,5771],{"id":5770},"diverse-phonemes","Diverse Phonemes",[12,5773,5774],{},"Make sure your dataset covers a good set of phonemes for your use case. 
If phoneme coverage is low, the model will struggle to pronounce certain words.",[434,5776,5778],{"id":5777},"correctness","Correctness",[12,5780,5781],{},"Before training, filter out bad-quality transcripts, compare transcript and audio lengths, and remove wrong or broken files.",[434,5783,5785],{"id":5784},"clip-length-distribution","Clip Length Distribution",[12,5787,5788],{},"Verify the distribution of clip lengths and make sure your dataset has a good mix of short and long audio clips. Cap clips at around 30 seconds, depending on your available compute.",[19,5790,5792],{"id":5791},"additional-quality-checks","Additional Quality Checks",[197,5794,5795,5798],{},[200,5796,5797],{},"Check the spectrogram of audio files to measure noise levels. If the spectrogram looks cluttered in silent parts, the dataset might not be good for training.",[200,5799,5800],{},"Analyze the dataset distribution in terms of clip and transcript lengths. Watch for outlier cases — a very long clip with short text, or a short clip with very long text, which can happen when using models like Whisper for transcription.",{"title":100,"searchDepth":121,"depth":121,"links":5802},[5803,5807,5815],{"id":5668,"depth":121,"text":5669,"children":5804},[5805,5806],{"id":5675,"depth":127,"text":5676},{"id":5705,"depth":127,"text":5706},{"id":5745,"depth":121,"text":5746,"children":5808},[5809,5810,5811,5812,5813,5814],{"id":5749,"depth":127,"text":5750},{"id":5756,"depth":127,"text":5757},{"id":5763,"depth":127,"text":5764},{"id":5770,"depth":127,"text":5771},{"id":5777,"depth":127,"text":5778},{"id":5784,"depth":127,"text":5785},{"id":5791,"depth":121,"text":5792},[5817,5818,5819],"Datasets","TTS","Data","2025-06-11","your model is only as good as your data - here's what you need to know about building high-quality TTS 
datasets",{},"\u002Fblog\u002Ftts-datasets",{"title":5657,"description":5821},"blog\u002Ftts-datasets","Bbre6EnGJkWA36EMCirPZFmevLs3qMulnrPBbqtFNNI",{"id":5828,"title":5829,"author":7,"body":5830,"categories":6110,"date":6113,"description":6114,"extension":677,"hidden":678,"meta":6115,"navigation":130,"path":6116,"seo":6117,"stem":6118,"thumbnail":683,"__hash__":6119},"blog\u002Fblog\u002Fgpu-architecture-notes.md","How the RTX 3090 Actually Works: GPU Architecture notes...",{"type":9,"value":5831,"toc":6091},[5832,5844,5848,5855,5859,5862,5870,5876,5885,5891,5913,5917,5920,5940,5944,5947,5964,5968,5971,5976,5980,5984,5987,5991,5994,5998,6001,6005,6009,6016,6020,6023,6027,6030,6034,6037,6084],[12,5833,5834,5835,5840,5841,2252],{},"I spent some time watching ",[51,5836,5839],{"href":5837,"rel":5838},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=example",[55],"Branch Education's video"," on how GPUs work, specifically the RTX 3090, and took detailed notes. Figured I'd clean them up and share what I learned about the ",[34,5842,5843],{},"GA102 architecture",[19,5845,5847],{"id":5846},"the-hardware-breakdown","The Hardware Breakdown",[12,5849,5850,5851,5854],{},"We're looking at ",[34,5852,5853],{},"GA102",", which is the 3090's GPU processor architecture.",[434,5856,5858],{"id":5857},"the-hierarchy","The Hierarchy",[12,5860,5861],{},"The architecture is organized in layers:",[332,5863,5864],{},[200,5865,5866,5869],{},[34,5867,5868],{},"7 GPCs"," (Graphics Processing Clusters) at the top level",[12,5871,5872],{},[212,5873],{"alt":5874,"src":5875},"GPU architecture showing the 7 GPCs","https:\u002F\u002Flearnopencv.com\u002Fwp-content\u002Fuploads\u002F2025\u002F05\u002FGraphics-Processing-Clusters.png",[332,5877,5878],{"start":121},[200,5879,5880,5881,5884],{},"Within each GPC, there are ",[34,5882,5883],{},"12 SMs"," (Streaming Multiprocessors)",[12,5886,5887],{},[212,5888],{"alt":5889,"src":5890},"Internal structure of a Streaming Multiprocessor 
(SM)","https:\u002F\u002Flearnopencv.com\u002Fwp-content\u002Fuploads\u002F2025\u002F05\u002FStreaming-Multiprocessors.png",[332,5892,5893,5903],{"start":127},[200,5894,5895,5896,5899,5900],{},"Inside each SM, there are ",[34,5897,5898],{},"4 warp schedulers"," and ",[34,5901,5902],{},"1 Ray Tracing core",[200,5904,5905,5906,5909,5910],{},"Under each warp scheduler, there are ",[34,5907,5908],{},"32 CUDA cores"," (shading cores) and ",[34,5911,5912],{},"1 Tensor core",[434,5914,5916],{"id":5915},"total-core-count","Total Core Count",[12,5918,5919],{},"Across the entire GPU:",[197,5921,5922,5928,5934],{},[200,5923,5924,5927],{},[34,5925,5926],{},"10,752"," CUDA cores",[200,5929,5930,5933],{},[34,5931,5932],{},"336"," Tensor cores",[200,5935,5936,5939],{},[34,5937,5938],{},"84"," Ray Tracing cores",[434,5941,5943],{"id":5942},"around-the-edge","Around the Edge",[12,5945,5946],{},"The chip's periphery includes:",[197,5948,5949,5952,5955,5958,5961],{},[200,5950,5951],{},"12 graphics memory controllers",[200,5953,5954],{},"NVLink controllers",[200,5956,5957],{},"PCIe interface",[200,5959,5960],{},"6MB Level 2 SRAM cache at the bottom",[200,5962,5963],{},"Gigathread Engine that manages all 7 GPCs and the streaming multiprocessors inside",[434,5965,5967],{"id":5966},"inside-each-sm","Inside Each SM",[12,5969,5970],{},"Each streaming multiprocessor contains:",[197,5972,5973],{},[200,5974,5975],{},"128KB of L1 cache\u002Fshared memory (configurable split)",[19,5977,5979],{"id":5978},"what-each-core-does","What Each Core Does",[434,5981,5983],{"id":5982},"cuda-cores","CUDA Cores",[12,5985,5986],{},"Can be thought of as simple binary calculators - they handle addition, multiplication, and a few other basic operations.",[434,5988,5990],{"id":5989},"tensor-cores","Tensor Cores",[12,5992,5993],{},"Matrix multiplication and addition calculators. 
They're used the most when working with geometric transformations and neural networks.",[434,5995,5997],{"id":5996},"ray-tracing-cores","Ray Tracing Cores",[12,5999,6000],{},"The fewest and the largest cores. They're specially designed for ray tracing algorithms.",[19,6002,6004],{"id":6003},"key-terminologies","Key Terminologies",[434,6006,6008],{"id":6007},"fma-fused-multiply-add","FMA (Fused Multiply-Add)",[12,6010,6011,6012,6015],{},"The operation ",[98,6013,6014],{},"A × B + C",". This is a fundamental calculation that gets used constantly in GPU operations.",[434,6017,6019],{"id":6018},"simd-single-instruction-multiple-data","SIMD (Single Instruction, Multiple Data)",[12,6021,6022],{},"GPUs solve embarrassingly parallel problems using SIMD - applying one instruction to multiple data points simultaneously.",[434,6024,6026],{"id":6025},"simt-single-instruction-multiple-threads","SIMT (Single Instruction, Multiple Threads)",[12,6028,6029],{},"Basically SIMD, but each thread gets its own program counter, which avoids conflicts from dependencies and branching between operations.",[19,6031,6033],{"id":6032},"computational-architecture-physical-hardware","Computational Architecture → Physical Hardware",[12,6035,6036],{},"Now that we understand how SIMD\u002FSIMT works, here's how the computational architecture maps to the physical hardware:",[197,6038,6039,6045,6053,6059,6062,6075],{},[200,6040,6041,6042],{},"Each instruction is completed by a ",[34,6043,6044],{},"thread",[200,6046,192,6047,6049,6050],{},[34,6048,6044],{}," is paired with a ",[34,6051,6052],{},"CUDA core",[200,6054,6055,6056],{},"Threads are bundled into groups of 32 called ",[34,6057,6058],{},"warps",[200,6060,6061],{},"The same sequence of instructions is issued to all threads in a warp",[200,6063,6064,6067,6068,6071,6072],{},[34,6065,6066],{},"Warps"," are grouped into ",[34,6069,6070],{},"thread blocks",", which are handled by a ",[34,6073,6074],{},"Streaming Multiprocessor 
(SM)",[200,6076,6077,6067,6080,6083],{},[34,6078,6079],{},"Thread blocks",[34,6081,6082],{},"grids",", which are computed across the entire GPU",[12,6085,6086,6087,6090],{},"All these operations are managed and scheduled by the ",[34,6088,6089],{},"Gigathread Engine",", which maps the available thread blocks to the streaming multiprocessors.",{"title":100,"searchDepth":121,"depth":121,"links":6092},[6093,6099,6104,6109],{"id":5846,"depth":121,"text":5847,"children":6094},[6095,6096,6097,6098],{"id":5857,"depth":127,"text":5858},{"id":5915,"depth":127,"text":5916},{"id":5942,"depth":127,"text":5943},{"id":5966,"depth":127,"text":5967},{"id":5978,"depth":121,"text":5979,"children":6100},[6101,6102,6103],{"id":5982,"depth":127,"text":5983},{"id":5989,"depth":127,"text":5990},{"id":5996,"depth":127,"text":5997},{"id":6003,"depth":121,"text":6004,"children":6105},[6106,6107,6108],{"id":6007,"depth":127,"text":6008},{"id":6018,"depth":127,"text":6019},{"id":6025,"depth":127,"text":6026},{"id":6032,"depth":121,"text":6033},[6111,5645,6112],"Hardware","Computer Architecture","2025-01-05","Breaking down the GA102 architecture - from GPCs and streaming multiprocessors to CUDA cores and ray tracing units.",{},"\u002Fblog\u002Fgpu-architecture-notes",{"title":5829,"description":6114},"blog\u002Fgpu-architecture-notes","0uX3CWNMeYcu4mSbeptZAMnfujlBOKFEhMugJ-LB0sg",1775296369668]