[{"data":1,"prerenderedAt":4502},["ShallowReactive",2],{"blog-neural-audio-codec-rvq":3},{"id":4,"title":5,"author":6,"body":7,"categories":4489,"date":4492,"description":4493,"extension":4494,"hidden":4495,"meta":4496,"navigation":417,"path":4497,"seo":4498,"stem":4499,"thumbnail":4500,"__hash__":4501},"blog\u002Fblog\u002Fneural-audio-codec-rvq.md","Vector Quantization: The Mathematical Art of Audio Compression","Anurag Kanade",{"type":8,"value":9,"toc":4474},"minimark",[10,17,20,25,28,31,38,43,47,58,64,69,79,82,84,88,91,94,97,392,395,524,527,531,534,626,629,634,638,972,1312,1315,1546,1551,1909,1911,1916,1920,1926,1931,1934,1937,2461,2464,3016,3019,3308,3311,3384,3388,3393,3400,3405,3410,3415,3420,3425,3430,3435,3440,3444,3450,3455,3458,3461,3467,3472,3475,3582,3644,3648,3651,4013,4015,4359,4393,4395,4399,4470],[11,12,13],"p",{},[14,15,16],"em",{},"Every second of digital audio is tens of thousands of samples. Compressing all that without losing clarity has always been the challenge. Now imagine if a model could learn what truly matters in those waves and ignore the rest. That idea, called vector quantization, reshaped how modern AI handles voice and music.",[18,19],"hr",{},[21,22,24],"h2",{"id":23},"the-challenge-behind-modern-audio-compression","The Challenge Behind Modern Audio Compression",[11,26,27],{},"Modern voice AI systems face a big challenge: every second of CD-quality audio takes about 1.4 million bits (44,100 samples per second × 16 bits × 2 channels). Multiply that by millions of users, and storage and transmission quickly become expensive. Earlier compression techniques such as MP3, AAC, and Opus helped, but each involves trade-offs, reducing bandwidth at the cost of quality or latency.",[11,29,30],{},"The simplest approach treats sound as a continuous stream of data points. But what if we could represent these sounds more efficiently?",[11,32,33],{},[34,35],"img",{"alt":36,"src":37},"Continuous vs. 
Discrete Signal Representation","\u002Fcontinousvsdigital.png",[11,39,40],{},[14,41,42],{},"Figure 1: Continuous vs. Discrete Signal Representation",[21,44,46],{"id":45},"understanding-quantization","Understanding Quantization",[11,48,49,50,57],{},"Before we jump into how audio uses quantization, it helps to understand what quantization means in machine learning. In machine learning, ",[51,52,56],"a",{"href":53,"rel":54},"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FQuantization",[55],"nofollow","quantization"," refers to reducing the precision of the numbers used to represent model parameters or activations, for example converting 32-bit floating-point values to 8-bit integers.",[11,59,60],{},[34,61],{"alt":62,"src":63},"32-bit Float to 8-bit Integer Quantization","\u002F8bitint-quantization.png",[11,65,66],{},[14,67,68],{},"Figure 2: 32-bit Float to 8-bit Integer Quantization",[70,71,72],"blockquote",{},[11,73,74,78],{},[75,76,77],"strong",{},"NOTE:"," Quantization is different from compression. Compression reduces the size of data by encoding it more efficiently, while quantization reduces the precision of data representation.",[11,80,81],{},"This makes models faster and lighter. When compute is limited, it is a worthwhile trade: a small loss of accuracy buys a large gain in efficiency. While this approach works well for neural networks, audio data has unique properties that require a more sophisticated strategy, one that can capture the complex patterns hidden in sound waves.",[18,83],{},[21,85,87],{"id":86},"vector-quantization-through-speech","Vector Quantization Through Speech",[11,89,90],{},"Enter vector quantization, a technique that transforms high-dimensional audio data into compact representations without significant loss of quality. Vector quantization (VQ) exploits the fact that natural data is highly redundant. When you hear someone say \"hello\", your brain doesn't process every microscopic detail of the sound wave. 
Instead, it extracts key features and matches them against learned patterns.",[11,92,93],{},"Let's break down how VQ works mathematically.",[11,95,96],{},"Given input vector x ∈ ℝⁿ, find codebook entry cᵢ that minimizes:",[11,98,99],{},[100,101,104,182],"span",{"className":102},[103],"katex",[100,105,108],{"className":106},[107],"katex-mathml",[109,110,112],"math",{"xmlns":111},"http:\u002F\u002Fwww.w3.org\u002F1998\u002FMath\u002FMathML",[113,114,115,177],"semantics",{},[116,117,118,122,127,130,134,143,146,149,153,155,157,160,166,168],"mrow",{},[119,120,121],"mi",{},"d",[123,124,126],"mo",{"stretchy":125},"false","(",[119,128,129],{},"x",[123,131,133],{"separator":132},"true",",",[135,136,137,140],"msub",{},[119,138,139],{},"c",[119,141,142],{},"i",[123,144,145],{"stretchy":125},")",[123,147,148],{},"=",[119,150,152],{"mathvariant":151},"normal","∣",[119,154,152],{"mathvariant":151},[119,156,129],{},[123,158,159],{},"−",[135,161,162,164],{},[119,163,139],{},[119,165,142],{},[119,167,152],{"mathvariant":151},[169,170,171,173],"msup",{},[119,172,152],{"mathvariant":151},[174,175,176],"mn",{},"2",[178,179,181],"annotation",{"encoding":180},"application\u002Fx-tex","d(x, c_i) = ||x - 
c_i||^2",[100,183,186,287,311],{"className":184,"ariaHidden":132},[185],"katex-html",[100,187,190,195,200,204,207,211,216,272,276,280,284],{"className":188},[189],"base",[100,191],{"className":192,"style":194},[193],"strut","height:1em;vertical-align:-0.25em;",[100,196,121],{"className":197},[198,199],"mord","mathnormal",[100,201,126],{"className":202},[203],"mopen",[100,205,129],{"className":206},[198,199],[100,208,133],{"className":209},[210],"mpunct",[100,212],{"className":213,"style":215},[214],"mspace","margin-right:0.1667em;",[100,217,219,222],{"className":218},[198],[100,220,139],{"className":221},[198,199],[100,223,226],{"className":224},[225],"msupsub",[100,227,231,263],{"className":228},[229,230],"vlist-t","vlist-t2",[100,232,235,258],{"className":233},[234],"vlist-r",[100,236,240],{"className":237,"style":239},[238],"vlist","height:0.3117em;",[100,241,243,248],{"style":242},"top:-2.55em;margin-left:0em;margin-right:0.05em;",[100,244],{"className":245,"style":247},[246],"pstrut","height:2.7em;",[100,249,255],{"className":250},[251,252,253,254],"sizing","reset-size6","size3","mtight",[100,256,142],{"className":257},[198,199,254],[100,259,262],{"className":260},[261],"vlist-s","​",[100,264,266],{"className":265},[234],[100,267,270],{"className":268,"style":269},[238],"height:0.15em;",[100,271],{},[100,273,145],{"className":274},[275],"mclose",[100,277],{"className":278,"style":279},[214],"margin-right:0.2778em;",[100,281,148],{"className":282},[283],"mrel",[100,285],{"className":286,"style":279},[214],[100,288,290,293,297,300,304,308],{"className":289},[189],[100,291],{"className":292,"style":194},[193],[100,294,296],{"className":295},[198],"∣∣",[100,298,129],{"className":299},[198,199],[100,301],{"className":302,"style":303},[214],"margin-right:0.2222em;",[100,305,159],{"className":306},[307],"mbin",[100,309],{"className":310,"style":303},[214],[100,312,314,318,358,361],{"className":313},[189],[100,315],{"className":316,"style":317},[193],"height:1.0641em;v
ertical-align:-0.25em;",[100,319,321,324],{"className":320},[198],[100,322,139],{"className":323},[198,199],[100,325,327],{"className":326},[225],[100,328,330,350],{"className":329},[229,230],[100,331,333,347],{"className":332},[234],[100,334,336],{"className":335,"style":239},[238],[100,337,338,341],{"style":242},[100,339],{"className":340,"style":247},[246],[100,342,344],{"className":343},[251,252,253,254],[100,345,142],{"className":346},[198,199,254],[100,348,262],{"className":349},[261],[100,351,353],{"className":352},[234],[100,354,356],{"className":355,"style":269},[238],[100,357],{},[100,359,152],{"className":360},[198],[100,362,364,367],{"className":363},[198],[100,365,152],{"className":366},[198],[100,368,370],{"className":369},[225],[100,371,373],{"className":372},[229],[100,374,376],{"className":375},[234],[100,377,380],{"className":378,"style":379},[238],"height:0.8141em;",[100,381,383,386],{"style":382},"top:-3.063em;margin-right:0.05em;",[100,384],{"className":385,"style":247},[246],[100,387,389],{"className":388},[251,252,253,254],[100,390,176],{"className":391},[198,254],[11,393,394],{},"Let's say we have a 256-dimensional vector representing a short audio segment. 
Instead of storing all 256 values, we can use VQ to find the closest match from a learned codebook of common speech patterns.",[396,397,402],"pre",{"className":398,"code":399,"language":400,"meta":401,"style":401},"language-python shiki shiki-themes github-light github-dark","import numpy as np\n\n# Audio segment encoded as 256-dimensional vector\naudio_vector = np.array([0.23, -0.41, 0.67, -0.12, ...])  # 256 values\n\n# Learned codebook representing common speech patterns\ncodebook = np.array([\n    [0.25, -0.40, 0.65, -0.10, ...],  # maybe a fricative sound\n    [0.15, 0.32, -0.21, 0.45, ...],   # maybe a vowel sound\n    [0.67, -0.23, 0.12, 0.89, ...],   # maybe a plosive sound\n])\n\ndef quantize_vector(input_vec, codebook):\n    \"\"\"Find closest codebook match using L2 distance\"\"\"\n    distances = np.linalg.norm(codebook - input_vec, axis=1)\n    best_index = np.argmin(distances)\n    return codebook[best_index], best_index\n\nquantized_vec, index = quantize_vector(audio_vector, codebook)\n# Store index (small integer) instead of 256 floats\n","python","",[403,404,405,412,419,425,431,436,442,448,454,460,466,472,477,483,489,495,501,507,512,518],"code",{"__ignoreMap":401},[100,406,409],{"class":407,"line":408},"line",1,[100,410,411],{},"import numpy as np\n",[100,413,415],{"class":407,"line":414},2,[100,416,418],{"emptyLinePlaceholder":417},true,"\n",[100,420,422],{"class":407,"line":421},3,[100,423,424],{},"# Audio segment encoded as 256-dimensional vector\n",[100,426,428],{"class":407,"line":427},4,[100,429,430],{},"audio_vector = np.array([0.23, -0.41, 0.67, -0.12, ...])  # 256 values\n",[100,432,434],{"class":407,"line":433},5,[100,435,418],{"emptyLinePlaceholder":417},[100,437,439],{"class":407,"line":438},6,[100,440,441],{},"# Learned codebook representing common speech patterns\n",[100,443,445],{"class":407,"line":444},7,[100,446,447],{},"codebook = np.array([\n",[100,449,451],{"class":407,"line":450},8,[100,452,453],{},"    [0.25, -0.40, 0.65, 
-0.10, ...],  # maybe a fricative sound\n",[100,455,457],{"class":407,"line":456},9,[100,458,459],{},"    [0.15, 0.32, -0.21, 0.45, ...],   # maybe a vowel sound\n",[100,461,463],{"class":407,"line":462},10,[100,464,465],{},"    [0.67, -0.23, 0.12, 0.89, ...],   # maybe a plosive sound\n",[100,467,469],{"class":407,"line":468},11,[100,470,471],{},"])\n",[100,473,475],{"class":407,"line":474},12,[100,476,418],{"emptyLinePlaceholder":417},[100,478,480],{"class":407,"line":479},13,[100,481,482],{},"def quantize_vector(input_vec, codebook):\n",[100,484,486],{"class":407,"line":485},14,[100,487,488],{},"    \"\"\"Find closest codebook match using L2 distance\"\"\"\n",[100,490,492],{"class":407,"line":491},15,[100,493,494],{},"    distances = np.linalg.norm(codebook - input_vec, axis=1)\n",[100,496,498],{"class":407,"line":497},16,[100,499,500],{},"    best_index = np.argmin(distances)\n",[100,502,504],{"class":407,"line":503},17,[100,505,506],{},"    return codebook[best_index], best_index\n",[100,508,510],{"class":407,"line":509},18,[100,511,418],{"emptyLinePlaceholder":417},[100,513,515],{"class":407,"line":514},19,[100,516,517],{},"quantized_vec, index = quantize_vector(audio_vector, codebook)\n",[100,519,521],{"class":407,"line":520},20,[100,522,523],{},"# Store index (small integer) instead of 256 floats\n",[11,525,526],{},"By storing just the index of the closest codebook entry, we drastically reduce the amount of data needed to represent the audio segment.",[21,528,530],{"id":529},"how-the-codebook-is-learned","How the Codebook is Learned",[11,532,533],{},"Traditional approaches used k-means clustering to discover representative patterns:",[396,535,537],{"className":398,"code":536,"language":400,"meta":401,"style":401},"def learn_codebook_kmeans(training_data, k=1024, max_iters=20):\n    # Random init; vector dim comes from the data, max_iters=20 is an arbitrary default\n    centroids = np.random.randn(k, training_data.shape[1])\n\n    for iteration in range(max_iters):\n        # Assign each vector to nearest centroid\n        assignments = 
[]\n        for vec in training_data:\n            distances = np.linalg.norm(centroids - vec, axis=1)\n            assignments.append(np.argmin(distances))\n\n        # Update centroids as cluster means\n        for i in range(k):\n            cluster_vecs = training_data[np.array(assignments) == i]\n            if len(cluster_vecs) > 0:\n                centroids[i] = np.mean(cluster_vecs, axis=0)\n\n    return centroids\n",[403,538,539,544,549,554,558,563,568,573,578,583,588,592,597,602,607,612,617,621],{"__ignoreMap":401},[100,540,541],{"class":407,"line":408},[100,542,543],{},"def learn_codebook_kmeans(training_data, k=1024, max_iters=20):\n",[100,545,546],{"class":407,"line":414},[100,547,548],{},"    # Random init; vector dim comes from the data, max_iters=20 is an arbitrary default\n",[100,550,551],{"class":407,"line":421},[100,552,553],{},"    centroids = np.random.randn(k, training_data.shape[1])\n",[100,555,556],{"class":407,"line":427},[100,557,418],{"emptyLinePlaceholder":417},[100,559,560],{"class":407,"line":433},[100,561,562],{},"    for iteration in range(max_iters):\n",[100,564,565],{"class":407,"line":438},[100,566,567],{},"        # Assign each vector to nearest centroid\n",[100,569,570],{"class":407,"line":444},[100,571,572],{},"        assignments = []\n",[100,574,575],{"class":407,"line":450},[100,576,577],{},"        for vec in training_data:\n",[100,579,580],{"class":407,"line":456},[100,581,582],{},"            distances = np.linalg.norm(centroids - vec, axis=1)\n",[100,584,585],{"class":407,"line":462},[100,586,587],{},"            assignments.append(np.argmin(distances))\n",[100,589,590],{"class":407,"line":468},[100,591,418],{"emptyLinePlaceholder":417},[100,593,594],{"class":407,"line":474},[100,595,596],{},"        # Update centroids as cluster means\n",[100,598,599],{"class":407,"line":479},[100,600,601],{},"        for i in range(k):\n",[100,603,604],{"class":407,"line":485},[100,605,606],{},"            cluster_vecs = training_data[np.array(assignments) == 
i]\n",[100,608,609],{"class":407,"line":491},[100,610,611],{},"            if len(cluster_vecs) > 0:\n",[100,613,614],{"class":407,"line":497},[100,615,616],{},"                centroids[i] = np.mean(cluster_vecs, axis=0)\n",[100,618,619],{"class":407,"line":503},[100,620,418],{"emptyLinePlaceholder":417},[100,622,623],{"class":407,"line":509},[100,624,625],{},"    return centroids\n",[11,627,628],{},"Newer codecs use more sophisticated methods like VQ-VAE, which learn the codebook jointly with the encoder and decoder networks.",[11,630,631],{},[14,632,633],{},"K-means worked for offline processing but had serious limitations for neural network training. The discrete assignment step and batch processing requirements made gradient-based optimization difficult.",[21,635,637],{"id":636},"limitations-of-traditional-vector-quantization","Limitations of Traditional Vector Quantization",[11,639,640,641,730,731,971],{},"Let the input be a feature vector ",[100,642,644,670],{"className":643},[103],[100,645,647],{"className":646},[107],[109,648,649],{"xmlns":111},[113,650,651,667],{},[116,652,653,656,659],{},[119,654,129],{"mathvariant":655},"bold",[123,657,658],{},"∈",[169,660,661,665],{},[119,662,664],{"mathvariant":663},"double-struck","R",[119,666,121],{},[178,668,669],{"encoding":180},"\\mathbf{x} \\in 
\\mathbb{R}^d",[100,671,673,693],{"className":672,"ariaHidden":132},[185],[100,674,676,680,684,687,690],{"className":675},[189],[100,677],{"className":678,"style":679},[193],"height:0.5782em;vertical-align:-0.0391em;",[100,681,129],{"className":682},[198,683],"mathbf",[100,685],{"className":686,"style":279},[214],[100,688,658],{"className":689},[283],[100,691],{"className":692,"style":279},[214],[100,694,696,700],{"className":695},[189],[100,697],{"className":698,"style":699},[193],"height:0.8491em;",[100,701,703,707],{"className":702},[198],[100,704,664],{"className":705},[198,706],"mathbb",[100,708,710],{"className":709},[225],[100,711,713],{"className":712},[229],[100,714,716],{"className":715},[234],[100,717,719],{"className":718,"style":699},[238],[100,720,721,724],{"style":382},[100,722],{"className":723,"style":247},[246],[100,725,727],{"className":726},[251,252,253,254],[100,728,121],{"className":729},[198,199,254]," and a finite codebook ",[100,732,734,787],{"className":733},[103],[100,735,737],{"className":736},[107],[109,738,739],{"xmlns":111},[113,740,741,784],{},[116,742,743,747,749,752,759,761,767,769,772,774,781],{},[119,744,746],{"mathvariant":745},"script","C",[123,748,148],{},[123,750,751],{"stretchy":125},"{",[135,753,754,756],{},[119,755,139],{"mathvariant":655},[174,757,758],{},"1",[123,760,133],{"separator":132},[135,762,763,765],{},[119,764,139],{"mathvariant":655},[174,766,176],{},[123,768,133],{"separator":132},[123,770,771],{},"…",[123,773,133],{"separator":132},[135,775,776,778],{},[119,777,139],{"mathvariant":655},[119,779,780],{},"K",[123,782,783],{"stretchy":125},"}",[178,785,786],{"encoding":180},"\\mathcal{C} = \\lbrace \\mathbf{c}_1, \\mathbf{c}_2, \\dots, \\mathbf{c}_K 
\\rbrace",[100,788,790,811],{"className":789,"ariaHidden":132},[185],[100,791,793,797,802,805,808],{"className":792},[189],[100,794],{"className":795,"style":796},[193],"height:0.6833em;",[100,798,746],{"className":799,"style":801},[198,800],"mathcal","margin-right:0.0583em;",[100,803],{"className":804,"style":279},[214],[100,806,148],{"className":807},[283],[100,809],{"className":810,"style":279},[214],[100,812,814,817,820,861,864,867,907,910,913,917,920,923,926,968],{"className":813},[189],[100,815],{"className":816,"style":194},[193],[100,818,751],{"className":819},[203],[100,821,823,826],{"className":822},[198],[100,824,139],{"className":825},[198,683],[100,827,829],{"className":828},[225],[100,830,832,853],{"className":831},[229,230],[100,833,835,850],{"className":834},[234],[100,836,839],{"className":837,"style":838},[238],"height:0.3011em;",[100,840,841,844],{"style":242},[100,842],{"className":843,"style":247},[246],[100,845,847],{"className":846},[251,252,253,254],[100,848,758],{"className":849},[198,254],[100,851,262],{"className":852},[261],[100,854,856],{"className":855},[234],[100,857,859],{"className":858,"style":269},[238],[100,860],{},[100,862,133],{"className":863},[210],[100,865],{"className":866,"style":215},[214],[100,868,870,873],{"className":869},[198],[100,871,139],{"className":872},[198,683],[100,874,876],{"className":875},[225],[100,877,879,899],{"className":878},[229,230],[100,880,882,896],{"className":881},[234],[100,883,885],{"className":884,"style":838},[238],[100,886,887,890],{"style":242},[100,888],{"className":889,"style":247},[246],[100,891,893],{"className":892},[251,252,253,254],[100,894,176],{"className":895},[198,254],[100,897,262],{"className":898},[261],[100,900,902],{"className":901},[234],[100,903,905],{"className":904,"style":269},[238],[100,906],{},[100,908,133],{"className":909},[210],[100,911],{"className":912,"style":215},[214],[100,914,771],{"className":915},[916],"minner",[100,918],{"className":919,"style":215},[214],[
100,921,133],{"className":922},[210],[100,924],{"className":925,"style":215},[214],[100,927,929,932],{"className":928},[198],[100,930,139],{"className":931},[198,683],[100,933,935],{"className":934},[225],[100,936,938,960],{"className":937},[229,230],[100,939,941,957],{"className":940},[234],[100,942,945],{"className":943,"style":944},[238],"height:0.3283em;",[100,946,947,950],{"style":242},[100,948],{"className":949,"style":247},[246],[100,951,953],{"className":952},[251,252,253,254],[100,954,780],{"className":955,"style":956},[198,199,254],"margin-right:0.0715em;",[100,958,262],{"className":959},[261],[100,961,963],{"className":962},[234],[100,964,966],{"className":965,"style":269},[238],[100,967],{},[100,969,783],{"className":970},[275],".\nVector quantization replaces each input with its nearest codeword:",[11,973,974],{},[100,975,977,1050],{"className":976},[103],[100,978,980],{"className":979},[107],[109,981,982],{"xmlns":111},[113,983,984,1047],{},[116,985,986,989,991,993,995,997,1000,1003,1025,1028,1030,1032,1038],{},[119,987,988],{},"Q",[123,990,126],{"stretchy":125},[119,992,129],{"mathvariant":655},[123,994,145],{"stretchy":125},[123,996,148],{},[119,998,999],{},"arg",[123,1001,1002],{},"⁡",[135,1004,1005,1012],{},[116,1006,1007,1010],{},[119,1008,1009],{},"min",[123,1011,1002],{},[116,1013,1014,1021,1023],{},[135,1015,1016,1018],{},[119,1017,139],{"mathvariant":655},[119,1019,1020],{},"k",[123,1022,658],{},[119,1024,746],{"mathvariant":745},[119,1026,1027],{"mathvariant":151},"∥",[119,1029,129],{"mathvariant":655},[123,1031,159],{},[135,1033,1034,1036],{},[119,1035,139],{"mathvariant":655},[119,1037,1020],{},[1039,1040,1041,1043,1045],"msubsup",{},[119,1042,1027],{"mathvariant":151},[174,1044,176],{},[174,1046,176],{},[178,1048,1049],{"encoding":180},"Q(\\mathbf{x}) = \\arg\\min_{\\mathbf{c}_k \\in \\mathcal{C}} \\|\\mathbf{x} - 
\\mathbf{c}_k\\|_2^2",[100,1051,1053,1080,1212],{"className":1052,"ariaHidden":132},[185],[100,1054,1056,1059,1062,1065,1068,1071,1074,1077],{"className":1055},[189],[100,1057],{"className":1058,"style":194},[193],[100,1060,988],{"className":1061},[198,199],[100,1063,126],{"className":1064},[203],[100,1066,129],{"className":1067},[198,683],[100,1069,145],{"className":1070},[275],[100,1072],{"className":1073,"style":279},[214],[100,1075,148],{"className":1076},[283],[100,1078],{"className":1079,"style":279},[214],[100,1081,1083,1087,1096,1099,1194,1197,1200,1203,1206,1209],{"className":1082},[189],[100,1084],{"className":1085,"style":1086},[193],"height:1.0059em;vertical-align:-0.2559em;",[100,1088,1091,1092],{"className":1089},[1090],"mop","ar",[100,1093,1095],{"style":1094},"margin-right:0.0139em;","g",[100,1097],{"className":1098,"style":215},[214],[100,1100,1102,1105],{"className":1101},[1090],[100,1103,1009],{"className":1104},[1090],[100,1106,1108],{"className":1107},[225],[100,1109,1111,1185],{"className":1110},[229,230],[100,1112,1114,1182],{"className":1113},[234],[100,1115,1117],{"className":1116,"style":944},[238],[100,1118,1120,1123],{"style":1119},"top:-2.55em;margin-right:0.05em;",[100,1121],{"className":1122,"style":247},[246],[100,1124,1126],{"className":1125},[251,252,253,254],[100,1127,1129,1176,1179],{"className":1128},[198,254],[100,1130,1132,1135],{"className":1131},[198,254],[100,1133,139],{"className":1134},[198,683,254],[100,1136,1138],{"className":1137},[225],[100,1139,1141,1167],{"className":1140},[229,230],[100,1142,1144,1164],{"className":1143},[234],[100,1145,1148],{"className":1146,"style":1147},[238],"height:0.3448em;",[100,1149,1151,1155],{"style":1150},"top:-2.3488em;margin-left:0em;margin-right:0.0714em;",[100,1152],{"className":1153,"style":1154},[246],"height:2.5em;",[100,1156,1160],{"className":1157},[251,1158,1159,254],"reset-size3","size1",[100,1161,1020],{"className":1162,"style":1163},[198,199,254],"margin-right:0.0315em;",[10
0,1165,262],{"className":1166},[261],[100,1168,1170],{"className":1169},[234],[100,1171,1174],{"className":1172,"style":1173},[238],"height:0.1512em;",[100,1175],{},[100,1177,658],{"className":1178},[283,254],[100,1180,746],{"className":1181,"style":801},[198,800,254],[100,1183,262],{"className":1184},[261],[100,1186,1188],{"className":1187},[234],[100,1189,1192],{"className":1190,"style":1191},[238],"height:0.2559em;",[100,1193],{},[100,1195],{"className":1196,"style":215},[214],[100,1198,1027],{"className":1199},[198],[100,1201,129],{"className":1202},[198,683],[100,1204],{"className":1205,"style":303},[214],[100,1207,159],{"className":1208},[307],[100,1210],{"className":1211,"style":303},[214],[100,1213,1215,1218,1259],{"className":1214},[189],[100,1216],{"className":1217,"style":317},[193],[100,1219,1221,1224],{"className":1220},[198],[100,1222,139],{"className":1223},[198,683],[100,1225,1227],{"className":1226},[225],[100,1228,1230,1251],{"className":1229},[229,230],[100,1231,1233,1248],{"className":1232},[234],[100,1234,1237],{"className":1235,"style":1236},[238],"height:0.3361em;",[100,1238,1239,1242],{"style":242},[100,1240],{"className":1241,"style":247},[246],[100,1243,1245],{"className":1244},[251,252,253,254],[100,1246,1020],{"className":1247,"style":1163},[198,199,254],[100,1249,262],{"className":1250},[261],[100,1252,1254],{"className":1253},[234],[100,1255,1257],{"className":1256,"style":269},[238],[100,1258],{},[100,1260,1262,1265],{"className":1261},[198],[100,1263,1027],{"className":1264},[198],[100,1266,1268],{"className":1267},[225],[100,1269,1271,1303],{"className":1270},[229,230],[100,1272,1274,1300],{"className":1273},[234],[100,1275,1277,1289],{"className":1276,"style":379},[238],[100,1278,1280,1283],{"style":1279},"top:-2.4519em;margin-left:0em;margin-right:0.05em;",[100,1281],{"className":1282,"style":247},[246],[100,1284,1286],{"className":1285},[251,252,253,254],[100,1287,176],{"className":1288},[198,254],[100,1290,1291,1294],{"style":382
},[100,1292],{"className":1293,"style":247},[246],[100,1295,1297],{"className":1296},[251,252,253,254],[100,1298,176],{"className":1299},[198,254],[100,1301,262],{"className":1302},[261],[100,1304,1306],{"className":1305},[234],[100,1307,1310],{"className":1308,"style":1309},[238],"height:0.2481em;",[100,1311],{},[11,1313,1314],{},"The expected distortion (error) is:",[11,1316,1317],{},[100,1318,1320,1374],{"className":1319},[103],[100,1321,1323],{"className":1322},[107],[109,1324,1325],{"xmlns":111},[113,1326,1327,1371],{},[116,1328,1329,1332,1334,1341],{},[119,1330,1331],{},"D",[123,1333,148],{},[135,1335,1336,1339],{},[119,1337,1338],{"mathvariant":663},"E",[119,1340,129],{"mathvariant":655},[116,1342,1343,1346,1348,1350,1352,1354,1356,1358,1360,1368],{},[123,1344,1345],{"fence":132},"[",[119,1347,1027],{"mathvariant":151},[119,1349,129],{"mathvariant":655},[123,1351,159],{},[119,1353,988],{},[123,1355,126],{"stretchy":125},[119,1357,129],{"mathvariant":655},[123,1359,145],{"stretchy":125},[1039,1361,1362,1364,1366],{},[119,1363,1027],{"mathvariant":151},[174,1365,176],{},[174,1367,176],{},[123,1369,1370],{"fence":132},"]",[178,1372,1373],{"encoding":180},"D = \\mathbb{E}_{\\mathbf{x}}\\left[\\|\\mathbf{x} - 
Q(\\mathbf{x})\\|_2^2\\right]",[100,1375,1377,1396],{"className":1376,"ariaHidden":132},[185],[100,1378,1380,1383,1387,1390,1393],{"className":1379},[189],[100,1381],{"className":1382,"style":796},[193],[100,1384,1331],{"className":1385,"style":1386},[198,199],"margin-right:0.0278em;",[100,1388],{"className":1389,"style":279},[214],[100,1391,148],{"className":1392},[283],[100,1394],{"className":1395,"style":279},[214],[100,1397,1399,1403,1447,1450],{"className":1398},[189],[100,1400],{"className":1401,"style":1402},[193],"height:1.2em;vertical-align:-0.35em;",[100,1404,1406,1409],{"className":1405},[198],[100,1407,1338],{"className":1408},[198,706],[100,1410,1412],{"className":1411},[225],[100,1413,1415,1439],{"className":1414},[229,230],[100,1416,1418,1436],{"className":1417},[234],[100,1419,1422],{"className":1420,"style":1421},[238],"height:0.1611em;",[100,1423,1424,1427],{"style":242},[100,1425],{"className":1426,"style":247},[246],[100,1428,1430],{"className":1429},[251,252,253,254],[100,1431,1433],{"className":1432},[198,254],[100,1434,129],{"className":1435},[198,683,254],[100,1437,262],{"className":1438},[261],[100,1440,1442],{"className":1441},[234],[100,1443,1445],{"className":1444,"style":269},[238],[100,1446],{},[100,1448],{"className":1449,"style":215},[214],[100,1451,1453,1462,1465,1468,1471,1474,1477,1480,1483,1486,1489,1540],{"className":1452},[916],[100,1454,1458],{"className":1455,"style":1457},[203,1456],"delimcenter","top:0em;",[100,1459,1345],{"className":1460},[1461,1159],"delimsizing",[100,1463,1027],{"className":1464},[198],[100,1466,129],{"className":1467},[198,683],[100,1469],{"className":1470,"style":303},[214],[100,1472,159],{"className":1473},[307],[100,1475],{"className":1476,"style":303},[214],[100,1478,988],{"className":1479},[198,199],[100,1481,126],{"className":1482},[203],[100,1484,129],{"className":1485},[198,683],[100,1487,145],{"className":1488},[275],[100,1490,1492,1495],{"className":1491},[198],[100,1493,1027],{"className":149
4},[198],[100,1496,1498],{"className":1497},[225],[100,1499,1501,1532],{"className":1500},[229,230],[100,1502,1504,1529],{"className":1503},[234],[100,1505,1507,1518],{"className":1506,"style":379},[238],[100,1508,1509,1512],{"style":1279},[100,1510],{"className":1511,"style":247},[246],[100,1513,1515],{"className":1514},[251,252,253,254],[100,1516,176],{"className":1517},[198,254],[100,1519,1520,1523],{"style":382},[100,1521],{"className":1522,"style":247},[246],[100,1524,1526],{"className":1525},[251,252,253,254],[100,1527,176],{"className":1528},[198,254],[100,1530,262],{"className":1531},[261],[100,1533,1535],{"className":1534},[234],[100,1536,1538],{"className":1537,"style":1309},[238],[100,1539],{},[100,1541,1543],{"className":1542,"style":1457},[275,1456],[100,1544,1370],{"className":1545},[1461,1159],[1547,1548,1550],"h3",{"id":1549},"core-limitations","Core Limitations",[1552,1553,1554,1606,1748,1805,1903],"ol",{},[1555,1556,1557,1560,1561,1605],"li",{},[75,1558,1559],{},"Limited expressiveness"," — A single codeword per region can't capture complex or multimodal distributions in ",[100,1562,1564,1584],{"className":1563},[103],[100,1565,1567],{"className":1566},[107],[109,1568,1569],{"xmlns":111},[113,1570,1571,1581],{},[116,1572,1573,1575,1577,1579],{},[119,1574,11],{},[123,1576,126],{"stretchy":125},[119,1578,129],{"mathvariant":655},[123,1580,145],{"stretchy":125},[178,1582,1583],{"encoding":180},"p(\\mathbf{x})",[100,1585,1587],{"className":1586,"ariaHidden":132},[185],[100,1588,1590,1593,1596,1599,1602],{"className":1589},[189],[100,1591],{"className":1592,"style":194},[193],[100,1594,11],{"className":1595},[198,199],[100,1597,126],{"className":1598},[203],[100,1600,129],{"className":1601},[198,683],[100,1603,145],{"className":1604},[275],".",[1555,1607,1608,1611,1612,1615,1616,1718,1719,1747],{},[75,1609,1610],{},"Codebook growth problem"," — To halve distortion, you need to ",[14,1613,1614],{},"multiply"," the number of codewords by about 2^(d/2), because under the high-resolution approximation distortion scales as 
",[100,1617,1619,1651],{"className":1618},[103],[100,1620,1622],{"className":1621},[107],[109,1623,1624],{"xmlns":111},[113,1625,1626,1648],{},[116,1627,1628,1630,1633],{},[119,1629,1331],{},[123,1631,1632],{},"∝",[169,1634,1635,1637],{},[119,1636,780],{},[116,1638,1639,1641,1643,1646],{},[123,1640,159],{},[174,1642,176],{},[119,1644,1645],{"mathvariant":151},"\u002F",[119,1647,121],{},[178,1649,1650],{"encoding":180},"D \\propto K^{-2\u002Fd}",[100,1652,1654,1672],{"className":1653,"ariaHidden":132},[185],[100,1655,1657,1660,1663,1666,1669],{"className":1656},[189],[100,1658],{"className":1659,"style":796},[193],[100,1661,1331],{"className":1662,"style":1386},[198,199],[100,1664],{"className":1665,"style":279},[214],[100,1667,1632],{"className":1668},[283],[100,1670],{"className":1671,"style":279},[214],[100,1673,1675,1679],{"className":1674},[189],[100,1676],{"className":1677,"style":1678},[193],"height:0.888em;",[100,1680,1682,1685],{"className":1681},[198],[100,1683,780],{"className":1684,"style":956},[198,199],[100,1686,1688],{"className":1687},[225],[100,1689,1691],{"className":1690},[229],[100,1692,1694],{"className":1693},[234],[100,1695,1697],{"className":1696,"style":1678},[238],[100,1698,1699,1702],{"style":382},[100,1700],{"className":1701,"style":247},[246],[100,1703,1705],{"className":1704},[251,252,253,254],[100,1706,1708,1711,1715],{"className":1707},[198,254],[100,1709,159],{"className":1710},[198,254],[100,1712,1714],{"className":1713},[198,254],"2\u002F",[100,1716,121],{"className":1717},[198,199,254],". 
Larger ",[100,1720,1722,1735],{"className":1721},[103],[100,1723,1725],{"className":1724},[107],[109,1726,1727],{"xmlns":111},[113,1728,1729,1733],{},[116,1730,1731],{},[119,1732,780],{},[178,1734,780],{"encoding":180},[100,1736,1738],{"className":1737,"ariaHidden":132},[185],[100,1739,1741,1744],{"className":1740},[189],[100,1742],{"className":1743,"style":796},[193],[100,1745,780],{"className":1746,"style":956},[198,199]," implies exponential memory and compute.",[1555,1749,1750,1753,1754,1804],{},[75,1751,1752],{},"High encoding cost"," — Nearest-neighbor search costs ",[100,1755,1757,1780],{"className":1756},[103],[100,1758,1760],{"className":1759},[107],[109,1761,1762],{"xmlns":111},[113,1763,1764,1777],{},[116,1765,1766,1769,1771,1773,1775],{},[119,1767,1768],{},"O",[123,1770,126],{"stretchy":125},[119,1772,780],{},[119,1774,121],{},[123,1776,145],{"stretchy":125},[178,1778,1779],{"encoding":180},"O(Kd)",[100,1781,1783],{"className":1782,"ariaHidden":132},[185],[100,1784,1786,1789,1792,1795,1798,1801],{"className":1785},[189],[100,1787],{"className":1788,"style":194},[193],[100,1790,1768],{"className":1791,"style":1386},[198,199],[100,1793,126],{"className":1794},[203],[100,1796,780],{"className":1797,"style":956},[198,199],[100,1799,121],{"className":1800},[198,199],[100,1802,145],{"className":1803},[275]," for each vector.",[1555,1806,1807,1810,1811,1902],{},[75,1808,1809],{},"No residual correction"," — Once quantized, the residual ",[100,1812,1814,1843],{"className":1813},[103],[100,1815,1817],{"className":1816},[107],[109,1818,1819],{"xmlns":111},[113,1820,1821,1840],{},[116,1822,1823,1826,1828,1830,1832,1834,1836,1838],{},[119,1824,1825],{"mathvariant":655},"e",[123,1827,148],{},[119,1829,129],{"mathvariant":655},[123,1831,159],{},[119,1833,988],{},[123,1835,126],{"stretchy":125},[119,1837,129],{"mathvariant":655},[123,1839,145],{"stretchy":125},[178,1841,1842],{"encoding":180},"\\mathbf{e} = \\mathbf{x} - 
Q(\\mathbf{x})",[100,1844,1846,1865,1884],{"className":1845,"ariaHidden":132},[185],[100,1847,1849,1853,1856,1859,1862],{"className":1848},[189],[100,1850],{"className":1851,"style":1852},[193],"height:0.4444em;",[100,1854,1825],{"className":1855},[198,683],[100,1857],{"className":1858,"style":279},[214],[100,1860,148],{"className":1861},[283],[100,1863],{"className":1864,"style":279},[214],[100,1866,1868,1872,1875,1878,1881],{"className":1867},[189],[100,1869],{"className":1870,"style":1871},[193],"height:0.6667em;vertical-align:-0.0833em;",[100,1873,129],{"className":1874},[198,683],[100,1876],{"className":1877,"style":303},[214],[100,1879,159],{"className":1880},[307],[100,1882],{"className":1883,"style":303},[214],[100,1885,1887,1890,1893,1896,1899],{"className":1886},[189],[100,1888],{"className":1889,"style":194},[193],[100,1891,988],{"className":1892},[198,199],[100,1894,126],{"className":1895},[203],[100,1897,129],{"className":1898},[198,683],[100,1900,145],{"className":1901},[275]," is discarded, wasting useful fine-grained detail.",[1555,1904,1905,1908],{},[75,1906,1907],{},"Uniform distortion metric"," — Standard L2 distance treats all dimensions equally.",[18,1910],{},[70,1912,1913],{},[11,1914,1915],{},"Classic VQ minimizes distortion but scales poorly with dimension and distribution complexity, which motivates techniques like Residual Vector Quantization (RVQ).",[21,1917,1919],{"id":1918},"residual-vector-quantization","Residual Vector Quantization",[11,1921,1922],{},[34,1923],{"alt":1924,"src":1925},"Residual Vector Quantization (RVQ) Architecture","https:\u002F\u002Fnotesbylex.com\u002F_media\u002Frvq.png",[11,1927,1928],{},[14,1929,1930],{},"Figure 3: Residual Vector Quantization (RVQ) Architecture",[11,1932,1933],{},"Residual Vector Quantization addresses these limitations by stacking multiple quantizers, where each stage learns to encode the error left behind by the previous one.",[11,1935,1936],{},"The 
mathematical formulation of RVQ is:",[11,1938,1939],{},[100,1940,1942,2048],{"className":1941},[103],[100,1943,1945],{"className":1944},[107],[109,1946,1947],{"xmlns":111},[113,1948,1949,2045],{},[116,1950,1951,1953,1956,1963,1965,1967,1969,1972,1978,1980,1982,1984,1990,1992,1994,1996,1998,2000,2007,2009,2011,2013,2019,2021,2023,2025,2027,2033,2035,2037,2039,2041,2043],{},[119,1952,129],{"mathvariant":655},[123,1954,1955],{},"≈",[135,1957,1958,1961],{},[119,1959,1960],{},"q",[174,1962,758],{},[123,1964,126],{"stretchy":125},[119,1966,129],{"mathvariant":655},[123,1968,145],{"stretchy":125},[123,1970,1971],{},"+",[135,1973,1974,1976],{},[119,1975,1960],{},[174,1977,176],{},[123,1979,126],{"stretchy":125},[119,1981,129],{"mathvariant":655},[123,1983,159],{},[135,1985,1986,1988],{},[119,1987,1960],{},[174,1989,758],{},[123,1991,126],{"stretchy":125},[119,1993,129],{"mathvariant":655},[123,1995,145],{"stretchy":125},[123,1997,145],{"stretchy":125},[123,1999,1971],{},[135,2001,2002,2004],{},[119,2003,1960],{},[174,2005,2006],{},"3",[123,2008,126],{"stretchy":125},[119,2010,129],{"mathvariant":655},[123,2012,159],{},[135,2014,2015,2017],{},[119,2016,1960],{},[174,2018,758],{},[123,2020,126],{"stretchy":125},[119,2022,129],{"mathvariant":655},[123,2024,145],{"stretchy":125},[123,2026,159],{},[135,2028,2029,2031],{},[119,2030,1960],{},[174,2032,176],{},[123,2034,126],{"stretchy":125},[119,2036,129],{"mathvariant":655},[123,2038,145],{"stretchy":125},[123,2040,145],{"stretchy":125},[123,2042,1971],{},[123,2044,771],{},[178,2046,2047],{"encoding":180},"\\mathbf{x} \\approx q_1(\\mathbf{x}) + q_2(\\mathbf{x} - q_1(\\mathbf{x})) + q_3(\\mathbf{x} - q_1(\\mathbf{x}) - q_2(\\mathbf{x})) + 
\\ldots",[100,2049,2051,2070,2136,2197,2262,2323,2387,2451],{"className":2050,"ariaHidden":132},[185],[100,2052,2054,2058,2061,2064,2067],{"className":2053},[189],[100,2055],{"className":2056,"style":2057},[193],"height:0.4831em;",[100,2059,129],{"className":2060},[198,683],[100,2062],{"className":2063,"style":279},[214],[100,2065,1955],{"className":2066},[283],[100,2068],{"className":2069,"style":279},[214],[100,2071,2073,2076,2118,2121,2124,2127,2130,2133],{"className":2072},[189],[100,2074],{"className":2075,"style":194},[193],[100,2077,2079,2083],{"className":2078},[198],[100,2080,1960],{"className":2081,"style":2082},[198,199],"margin-right:0.0359em;",[100,2084,2086],{"className":2085},[225],[100,2087,2089,2110],{"className":2088},[229,230],[100,2090,2092,2107],{"className":2091},[234],[100,2093,2095],{"className":2094,"style":838},[238],[100,2096,2098,2101],{"style":2097},"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;",[100,2099],{"className":2100,"style":247},[246],[100,2102,2104],{"className":2103},[251,252,253,254],[100,2105,758],{"className":2106},[198,254],[100,2108,262],{"className":2109},[261],[100,2111,2113],{"className":2112},[234],[100,2114,2116],{"className":2115,"style":269},[238],[100,2117],{},[100,2119,126],{"className":2120},[203],[100,2122,129],{"className":2123},[198,683],[100,2125,145],{"className":2126},[275],[100,2128],{"className":2129,"style":303},[214],[100,2131,1971],{"className":2132},[307],[100,2134],{"className":2135,"style":303},[214],[100,2137,2139,2142,2182,2185,2188,2191,2194],{"className":2138},[189],[100,2140],{"className":2141,"style":194},[193],[100,2143,2145,2148],{"className":2144},[198],[100,2146,1960],{"className":2147,"style":2082},[198,199],[100,2149,2151],{"className":2150},[225],[100,2152,2154,2174],{"className":2153},[229,230],[100,2155,2157,2171],{"className":2156},[234],[100,2158,2160],{"className":2159,"style":838},[238],[100,2161,2162,2165],{"style":2097},[100,2163],{"className":2164,"style":247},[246],[
100,2166,2168],{"className":2167},[251,252,253,254],[100,2169,176],{"className":2170},[198,254],[100,2172,262],{"className":2173},[261],[100,2175,2177],{"className":2176},[234],[100,2178,2180],{"className":2179,"style":269},[238],[100,2181],{},[100,2183,126],{"className":2184},[203],[100,2186,129],{"className":2187},[198,683],[100,2189],{"className":2190,"style":303},[214],[100,2192,159],{"className":2193},[307],[100,2195],{"className":2196,"style":303},[214],[100,2198,2200,2203,2243,2246,2249,2253,2256,2259],{"className":2199},[189],[100,2201],{"className":2202,"style":194},[193],[100,2204,2206,2209],{"className":2205},[198],[100,2207,1960],{"className":2208,"style":2082},[198,199],[100,2210,2212],{"className":2211},[225],[100,2213,2215,2235],{"className":2214},[229,230],[100,2216,2218,2232],{"className":2217},[234],[100,2219,2221],{"className":2220,"style":838},[238],[100,2222,2223,2226],{"style":2097},[100,2224],{"className":2225,"style":247},[246],[100,2227,2229],{"className":2228},[251,252,253,254],[100,2230,758],{"className":2231},[198,254],[100,2233,262],{"className":2234},[261],[100,2236,2238],{"className":2237},[234],[100,2239,2241],{"className":2240,"style":269},[238],[100,2242],{},[100,2244,126],{"className":2245},[203],[100,2247,129],{"className":2248},[198,683],[100,2250,2252],{"className":2251},[275],"))",[100,2254],{"className":2255,"style":303},[214],[100,2257,1971],{"className":2258},[307],[100,2260],{"className":2261,"style":303},[214],[100,2263,2265,2268,2308,2311,2314,2317,2320],{"className":2264},[189],[100,2266],{"className":2267,"style":194},[193],[100,2269,2271,2274],{"className":2270},[198],[100,2272,1960],{"className":2273,"style":2082},[198,199],[100,2275,2277],{"className":2276},[225],[100,2278,2280,2300],{"className":2279},[229,230],[100,2281,2283,2297],{"className":2282},[234],[100,2284,2286],{"className":2285,"style":838},[238],[100,2287,2288,2291],{"style":2097},[100,2289],{"className":2290,"style":247},[246],[100,2292,2294],{"classNa
me":2293},[251,252,253,254],[100,2295,2006],{"className":2296},[198,254],[100,2298,262],{"className":2299},[261],[100,2301,2303],{"className":2302},[234],[100,2304,2306],{"className":2305,"style":269},[238],[100,2307],{},[100,2309,126],{"className":2310},[203],[100,2312,129],{"className":2313},[198,683],[100,2315],{"className":2316,"style":303},[214],[100,2318,159],{"className":2319},[307],[100,2321],{"className":2322,"style":303},[214],[100,2324,2326,2329,2369,2372,2375,2378,2381,2384],{"className":2325},[189],[100,2327],{"className":2328,"style":194},[193],[100,2330,2332,2335],{"className":2331},[198],[100,2333,1960],{"className":2334,"style":2082},[198,199],[100,2336,2338],{"className":2337},[225],[100,2339,2341,2361],{"className":2340},[229,230],[100,2342,2344,2358],{"className":2343},[234],[100,2345,2347],{"className":2346,"style":838},[238],[100,2348,2349,2352],{"style":2097},[100,2350],{"className":2351,"style":247},[246],[100,2353,2355],{"className":2354},[251,252,253,254],[100,2356,758],{"className":2357},[198,254],[100,2359,262],{"className":2360},[261],[100,2362,2364],{"className":2363},[234],[100,2365,2367],{"className":2366,"style":269},[238],[100,2368],{},[100,2370,126],{"className":2371},[203],[100,2373,129],{"className":2374},[198,683],[100,2376,145],{"className":2377},[275],[100,2379],{"className":2380,"style":303},[214],[100,2382,159],{"className":2383},[307],[100,2385],{"className":2386,"style":303},[214],[100,2388,2390,2393,2433,2436,2439,2442,2445,2448],{"className":2389},[189],[100,2391],{"className":2392,"style":194},[193],[100,2394,2396,2399],{"className":2395},[198],[100,2397,1960],{"className":2398,"style":2082},[198,199],[100,2400,2402],{"className":2401},[225],[100,2403,2405,2425],{"className":2404},[229,230],[100,2406,2408,2422],{"className":2407},[234],[100,2409,2411],{"className":2410,"style":838},[238],[100,2412,2413,2416],{"style":2097},[100,2414],{"className":2415,"style":247},[246],[100,2417,2419],{"className":2418},[251,252,253,25
4],[100,2420,176],{"className":2421},[198,254],[100,2423,262],{"className":2424},[261],[100,2426,2428],{"className":2427},[234],[100,2429,2431],{"className":2430,"style":269},[238],[100,2432],{},[100,2434,126],{"className":2435},[203],[100,2437,129],{"className":2438},[198,683],[100,2440,2252],{"className":2441},[275],[100,2443],{"className":2444,"style":303},[214],[100,2446,1971],{"className":2447},[307],[100,2449],{"className":2450,"style":303},[214],[100,2452,2454,2458],{"className":2453},[189],[100,2455],{"className":2456,"style":2457},[193],"height:0.123em;",[100,2459,771],{"className":2460},[916],[11,2462,2463],{},"where:",[2465,2466,2467,2550,2668],"ul",{},[1555,2468,2469,2549],{},[100,2470,2472,2493],{"className":2471},[103],[100,2473,2475],{"className":2474},[107],[109,2476,2477],{"xmlns":111},[113,2478,2479,2491],{},[116,2480,2481,2483,2485],{},[119,2482,129],{"mathvariant":655},[123,2484,658],{},[169,2486,2487,2489],{},[119,2488,664],{"mathvariant":663},[119,2490,121],{},[178,2492,669],{"encoding":180},[100,2494,2496,2514],{"className":2495,"ariaHidden":132},[185],[100,2497,2499,2502,2505,2508,2511],{"className":2498},[189],[100,2500],{"className":2501,"style":679},[193],[100,2503,129],{"className":2504},[198,683],[100,2506],{"className":2507,"style":279},[214],[100,2509,658],{"className":2510},[283],[100,2512],{"className":2513,"style":279},[214],[100,2515,2517,2520],{"className":2516},[189],[100,2518],{"className":2519,"style":699},[193],[100,2521,2523,2526],{"className":2522},[198],[100,2524,664],{"className":2525},[198,706],[100,2527,2529],{"className":2528},[225],[100,2530,2532],{"className":2531},[229],[100,2533,2535],{"className":2534},[234],[100,2536,2538],{"className":2537,"style":699},[238],[100,2539,2540,2543],{"style":382},[100,2541],{"className":2542,"style":247},[246],[100,2544,2546],{"className":2545},[251,252,253,254],[100,2547,121],{"className":2548},[198,199,254]," is the input 
vector",[1555,2551,2552,2638,2639],{},[100,2553,2555,2580],{"className":2554},[103],[100,2556,2558],{"className":2557},[107],[109,2559,2560],{"xmlns":111},[113,2561,2562,2577],{},[116,2563,2564,2570,2572,2575],{},[135,2565,2566,2568],{},[119,2567,1960],{},[119,2569,142],{},[123,2571,126],{"stretchy":125},[123,2573,2574],{},"⋅",[123,2576,145],{"stretchy":125},[178,2578,2579],{"encoding":180},"q_i(\\cdot)",[100,2581,2583],{"className":2582,"ariaHidden":132},[185],[100,2584,2586,2589,2629,2632,2635],{"className":2585},[189],[100,2587],{"className":2588,"style":194},[193],[100,2590,2592,2595],{"className":2591},[198],[100,2593,1960],{"className":2594,"style":2082},[198,199],[100,2596,2598],{"className":2597},[225],[100,2599,2601,2621],{"className":2600},[229,230],[100,2602,2604,2618],{"className":2603},[234],[100,2605,2607],{"className":2606,"style":239},[238],[100,2608,2609,2612],{"style":2097},[100,2610],{"className":2611,"style":247},[246],[100,2613,2615],{"className":2614},[251,252,253,254],[100,2616,142],{"className":2617},[198,199,254],[100,2619,262],{"className":2620},[261],[100,2622,2624],{"className":2623},[234],[100,2625,2627],{"className":2626,"style":269},[238],[100,2628],{},[100,2630,126],{"className":2631},[203],[100,2633,2574],{"className":2634},[198],[100,2636,145],{"className":2637},[275]," is the quantizer at stage 
",[100,2640,2642,2655],{"className":2641},[103],[100,2643,2645],{"className":2644},[107],[109,2646,2647],{"xmlns":111},[113,2648,2649,2653],{},[116,2650,2651],{},[119,2652,142],{},[178,2654,142],{"encoding":180},[100,2656,2658],{"className":2657,"ariaHidden":132},[185],[100,2659,2661,2665],{"className":2660},[189],[100,2662],{"className":2663,"style":2664},[193],"height:0.6595em;",[100,2666,142],{"className":2667},[198,199],[1555,2669,2670,2987,2988],{},[100,2671,2673,2736],{"className":2672},[103],[100,2674,2676],{"className":2675},[107],[109,2677,2678],{"xmlns":111},[113,2679,2680,2733],{},[116,2681,2682,2689,2691,2693,2695,2711,2717,2719,2731],{},[135,2683,2684,2687],{},[119,2685,2686],{"mathvariant":655},"r",[119,2688,142],{},[123,2690,148],{},[119,2692,129],{"mathvariant":655},[123,2694,159],{},[1039,2696,2697,2700,2709],{},[123,2698,2699],{},"∑",[116,2701,2702,2705,2707],{},[119,2703,2704],{},"j",[123,2706,148],{},[174,2708,758],{},[119,2710,142],{},[135,2712,2713,2715],{},[119,2714,1960],{},[119,2716,2704],{},[123,2718,126],{"stretchy":125},[135,2720,2721,2723],{},[119,2722,2686],{"mathvariant":655},[116,2724,2725,2727,2729],{},[119,2726,2704],{},[123,2728,159],{},[174,2730,758],{},[123,2732,145],{"stretchy":125},[178,2734,2735],{"encoding":180},"\\mathbf{r}_i = \\mathbf{x} - \\sum_{j=1}^i 
q_j(\\mathbf{r}_{j-1})",[100,2737,2739,2795,2813],{"className":2738,"ariaHidden":132},[185],[100,2740,2742,2746,2786,2789,2792],{"className":2741},[189],[100,2743],{"className":2744,"style":2745},[193],"height:0.5944em;vertical-align:-0.15em;",[100,2747,2749,2752],{"className":2748},[198],[100,2750,2686],{"className":2751},[198,683],[100,2753,2755],{"className":2754},[225],[100,2756,2758,2778],{"className":2757},[229,230],[100,2759,2761,2775],{"className":2760},[234],[100,2762,2764],{"className":2763,"style":239},[238],[100,2765,2766,2769],{"style":242},[100,2767],{"className":2768,"style":247},[246],[100,2770,2772],{"className":2771},[251,252,253,254],[100,2773,142],{"className":2774},[198,199,254],[100,2776,262],{"className":2777},[261],[100,2779,2781],{"className":2780},[234],[100,2782,2784],{"className":2783,"style":269},[238],[100,2785],{},[100,2787],{"className":2788,"style":279},[214],[100,2790,148],{"className":2791},[283],[100,2793],{"className":2794,"style":279},[214],[100,2796,2798,2801,2804,2807,2810],{"className":2797},[189],[100,2799],{"className":2800,"style":1871},[193],[100,2802,129],{"className":2803},[198,683],[100,2805],{"className":2806,"style":303},[214],[100,2808,159],{"className":2809},[307],[100,2811],{"className":2812,"style":303},[214],[100,2814,2816,2820,2888,2891,2932,2935,2984],{"className":2815},[189],[100,2817],{"className":2818,"style":2819},[193],"height:1.4004em;vertical-align:-0.4358em;",[100,2821,2823,2829],{"className":2822},[1090],[100,2824,2699],{"className":2825,"style":2828},[1090,2826,2827],"op-symbol","small-op","position:relative;top:0em;",[100,2830,2832],{"className":2831},[225],[100,2833,2835,2879],{"className":2834},[229,230],[100,2836,2838,2876],{"className":2837},[234],[100,2839,2842,2864],{"className":2840,"style":2841},[238],"height:0.9646em;",[100,2843,2845,2848],{"style":2844},"top:-2.4003em;margin-left:0em;margin-right:0.05em;",[100,2846],{"className":2847,"style":247},[246],[100,2849,2851],{"className":2850},[2
51,252,253,254],[100,2852,2854,2858,2861],{"className":2853},[198,254],[100,2855,2704],{"className":2856,"style":2857},[198,199,254],"margin-right:0.0572em;",[100,2859,148],{"className":2860},[283,254],[100,2862,758],{"className":2863},[198,254],[100,2865,2867,2870],{"style":2866},"top:-3.2029em;margin-right:0.05em;",[100,2868],{"className":2869,"style":247},[246],[100,2871,2873],{"className":2872},[251,252,253,254],[100,2874,142],{"className":2875},[198,199,254],[100,2877,262],{"className":2878},[261],[100,2880,2882],{"className":2881},[234],[100,2883,2886],{"className":2884,"style":2885},[238],"height:0.4358em;",[100,2887],{},[100,2889],{"className":2890,"style":215},[214],[100,2892,2894,2897],{"className":2893},[198],[100,2895,1960],{"className":2896,"style":2082},[198,199],[100,2898,2900],{"className":2899},[225],[100,2901,2903,2923],{"className":2902},[229,230],[100,2904,2906,2920],{"className":2905},[234],[100,2907,2909],{"className":2908,"style":239},[238],[100,2910,2911,2914],{"style":2097},[100,2912],{"className":2913,"style":247},[246],[100,2915,2917],{"className":2916},[251,252,253,254],[100,2918,2704],{"className":2919,"style":2857},[198,199,254],[100,2921,262],{"className":2922},[261],[100,2924,2926],{"className":2925},[234],[100,2927,2930],{"className":2928,"style":2929},[238],"height:0.2861em;",[100,2931],{},[100,2933,126],{"className":2934},[203],[100,2936,2938,2941],{"className":2937},[198],[100,2939,2686],{"className":2940},[198,683],[100,2942,2944],{"className":2943},[225],[100,2945,2947,2976],{"className":2946},[229,230],[100,2948,2950,2973],{"className":2949},[234],[100,2951,2953],{"className":2952,"style":239},[238],[100,2954,2955,2958],{"style":242},[100,2956],{"className":2957,"style":247},[246],[100,2959,2961],{"className":2960},[251,252,253,254],[100,2962,2964,2967,2970],{"className":2963},[198,254],[100,2965,2704],{"className":2966,"style":2857},[198,199,254],[100,2968,159],{"className":2969},[307,254],[100,2971,758],{"className":2972},[19
8,254],[100,2974,262],{"className":2975},[261],[100,2977,2979],{"className":2978},[234],[100,2980,2982],{"className":2981,"style":2929},[238],[100,2983],{},[100,2985,145],{"className":2986},[275]," is the residual at stage ",[100,2989,2991,3004],{"className":2990},[103],[100,2992,2994],{"className":2993},[107],[109,2995,2996],{"xmlns":111},[113,2997,2998,3002],{},[116,2999,3000],{},[119,3001,142],{},[178,3003,142],{"encoding":180},[100,3005,3007],{"className":3006,"ariaHidden":132},[185],[100,3008,3010,3013],{"className":3009},[189],[100,3011],{"className":3012,"style":2664},[193],[100,3014,142],{"className":3015},[198,199],[11,3017,3018],{},"The final reconstruction is:",[11,3020,3021],{},[100,3022,3024,3083],{"className":3023},[103],[100,3025,3027],{"className":3026},[107],[109,3028,3029],{"xmlns":111},[113,3030,3031,3080],{},[116,3032,3033,3041,3043,3058,3064,3066,3078],{},[3034,3035,3036,3038],"mover",{"accent":132},[119,3037,129],{"mathvariant":655},[123,3039,3040],{},"^",[123,3042,148],{},[1039,3044,3045,3047,3055],{},[123,3046,2699],{},[116,3048,3049,3051,3053],{},[119,3050,142],{},[123,3052,148],{},[174,3054,758],{},[119,3056,3057],{},"N",[135,3059,3060,3062],{},[119,3061,1960],{},[119,3063,142],{},[123,3065,126],{"stretchy":125},[135,3067,3068,3070],{},[119,3069,2686],{"mathvariant":655},[116,3071,3072,3074,3076],{},[119,3073,142],{},[123,3075,159],{},[174,3077,758],{},[123,3079,145],{"stretchy":125},[178,3081,3082],{"encoding":180},"\\hat{\\mathbf{x}} = \\sum_{i=1}^N 
q_i(\\mathbf{r}_{i-1})",[100,3084,3086,3139],{"className":3085,"ariaHidden":132},[185],[100,3087,3089,3093,3130,3133,3136],{"className":3088},[189],[100,3090],{"className":3091,"style":3092},[193],"height:0.7079em;",[100,3094,3097],{"className":3095},[198,3096],"accent",[100,3098,3100],{"className":3099},[229],[100,3101,3103],{"className":3102},[234],[100,3104,3106,3116],{"className":3105,"style":3092},[238],[100,3107,3109,3113],{"style":3108},"top:-3em;",[100,3110],{"className":3111,"style":3112},[246],"height:3em;",[100,3114,129],{"className":3115},[198,683],[100,3117,3119,3122],{"style":3118},"top:-3.0134em;",[100,3120],{"className":3121,"style":3112},[246],[100,3123,3127],{"className":3124,"style":3126},[3125],"accent-body","left:-0.25em;",[100,3128,3040],{"className":3129},[198],[100,3131],{"className":3132,"style":279},[214],[100,3134,148],{"className":3135},[283],[100,3137],{"className":3138,"style":279},[214],[100,3140,3142,3146,3209,3212,3252,3255,3305],{"className":3141},[189],[100,3143],{"className":3144,"style":3145},[193],"height:1.2809em;vertical-align:-0.2997em;",[100,3147,3149,3152],{"className":3148},[1090],[100,3150,2699],{"className":3151,"style":2828},[1090,2826,2827],[100,3153,3155],{"className":3154},[225],[100,3156,3158,3200],{"className":3157},[229,230],[100,3159,3161,3197],{"className":3160},[234],[100,3162,3165,3185],{"className":3163,"style":3164},[238],"height:0.9812em;",[100,3166,3167,3170],{"style":2844},[100,3168],{"className":3169,"style":247},[246],[100,3171,3173],{"className":3172},[251,252,253,254],[100,3174,3176,3179,3182],{"className":3175},[198,254],[100,3177,142],{"className":3178},[198,199,254],[100,3180,148],{"className":3181},[283,254],[100,3183,758],{"className":3184},[198,254],[100,3186,3187,3190],{"style":2866},[100,3188],{"className":3189,"style":247},[246],[100,3191,3193],{"className":3192},[251,252,253,254],[100,3194,3057],{"className":3195,"style":3196},[198,199,254],"margin-right:0.109em;",[100,3198,262],{"className"
:3199},[261],[100,3201,3203],{"className":3202},[234],[100,3204,3207],{"className":3205,"style":3206},[238],"height:0.2997em;",[100,3208],{},[100,3210],{"className":3211,"style":215},[214],[100,3213,3215,3218],{"className":3214},[198],[100,3216,1960],{"className":3217,"style":2082},[198,199],[100,3219,3221],{"className":3220},[225],[100,3222,3224,3244],{"className":3223},[229,230],[100,3225,3227,3241],{"className":3226},[234],[100,3228,3230],{"className":3229,"style":239},[238],[100,3231,3232,3235],{"style":2097},[100,3233],{"className":3234,"style":247},[246],[100,3236,3238],{"className":3237},[251,252,253,254],[100,3239,142],{"className":3240},[198,199,254],[100,3242,262],{"className":3243},[261],[100,3245,3247],{"className":3246},[234],[100,3248,3250],{"className":3249,"style":269},[238],[100,3251],{},[100,3253,126],{"className":3254},[203],[100,3256,3258,3261],{"className":3257},[198],[100,3259,2686],{"className":3260},[198,683],[100,3262,3264],{"className":3263},[225],[100,3265,3267,3296],{"className":3266},[229,230],[100,3268,3270,3293],{"className":3269},[234],[100,3271,3273],{"className":3272,"style":239},[238],[100,3274,3275,3278],{"style":242},[100,3276],{"className":3277,"style":247},[246],[100,3279,3281],{"className":3280},[251,252,253,254],[100,3282,3284,3287,3290],{"className":3283},[198,254],[100,3285,142],{"className":3286},[198,199,254],[100,3288,159],{"className":3289},[307,254],[100,3291,758],{"className":3292},[198,254],[100,3294,262],{"className":3295},[261],[100,3297,3299],{"className":3298},[234],[100,3300,3303],{"className":3301,"style":3302},[238],"height:0.2083em;",[100,3304],{},[100,3306,145],{"className":3307},[275],[11,3309,3310],{},"RVQ builds the final approximation by adding up several small corrections instead of using one big codebook.",[396,3312,3314],{"className":398,"code":3313,"language":400,"meta":401,"style":401},"def residual_quantize(input_vec, codebooks):\n    \"\"\"Multi-stage quantization with progressive 
refinement\"\"\"\n    reconstruction = np.zeros_like(input_vec)\n    residual = input_vec.copy()\n    indices = []\n\n    for stage, codebook in enumerate(codebooks):\n        quant_vec, idx = quantize_vector(residual, codebook)\n        reconstruction += quant_vec\n        indices.append(idx)\n        residual = input_vec - reconstruction\n        print(f\"Stage {stage+1} residual norm: {np.linalg.norm(residual):.4f}\")\n\n    return reconstruction, indices\n",[403,3315,3316,3321,3326,3331,3336,3341,3345,3350,3355,3360,3365,3370,3375,3379],{"__ignoreMap":401},[100,3317,3318],{"class":407,"line":408},[100,3319,3320],{},"def residual_quantize(input_vec, codebooks):\n",[100,3322,3323],{"class":407,"line":414},[100,3324,3325],{},"    \"\"\"Multi-stage quantization with progressive refinement\"\"\"\n",[100,3327,3328],{"class":407,"line":421},[100,3329,3330],{},"    reconstruction = np.zeros_like(input_vec)\n",[100,3332,3333],{"class":407,"line":427},[100,3334,3335],{},"    residual = input_vec.copy()\n",[100,3337,3338],{"class":407,"line":433},[100,3339,3340],{},"    indices = []\n",[100,3342,3343],{"class":407,"line":438},[100,3344,418],{"emptyLinePlaceholder":417},[100,3346,3347],{"class":407,"line":444},[100,3348,3349],{},"    for stage, codebook in enumerate(codebooks):\n",[100,3351,3352],{"class":407,"line":450},[100,3353,3354],{},"        quant_vec, idx = quantize_vector(residual, codebook)\n",[100,3356,3357],{"class":407,"line":456},[100,3358,3359],{},"        reconstruction += quant_vec\n",[100,3361,3362],{"class":407,"line":462},[100,3363,3364],{},"        indices.append(idx)\n",[100,3366,3367],{"class":407,"line":468},[100,3368,3369],{},"        residual = input_vec - reconstruction\n",[100,3371,3372],{"class":407,"line":474},[100,3373,3374],{},"        print(f\"Stage {stage+1} residual norm: 
{np.linalg.norm(residual):.4f}\")\n",[100,3376,3377],{"class":407,"line":479},[100,3378,418],{"emptyLinePlaceholder":417},[100,3380,3381],{"class":407,"line":485},[100,3382,3383],{},"    return reconstruction, indices\n",[1547,3385,3387],{"id":3386},"audio-demonstrations","Audio Demonstrations",[11,3389,3390],{},[75,3391,3392],{},"Original Audio",[11,3394,3395],{},[3396,3397],"audio",{"controls":417,"src":3398,"style":3399},"https:\u002F\u002Fassets.modelslab.com\u002Fgenerations\u002F296bb1a6-d6ad-43e3-bd7d-645dcba49b6d.wav","width: 100%; margin: 0.5rem 0;",[11,3401,3402],{},[75,3403,3404],{},"4 Codebooks Reconstruction",[11,3406,3407],{},[3396,3408],{"controls":417,"src":3409,"style":3399},"https:\u002F\u002Fassets.modelslab.com\u002Fgenerations\u002Fbea3a3ea-024e-435b-a978-41e6f3af9af4.wav",[11,3411,3412],{},[75,3413,3414],{},"8 Codebooks Reconstruction",[11,3416,3417],{},[3396,3418],{"controls":417,"src":3419,"style":3399},"https:\u002F\u002Fassets.modelslab.com\u002Fgenerations\u002F85bdaddd-0c28-4d5a-854f-633b7b042c2a.wav",[11,3421,3422],{},[75,3423,3424],{},"16 Codebooks Reconstruction",[11,3426,3427],{},[3396,3428],{"controls":417,"src":3429,"style":3399},"https:\u002F\u002Fassets.modelslab.com\u002Fgenerations\u002F72a29fcf-79ab-4dea-a4c6-53b7f3d21cd4.wav",[11,3431,3432],{},[75,3433,3434],{},"32 Codebooks Reconstruction",[11,3436,3437],{},[3396,3438],{"controls":417,"src":3439,"style":3399},"https:\u002F\u002Fassets.modelslab.com\u002Fgenerations\u002F3ec7e80d-1d18-449e-ba84-0b75034a36a6.wav",[21,3441,3443],{"id":3442},"bitrate-control-through-rvq","Bitrate Control Through RVQ",[11,3445,3446],{},[34,3447],{"alt":3448,"src":3449},"RVQ in EnCodec - Bitrate Control","\u002Frvq-in-encodec.png",[11,3451,3452],{},[14,3453,3454],{},"Figure 4: RVQ in EnCodec — Bitrate Control Through Multiple Quantization Stages",[11,3456,3457],{},"One of the biggest advantages of RVQ is fine-grained control over bitrate. 
By adjusting the number of quantization stages or the size of each codebook, we can trade off quality versus compression.",[11,3459,3460],{},"Meta's EnCodec paper demonstrated the practical power of this approach: a single model can serve several target bitrates by changing how many codebooks are used.",[11,3462,3463],{},[34,3464],{"alt":3465,"src":3466},"Meta's EnCodec Architecture","\u002Fmeta-encodec-arch.png",[11,3468,3469],{},[14,3470,3471],{},"Figure 5: Meta's EnCodec Architecture",[11,3473,3474],{},"The number of distinct code combinations per frame grows exponentially with the bits spent:",[11,3476,3477],{},[100,3478,3480,3512],{"className":3479},[103],[100,3481,3483],{"className":3482},[107],[109,3484,3485],{"xmlns":111},[113,3486,3487,3509],{},[116,3488,3489,3493,3495],{},[3490,3491,3492],"mtext",{},"Total patterns",[123,3494,148],{},[169,3496,3497,3499],{},[174,3498,176],{},[116,3500,3501,3504,3507],{},[119,3502,3503],{},"b",[123,3505,3506],{},"×",[119,3508,3057],{},[178,3510,3511],{"encoding":180},"\\text{Total patterns} = 2^{b \\times N}",[100,3513,3515,3538],{"className":3514,"ariaHidden":132},[185],[100,3516,3518,3522,3529,3532,3535],{"className":3517},[189],[100,3519],{"className":3520,"style":3521},[193],"height:0.8889em;vertical-align:-0.1944em;",[100,3523,3526],{"className":3524},[198,3525],"text",[100,3527,3492],{"className":3528},[198],[100,3530],{"className":3531,"style":279},[214],[100,3533,148],{"className":3534},[283],[100,3536],{"className":3537,"style":279},[214],[100,3539,3541,3544],{"className":3540},[189],[100,3542],{"className":3543,"style":699},[193],[100,3545,3547,3550],{"className":3546},[198],[100,3548,176],{"className":3549},[198],[100,3551,3553],{"className":3552},[225],[100,3554,3556],{"className":3555},[229],[100,3557,3559],{"className":3558},[234],[100,3560,3562],{"className":3561,"style":699},[238],[100,3563,3564,3567],{"style":382},[100,3565],{"className":3566,"style":247},[246],[100,3568,3570],{"className":3569},[251,252,253,254],[100,3571,3573,3576,3579],{"className":3572},[198,254],[100,3574,3503],{"className":3575},
[198,199,254],[100,3577,3506],{"className":3578},[307,254],[100,3580,3057],{"className":3581,"style":3196},[198,199,254],[11,3583,3584,3585,3614,3615,3643],{},"where ",[100,3586,3588,3601],{"className":3587},[103],[100,3589,3591],{"className":3590},[107],[109,3592,3593],{"xmlns":111},[113,3594,3595,3599],{},[116,3596,3597],{},[119,3598,3503],{},[178,3600,3503],{"encoding":180},[100,3602,3604],{"className":3603,"ariaHidden":132},[185],[100,3605,3607,3611],{"className":3606},[189],[100,3608],{"className":3609,"style":3610},[193],"height:0.6944em;",[100,3612,3503],{"className":3613},[198,199]," is bits per stage and ",[100,3616,3618,3631],{"className":3617},[103],[100,3619,3621],{"className":3620},[107],[109,3622,3623],{"xmlns":111},[113,3624,3625,3629],{},[116,3626,3627],{},[119,3628,3057],{},[178,3630,3057],{"encoding":180},[100,3632,3634],{"className":3633,"ariaHidden":132},[185],[100,3635,3637,3640],{"className":3636},[189],[100,3638],{"className":3639,"style":796},[193],[100,3641,3057],{"className":3642,"style":3196},[198,199]," is the number of stages.",[21,3645,3647],{"id":3646},"exponential-moving-average-ema-codebook-update","Exponential Moving Average (EMA) Codebook Update",[11,3649,3650],{},"To stabilize training, each codeword is updated using an exponential moving average:",[11,3652,3653],{},[100,3654,3656,3735],{"className":3655},[103],[100,3657,3659],{"className":3658},[107],[109,3660,3661],{"xmlns":111},[113,3662,3663,3732],{},[116,3664,3665,3684,3686,3689,3692,3706,3708,3710,3712,3714,3716,3718,3720],{},[1039,3666,3667,3669,3671],{},[119,3668,139],{"mathvariant":655},[119,3670,142],{},[116,3672,3673,3675,3678,3680,3682],{},[123,3674,126],{"stretchy":125},[119,3676,3677],{},"t",[123,3679,1971],{},[174,3681,758],{},[123,3683,145],{"stretchy":125},[123,3685,148],{},[119,3687,3688],{},"α",[3490,3690,3691],{}," 
",[1039,3693,3694,3696,3698],{},[119,3695,139],{"mathvariant":655},[119,3697,142],{},[116,3699,3700,3702,3704],{},[123,3701,126],{"stretchy":125},[119,3703,3677],{},[123,3705,145],{"stretchy":125},[123,3707,1971],{},[123,3709,126],{"stretchy":125},[174,3711,758],{},[123,3713,159],{},[119,3715,3688],{},[123,3717,145],{"stretchy":125},[3490,3719,3691],{},[135,3721,3722,3730],{},[3034,3723,3724,3727],{"accent":132},[119,3725,3726],{"mathvariant":655},"v",[123,3728,3729],{},"ˉ",[119,3731,142],{},[178,3733,3734],{"encoding":180},"\\mathbf{c}_i^{(t+1)} = \\alpha \\, \\mathbf{c}_i^{(t)} + (1 - \\alpha) \\, \\bar{\\mathbf{v}}_i",[100,3736,3738,3824,3906,3927],{"className":3737,"ariaHidden":132},[185],[100,3739,3741,3745,3815,3818,3821],{"className":3740},[189],[100,3742],{"className":3743,"style":3744},[193],"height:1.3217em;vertical-align:-0.2769em;",[100,3746,3748,3751],{"className":3747},[198],[100,3749,139],{"className":3750},[198,683],[100,3752,3754],{"className":3753},[225],[100,3755,3757,3806],{"className":3756},[229,230],[100,3758,3760,3803],{"className":3759},[234],[100,3761,3764,3776],{"className":3762,"style":3763},[238],"height:1.0448em;",[100,3765,3767,3770],{"style":3766},"top:-2.4231em;margin-left:0em;margin-right:0.05em;",[100,3768],{"className":3769,"style":247},[246],[100,3771,3773],{"className":3772},[251,252,253,254],[100,3774,142],{"className":3775},[198,199,254],[100,3777,3779,3782],{"style":3778},"top:-3.2198em;margin-right:0.05em;",[100,3780],{"className":3781,"style":247},[246],[100,3783,3785],{"className":3784},[251,252,253,254],[100,3786,3788,3791,3794,3797,3800],{"className":3787},[198,254],[100,3789,126],{"className":3790},[203,254],[100,3792,3677],{"className":3793},[198,199,254],[100,3795,1971],{"className":3796},[307,254],[100,3798,758],{"className":3799},[198,254],[100,3801,145],{"className":3802},[275,254],[100,3804,262],{"className":3805},[261],[100,3807,3809],{"className":3808},[234],[100,3810,3813],{"className":3811,"style":3812},[238],"
height:0.2769em;",[100,3814],{},[100,3816],{"className":3817,"style":279},[214],[100,3819,148],{"className":3820},[283],[100,3822],{"className":3823,"style":279},[214],[100,3825,3827,3830,3834,3837,3897,3900,3903],{"className":3826},[189],[100,3828],{"className":3829,"style":3744},[193],[100,3831,3688],{"className":3832,"style":3833},[198,199],"margin-right:0.0037em;",[100,3835],{"className":3836,"style":215},[214],[100,3838,3840,3843],{"className":3839},[198],[100,3841,139],{"className":3842},[198,683],[100,3844,3846],{"className":3845},[225],[100,3847,3849,3889],{"className":3848},[229,230],[100,3850,3852,3886],{"className":3851},[234],[100,3853,3855,3866],{"className":3854,"style":3763},[238],[100,3856,3857,3860],{"style":3766},[100,3858],{"className":3859,"style":247},[246],[100,3861,3863],{"className":3862},[251,252,253,254],[100,3864,142],{"className":3865},[198,199,254],[100,3867,3868,3871],{"style":3778},[100,3869],{"className":3870,"style":247},[246],[100,3872,3874],{"className":3873},[251,252,253,254],[100,3875,3877,3880,3883],{"className":3876},[198,254],[100,3878,126],{"className":3879},[203,254],[100,3881,3677],{"className":3882},[198,199,254],[100,3884,145],{"className":3885},[275,254],[100,3887,262],{"className":3888},[261],[100,3890,3892],{"className":3891},[234],[100,3893,3895],{"className":3894,"style":3812},[238],[100,3896],{},[100,3898],{"className":3899,"style":303},[214],[100,3901,1971],{"className":3902},[307],[100,3904],{"className":3905,"style":303},[214],[100,3907,3909,3912,3915,3918,3921,3924],{"className":3908},[189],[100,3910],{"className":3911,"style":194},[193],[100,3913,126],{"className":3914},[203],[100,3916,758],{"className":3917},[198],[100,3919],{"className":3920,"style":303},[214],[100,3922,159],{"className":3923},[307],[100,3925],{"className":3926,"style":303},[214],[100,3928,3930,3933,3936,3939,3942],{"className":3929},[189],[100,3931],{"className":3932,"style":194},[193],[100,3934,3688],{"className":3935,"style":3833},[198,199
],[100,3937,145],{"className":3938},[275],[100,3940],{"className":3941,"style":215},[214],[100,3943,3945,3978],{"className":3944},[198],[100,3946,3948],{"className":3947},[198,3096],[100,3949,3951],{"className":3950},[229],[100,3952,3954],{"className":3953},[234],[100,3955,3958,3967],{"className":3956,"style":3957},[238],"height:0.5812em;",[100,3959,3960,3963],{"style":3108},[100,3961],{"className":3962,"style":3112},[246],[100,3964,3726],{"className":3965,"style":3966},[198,683],"margin-right:0.016em;",[100,3968,3969,3972],{"style":3118},[100,3970],{"className":3971,"style":3112},[246],[100,3973,3975],{"className":3974,"style":3126},[3125],[100,3976,3729],{"className":3977},[198],[100,3979,3981],{"className":3980},[225],[100,3982,3984,4005],{"className":3983},[229,230],[100,3985,3987,4002],{"className":3986},[234],[100,3988,3990],{"className":3989,"style":239},[238],[100,3991,3993,3996],{"style":3992},"top:-2.55em;margin-left:-0.016em;margin-right:0.05em;",[100,3994],{"className":3995,"style":247},[246],[100,3997,3999],{"className":3998},[251,252,253,254],[100,4000,142],{"className":4001},[198,199,254],[100,4003,262],{"className":4004},[261],[100,4006,4008],{"className":4007},[234],[100,4009,4011],{"className":4010,"style":269},[238],[100,4012],{},[11,4014,2463],{},[2465,4016,4017,4147,4281],{},[1555,4018,4019,4117,4118],{},[100,4020,4022,4048],{"className":4021},[103],[100,4023,4025],{"className":4024},[107],[109,4026,4027],{"xmlns":111},[113,4028,4029,4045],{},[116,4030,4031],{},[1039,4032,4033,4035,4037],{},[119,4034,139],{"mathvariant":655},[119,4036,142],{},[116,4038,4039,4041,4043],{},[123,4040,126],{"stretchy":125},[119,4042,3677],{},[123,4044,145],{"stretchy":125},[178,4046,4047],{"encoding":180},"\\mathbf{c}_i^{(t)}",[100,4049,4051],{"className":4050,"ariaHidden":132},[185],[100,4052,4054,4057],{"className":4053},[189],[100,4055],{"className":4056,"style":3744},[193],[100,4058,4060,4063],{"className":4059},[198],[100,4061,139],{"className":4062},[198,683],
[100,4064,4066],{"className":4065},[225],[100,4067,4069,4109],{"className":4068},[229,230],[100,4070,4072,4106],{"className":4071},[234],[100,4073,4075,4086],{"className":4074,"style":3763},[238],[100,4076,4077,4080],{"style":3766},[100,4078],{"className":4079,"style":247},[246],[100,4081,4083],{"className":4082},[251,252,253,254],[100,4084,142],{"className":4085},[198,199,254],[100,4087,4088,4091],{"style":3778},[100,4089],{"className":4090,"style":247},[246],[100,4092,4094],{"className":4093},[251,252,253,254],[100,4095,4097,4100,4103],{"className":4096},[198,254],[100,4098,126],{"className":4099},[203,254],[100,4101,3677],{"className":4102},[198,199,254],[100,4104,145],{"className":4105},[275,254],[100,4107,262],{"className":4108},[261],[100,4110,4112],{"className":4111},[234],[100,4113,4115],{"className":4114,"style":3812},[238],[100,4116],{}," is the codeword at iteration ",[100,4119,4121,4134],{"className":4120},[103],[100,4122,4124],{"className":4123},[107],[109,4125,4126],{"xmlns":111},[113,4127,4128,4132],{},[116,4129,4130],{},[119,4131,3677],{},[178,4133,3677],{"encoding":180},[100,4135,4137],{"className":4136,"ariaHidden":132},[185],[100,4138,4140,4144],{"className":4139},[189],[100,4141],{"className":4142,"style":4143},[193],"height:0.6151em;",[100,4145,3677],{"className":4146},[198,199],[1555,4148,4149,4252,4253],{},[100,4150,4152,4174],{"className":4151},[103],[100,4153,4155],{"className":4154},[107],[109,4156,4157],{"xmlns":111},[113,4158,4159,4171],{},[116,4160,4161],{},[135,4162,4163,4169],{},[3034,4164,4165,4167],{"accent":132},[119,4166,3726],{"mathvariant":655},[123,4168,3729],{},[119,4170,142],{},[178,4172,4173],{"encoding":180},"\\bar{\\mathbf{v}}_i",[100,4175,4177],{"className":4176,"ariaHidden":132},[185],[100,4178,4180,4184],{"className":4179},[189],[100,4181],{"className":4182,"style":4183},[193],"height:0.7312em;vertical-align:-0.15em;",[100,4185,4187,4218],{"className":4186},[198],[100,4188,4190],{"className":4189},[198,3096],[100,4191,41
93],{"className":4192},[229],[100,4194,4196],{"className":4195},[234],[100,4197,4199,4207],{"className":4198,"style":3957},[238],[100,4200,4201,4204],{"style":3108},[100,4202],{"className":4203,"style":3112},[246],[100,4205,3726],{"className":4206,"style":3966},[198,683],[100,4208,4209,4212],{"style":3118},[100,4210],{"className":4211,"style":3112},[246],[100,4213,4215],{"className":4214,"style":3126},[3125],[100,4216,3729],{"className":4217},[198],[100,4219,4221],{"className":4220},[225],[100,4222,4224,4244],{"className":4223},[229,230],[100,4225,4227,4241],{"className":4226},[234],[100,4228,4230],{"className":4229,"style":239},[238],[100,4231,4232,4235],{"style":3992},[100,4233],{"className":4234,"style":247},[246],[100,4236,4238],{"className":4237},[251,252,253,254],[100,4239,142],{"className":4240},[198,199,254],[100,4242,262],{"className":4243},[261],[100,4245,4247],{"className":4246},[234],[100,4248,4250],{"className":4249,"style":269},[238],[100,4251],{}," is the mean of all encoder outputs assigned to codeword ",[100,4254,4256,4269],{"className":4255},[103],[100,4257,4259],{"className":4258},[107],[109,4260,4261],{"xmlns":111},[113,4262,4263,4267],{},[116,4264,4265],{},[119,4266,142],{},[178,4268,142],{"encoding":180},[100,4270,4272],{"className":4271,"ariaHidden":132},[185],[100,4273,4275,4278],{"className":4274},[189],[100,4276],{"className":4277,"style":2664},[193],[100,4279,142],{"className":4280},[198,199],[1555,4282,4283,4358],{},[100,4284,4286,4313],{"className":4285},[103],[100,4287,4289],{"className":4288},[107],[109,4290,4291],{"xmlns":111},[113,4292,4293,4310],{},[116,4294,4295,4297,4299,4301,4304,4306,4308],{},[119,4296,3688],{},[123,4298,658],{},[123,4300,1345],{"stretchy":125},[174,4302,4303],{},"0",[123,4305,133],{"separator":132},[174,4307,758],{},[123,4309,145],{"stretchy":125},[178,4311,4312],{"encoding":180},"\\alpha \\in [0, 
1)",[100,4314,4316,4334],{"className":4315,"ariaHidden":132},[185],[100,4317,4319,4322,4325,4328,4331],{"className":4318},[189],[100,4320],{"className":4321,"style":679},[193],[100,4323,3688],{"className":4324,"style":3833},[198,199],[100,4326],{"className":4327,"style":279},[214],[100,4329,658],{"className":4330},[283],[100,4332],{"className":4333,"style":279},[214],[100,4335,4337,4340,4343,4346,4349,4352,4355],{"className":4336},[189],[100,4338],{"className":4339,"style":194},[193],[100,4341,1345],{"className":4342},[203],[100,4344,4303],{"className":4345},[198],[100,4347,133],{"className":4348},[210],[100,4350],{"className":4351,"style":215},[214],[100,4353,758],{"className":4354},[198],[100,4356,145],{"className":4357},[275]," is the momentum parameter (typically 0.99)",[11,4360,4361,4362,4392],{},"A higher ",[100,4363,4365,4379],{"className":4364},[103],[100,4366,4368],{"className":4367},[107],[109,4369,4370],{"xmlns":111},[113,4371,4372,4376],{},[116,4373,4374],{},[119,4375,3688],{},[178,4377,4378],{"encoding":180},"\\alpha",[100,4380,4382],{"className":4381,"ariaHidden":132},[185],[100,4383,4385,4389],{"className":4384},[189],[100,4386],{"className":4387,"style":4388},[193],"height:0.4306em;",[100,4390,3688],{"className":4391,"style":3833},[198,199]," means slower, smoother updates; lower values adapt faster but can be noisy. This EMA rule lets the codebook evolve gradually, which reduces abrupt jumps and helps avoid codebook collapse.",[18,4394],{},[21,4396,4398],{"id":4397},"references","References",[1552,4400,4401,4416,4430,4444,4457],{},[1555,4402,4403,4406,4407,4410,4411],{},[75,4404,4405],{},"Défossez, A., Copet, J., Synnaeve, G., & Adi, Y."," (2022). ",[14,4408,4409],{},"High Fidelity Neural Audio Compression",". ",[51,4412,4415],{"href":4413,"rel":4414},"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.13438",[55],"arXiv:2210.13438",[1555,4417,4418,4421,4422,4410,4425],{},[75,4419,4420],{},"Zeghidour, N., et al."," (2021). 
",[14,4423,4424],{},"SoundStream: An End-to-End Neural Audio Codec",[51,4426,4429],{"href":4427,"rel":4428},"https:\u002F\u002Fresearch.google\u002Fpubs\u002Fsoundstream-an-end-to-end-neural-audio-codec\u002F",[55],"Google Research",[1555,4431,4432,4435,4436,4435,4439],{},[75,4433,4434],{},"AssemblyAI."," ",[14,4437,4438],{},"What is Residual Vector Quantization?",[51,4440,4443],{"href":4441,"rel":4442},"https:\u002F\u002Fwww.assemblyai.com\u002Fblog\u002Fwhat-is-residual-vector-quantization",[55],"assemblyai.com",[1555,4445,4446,4435,4449,4410,4452],{},[75,4447,4448],{},"Notes by Lex.",[14,4450,4451],{},"Residual Vector Quantisation",[51,4453,4456],{"href":4454,"rel":4455},"https:\u002F\u002Fnotesbylex.com\u002Fresidual-vector-quantisation",[55],"notesbylex.com",[1555,4458,4459,4435,4462,4410,4465],{},[75,4460,4461],{},"Yannic Kilcher.",[14,4463,4464],{},"High Fidelity Neural Audio Compression (EnCodec Explained)",[51,4466,4469],{"href":4467,"rel":4468},"https:\u002F\u002Fyoutu.be\u002FXt9S74BHsvc",[55],"YouTube",[4471,4472,4473],"style",{},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: 
var(--shiki-dark-text-decoration);}",{"title":401,"searchDepth":414,"depth":414,"links":4475},[4476,4477,4478,4479,4480,4483,4486,4487,4488],{"id":23,"depth":414,"text":24},{"id":45,"depth":414,"text":46},{"id":86,"depth":414,"text":87},{"id":529,"depth":414,"text":530},{"id":636,"depth":414,"text":637,"children":4481},[4482],{"id":1549,"depth":421,"text":1550},{"id":1918,"depth":414,"text":1919,"children":4484},[4485],{"id":3386,"depth":421,"text":3387},{"id":3442,"depth":414,"text":3443},{"id":3646,"depth":414,"text":3647},{"id":4397,"depth":414,"text":4398},[4490,4491,56],"speech-synthesis","codecs","2025-11-08","Exploring the role of vector quantization in audio compression and its uses in neural audio codecs.","md",false,{},"\u002Fblog\u002Fneural-audio-codec-rvq",{"title":5,"description":4493},"blog\u002Fneural-audio-codec-rvq",null,"IZ0owy_BfWVwsBEoOkXEB4dH-9F1gU4UPXg7OfxvROk",1775296369322]