How to Quadruple LLM Decoding Performance with Speculative Decoding (SpD) and Microscaling (MX) Formats on Qualcomm® Cloud AI 100
Aug 1, 2024
qualcomm.com
Speculative Decoding — Think Fast⚡, Then Think Right✅
Apr 13, 2025
substack.com
Faster LLMs: Accelerate Inference with Speculative Decoding
11 months ago
ibm.com
0:18
Introducing LM Studio 0.3.10 with 🔮 Speculative Decoding! It's an LLM inferencing technique that can speed up token generation by up to 1.5x-3x in some cases 🏎️💨 - Supported for both GGUF and… | LM Studio | 10 comments
10 views
Feb 19, 2025
linkedin.com
13:05
SPECULATIVE DECODING 🚀 Cómo ACELERAR tus Modelos de IA con un Modelo Borrador
3 views
2 weeks ago
YouTube
Nichonauta
17:15
Multi-Token Prediction (MTP): Accelerating Local Models with no Quality Loss
1.4K views
1 week ago
YouTube
Onchain AI Garage
6:13
Speculative Decoding: Make AI 2-3x Faster for Free | Tech Decoded
3 views
1 month ago
YouTube
Toc am
8:37
Multi-Token Prediction: Why Your GPU Runs LLMs 3x Faster
4 views
2 weeks ago
YouTube
Devsplainers
3:08
What is Speculative Decoding ?
38 views
2 weeks ago
YouTube
DeepManim
13:10
LM STUDIO 🚀 How to SPEED UP your models with Speculative Decoding
55 views
3 weeks ago
YouTube
Nichonauta
7:57
Google Made Gemma 4 Three Times Faster — Then Hid The Best Part
6 views
1 week ago
YouTube
Digital Dreamscapes
1:09
This AI Trick Gives You 3x Speed For FREE
98 views
1 month ago
YouTube
The AI Century
40:19
Speculation is all you need: Intro to Speculative Decoding for High Performance Inference
753 views
2 months ago
YouTube
Modal
8:27
600 Toks/Second Gemma4-26B —The Setting That Actually Wins (vLLM + Dflash Speculative Decoding)
3.4K views
2 weeks ago
YouTube
Tech-Practice
5:04
Speculative Decoding: 2-3x Faster LLMs for Free
1 views
1 month ago
YouTube
The AI Century
12:35
LLAMA CPP 🚀 Cómo optimizar tu IA local para obtener MÁXIMA VELOCIDAD
129 views
2 weeks ago
YouTube
Nichonauta
23:40
Speculative Speculative Decoding: How to Parallelize Drafting and ... for 2x Faster LLM Inference
178 views
2 months ago
YouTube
Xiaol.x
1:25
Stop LLM Lag: The Secret to 1.4x Faster AI (ConfLayers) #Shorts
3 weeks ago
YouTube
CollapsedLatents
11:53
Transformers, Tokens, and Temperature - LLMs From Scratch
415 views
1 month ago
YouTube
Decode Agent
13:54
[IDSL Seminar'26] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
5 days ago
YouTube
IDSL
0:38
3.5K+ Stars • AI/ML | DFlash — Faster LLM Inference via Block Diffusion #shorts
1.1K views
2 weeks ago
YouTube
neural-nexus
0:23
DFlash: Faster LLM Inference with Speculative Decoding
100 views
1 week ago
YouTube
OnlyCS
0:31
Speculative Decoding • LLM Acceleration Patterns
1 views
1 month ago
YouTube
Technical Interview Essentials A–Z
0:59
Speculative Decoding explained in Hindi #aiengineering #datascience #llm #mustdo Interview Question
24 views
4 months ago
YouTube
Learn AI with RC
0:48
5 AI Terms Devs Are Quietly Searching More — April 2026
194 views
3 weeks ago
YouTube
Colony-AI
6:29
Inference Optimization: Making AI Faster & Cheaper (Latency, Throughput & GPUs)
56 views
2 months ago
YouTube
wecite
12:45
Speculative Decoding & Inference Speed — 2-3x Faster LLMs With Zero Quality Loss
2 weeks ago
YouTube
Jeff Heidelberger
10:14
MLX India Community Meetup 1 | Boosting local model performance - Speculative decoding with DFlash
4 views
1 week ago
YouTube
Conscious Engines
7:17
DFlash Deep Dive: Block Diffusion Makes LLM Inference 6x Faster
2 months ago
YouTube
Enchanted Storytime
0:26
Researchers found a way to make LLMs 8.5x faster! (without compromising accuracy) Speculative decoding is quite an effective way to address the single-token bottleneck in traditional LLM inference. A small "draft" model first generates the next several tokens, then the large model verifies all of them at once in a single forward pass. If a token at any position is wrong, you keep everything before it and restart from there. This never does worse than normal decoding. But current drafters in Speculati
10K views
1 week ago
x.com
Avi Chawla
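The draft-then-verify loop described in the snippet above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any library's implementation: both models are hypothetical greedy next-token functions (`toy_target`, `toy_draft`), the names `speculative_step` and `speculative_decode` are invented for this example, and the target's single batched forward pass is emulated by calling it once per position. On a mismatch the sketch also keeps the target's own token at that position, which is a common choice but an assumption here.

```python
from typing import Callable, List

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_next: NextTokenFn,
                     target_next: NextTokenFn, k: int) -> List[Token]:
    """Draft k tokens, then keep the longest run the target agrees with."""
    # 1. The small draft model proposes k tokens autoregressively.
    ctx = list(prefix)
    draft: List[Token] = []
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    # 2. The large model checks every drafted position. A real system does
    #    this in one forward pass; here it is emulated call by call.
    ctx = list(prefix)
    accepted: List[Token] = []
    for tok in draft:
        expected = target_next(ctx)
        if expected != tok:
            # First wrong token: keep everything before it and
            # (assumption) substitute the target's own token here.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target adds one more token.
    accepted.append(target_next(ctx))
    return accepted


def speculative_decode(prefix: List[Token], draft_next: NextTokenFn,
                       target_next: NextTokenFn, k: int,
                       max_new_tokens: int) -> List[Token]:
    """Repeat speculative steps until max_new_tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[:len(prefix) + max_new_tokens]


def greedy_decode(prefix: List[Token], target_next: NextTokenFn,
                  max_new_tokens: int) -> List[Token]:
    """Plain one-token-at-a-time decoding with the target model."""
    out = list(prefix)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out


# Hypothetical toy models: the target counts upward modulo 10, and the
# draft agrees with it except right after the token 4.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

Because each step emits at least one token chosen by the target, the output in this greedy setting matches plain greedy decoding with the target alone; the draft model only changes how many target passes are needed.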