TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs

Abstract

Speculative decoding (SD) has proven effective in accelerating LLM inference by quickly generating draft tokens and verifying them in parallel. However, SD remains largely unexplored for Large Vision Language Models (LVLMs), advanced LLMs capable of processing both image and text prompts. To address this gap, we first benchmark existing drafting methods for LVLMs across diverse scenarios and observe that methods using small draft models show scenario-specific performance fluctuations. Motivated by these findings, we propose Test-time Adaptive Batched Ensemble Drafting (TABED), which dynamically ensembles multiple drafts obtained via batch inference by leveraging measurable deviations of the drafts from the past ground truth available in the SD setting. Across diverse input scenarios, TABED achieves a robust average speedup of 1.8x and a 5% improvement over individual drafting methods, while incurring no additional cost during training (i.e., it is training-free) or inference. To further enhance its extensibility, we also explore and incorporate alternative drafting methods that use image pooling and captioning. Our method remains compatible with existing LVLM acceleration techniques, and we open-source custom-trained draft LVLMs to ensure reproducibility.
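The abstract describes the adaptive ensemble only at a high level, so the following is a minimal sketch of the general idea, not the released TABED implementation. It assumes that each draft's "deviation from past ground truth" is measured as the mean negative log-likelihood it assigned to tokens the target model has already verified, and that the ensemble weights are a softmax over the negated deviations. The function names, the `temperature` parameter, and the toy distributions are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_weights(past_probs, verified_tokens, temperature=1.0):
    """Weight each draft by how well it predicted already-verified tokens.

    past_probs[d][t] is draft d's distribution (a list over the vocabulary)
    at past position t; verified_tokens[t] is the token the target model
    confirmed at that position. The deviation measure used here (mean
    negative log-likelihood) is an assumption made for illustration.
    """
    scores = []
    for draft_history in past_probs:
        nll = 0.0
        for dist, tok in zip(draft_history, verified_tokens):
            nll += -math.log(max(dist[tok], 1e-12))
        nll /= max(len(verified_tokens), 1)
        scores.append(-nll / temperature)
    return softmax(scores)


def ensemble_next(next_probs, weights):
    """Mix the drafts' next-token distributions with the given weights."""
    vocab_size = len(next_probs[0])
    return [
        sum(w * p[v] for w, p in zip(weights, next_probs))
        for v in range(vocab_size)
    ]


def draft_token(next_probs, weights):
    """Greedy draft token from the weighted ensemble distribution."""
    mixed = ensemble_next(next_probs, weights)
    return max(range(len(mixed)), key=mixed.__getitem__)


# Toy example: two drafts, a 3-token vocabulary, two verified positions.
past_probs = [
    [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],  # draft A tracked the target well
    [[0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],  # draft B did not
]
verified_tokens = [0, 0]
next_probs = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]

weights = ensemble_weights(past_probs, verified_tokens)
adaptive_choice = draft_token(next_probs, weights)
uniform_choice = draft_token(next_probs, [0.5, 0.5])
```

In this toy setup the adaptive weights favor draft A, so the ensemble proposes token 0, whereas a uniform average of the two drafts would propose token 1. Because the verified tokens are a by-product of the SD verification step, a weighting of this kind needs no training and no extra target-model calls.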

Checkpoints

The following model checkpoints are released via the Google Drive link:

LLaVA-1.5-68m
LLaVA-OV-68m
LLaVA-1.5-160m
LLaVA-1.5-290m

Coming Soon

Inference code for TABED for each configuration


furiosa-ai/TABED
