Hi,
Thank you for your wonderful survey!
Would you mind adding two papers about video-text retrieval?
Paper 1: Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
Accepted at ECCV 2024.
It leverages LLaVA to scale up the training data for video-text retrieval: the frames of a video are concatenated and passed to LLaVA, which generates a caption for the video.
Paper link: https://arxiv.org/abs/2407.03788
Code link: https://github.com/nguyentthong/meta_optimized_angular_margin_contrastive_lvlm
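To make the idea in Paper 1 concrete, here is a rough sketch of the frame-concatenation step. It is only an illustration and not the code from the repository: the grid layout, the names `concat_frames` and `caption_video`, and the `captioner` callable that stands in for LLaVA are all assumptions.

```python
from typing import Callable, List, Sequence

# A frame is H rows of W pixel values (grayscale, for brevity).
Frame = List[List[int]]


def concat_frames(frames: Sequence[Frame], cols: int = 2) -> Frame:
    """Tile equally sized frames into one grid image, filling row by row.

    The grid layout is an assumption; the paper may arrange frames differently.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    blank = [[0] * width for _ in range(height)]  # padding for an incomplete last row
    grid: Frame = []
    for start in range(0, len(frames), cols):
        row_frames = list(frames[start:start + cols])
        row_frames += [blank] * (cols - len(row_frames))
        for y in range(height):
            # Join the y-th pixel row of every frame in this grid row.
            grid.append([px for frame in row_frames for px in frame[y]])
    return grid


def caption_video(frames: Sequence[Frame],
                  captioner: Callable[[Frame], str],
                  cols: int = 2) -> str:
    """Caption a video by passing its concatenated frames to `captioner`.

    `captioner` is a hypothetical image -> text callable standing in for LLaVA.
    """
    return captioner(concat_frames(frames, cols))
```

In practice `captioner` would wrap a LLaVA inference call, and the generated captions would be paired with the videos as additional training data.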
Paper 2: Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Accepted to Findings of ACL 2024.
This paper summarizes video-text retrieval methods from model architecture, model training, and data perspectives.
Paper link: https://arxiv.org/abs/2406.05615
Code link: https://github.com/nguyentthong/video-language-understanding
Thanks a lot!