Source code for our paper "Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval" (ACM MM 2024).
The training and inference data can be obtained from NRCCR, and we use VideoChat2 to generate the image descriptions.
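As a rough illustration of the description-generation step (the helper generate_description, the output file captions.json, and the data path below are hypothetical placeholders, not the actual interface of VideoChat2 or this repository), the process amounts to captioning each image with the MLLM and saving an image-id-to-caption mapping:

# Minimal sketch: generate one description per image with an MLLM
# (VideoChat2 in the paper) and store an image-id -> caption mapping.
import json
from pathlib import Path

def generate_description(image_path: Path) -> str:
    # Hypothetical wrapper around the MLLM inference call; plug in
    # your own VideoChat2 (or other MLLM) captioning code here.
    raise NotImplementedError

def build_caption_file(image_dir: str, out_file: str = "captions.json") -> None:
    captions = {}
    for image_path in sorted(Path(image_dir).glob("*.jpg")):
        captions[image_path.stem] = generate_description(image_path)
    with open(out_file, "w", encoding="utf-8") as f:
        json.dump(captions, f, ensure_ascii=False, indent=2)

# Example (once generate_description is implemented):
# build_caption_file("data/multi30k/images")  # hypothetical data layout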
To train and evaluate on each benchmark, run the corresponding script:
cd LECCR
sh run_multi30k.sh   # Multi30K
sh run_mscoco.sh     # MSCOCO
sh run_video.sh      # video-text retrieval datasets
The code is modified from NRCCR and CCLM.
If you find the package useful, please consider citing our paper:
@inproceedings{wang2024multimodal,
  title={Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval},
  author={Wang, Yabing and Wang, Le and Zhou, Qiang and Wang, Zhibin and Li, Hao and Hua, Gang and Tang, Wei},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={8296--8305},
  year={2024}
}