  • Fourier-LLaVA: Compressing Vision Tokens in the Frequency Domain for Large Vision-Language Models
    Huanyu Wang, Jushi Kai, Haoli Bai, Bo Jiang, Zhouhan Lin
    🔗 N/A 📄 N/A
  • LoMo: Longer and More Videos Benchmark for Understanding and Temporal Grounding Tasks
    Chengyang Hu, Xinyu Zhou, Huanyu Wang, Danyu Shen, Ran Yi, Mengtian Li, Lizhuang Ma
    NeurIPS 2025 Datasets and Benchmarks Track Submission
    🔗 N/A 📄 N/A