I’m Anindya Mondal, a PhD candidate at the Surrey Institute for People-Centred AI, CVSSP, University of Surrey. My research addresses the challenges of representation learning in real-world settings by integrating auxiliary signals across modalities. I develop vision–language algorithms for robust action recognition, object counting, and text-to-image synthesis, with an emphasis on practical solutions for dynamic scene interpretation and generation.