Mingfei Chen

Research Assistant

Huazhong University of Science and Technology


Mingfei Chen received a B.S. degree from Huazhong University of Science and Technology, China, in 2020, and currently works on computer vision, especially Online Multi-object Tracking (MOT), Human-object Interaction (HOI) and Cross-modal Video Retrieval (Vision Language).


Interests

  • Computer Vision
  • Visual Relation Detection
  • Video Analysis

Education

  • MS in Electronic Information and Communications, 2020 – present

    Huazhong University of Science and Technology

  • BSc in Computer Science, 2016 – 2020

    Huazhong University of Science and Technology


Experience

Research Intern
Jul 2020 – Present · Beijing, China

Project: Human-object Interaction (HOI), in collaboration with Beihang University:

  • As the primary researcher, formulated HOI detection as a set prediction problem. The new formulation overcomes the instance-centric and location limitations of existing methods.
  • Proposed a novel one-stage HOI framework that uses a transformer to adaptively aggregate the most suitable features.
  • Designed an instance-aware attention module that injects instance information into the interaction branch.
  • Without introducing any extra features, our method achieves a 31% relative improvement over the second-best one-stage method on the HICO-DET dataset.

Project: Online Multi-object Tracking (MOT), in collaboration with the University of Washington:

  • As the primary researcher, addressed the core challenges of online multi-object tracking.

Research Intern
Sep 2019 – Apr 2020 · Shenzhen, China
  • Rebuilt the hand pose detection network on a lightweight backbone. Fine-tuned and validated the new model on millions of real-world user samples, achieving high inference speed while keeping detection precision comparably robust.
  • Combined foreground/background segmentation with human detection to locate every human body in a video.
  • Applied guided filtering and detection to improve segmentation quality, especially in multi-person scenes with distant subjects.

Research Assistant
University at Buffalo (SUNY) and The Chinese University of Hong Kong, Shenzhen
Jul 2019 – Nov 2019 · Shenzhen, China

Project: Cross-modal Video Retrieval (Vision Language):

  • As the primary researcher, addressed both the efficiency and the effectiveness of natural language video retrieval.
  • Devised a temporal anchor-free structure that performs retrieval directly at each temporal location within the target region. Built a top-down pyramid structure to exploit diverse temporal receptive fields, and a dilated convolution module to fuse vision-language features more thoroughly.
  • The new method reduces retrieval time by a factor of 5 and outperforms previous work by 10% in retrieval accuracy.

Recent Publications

Reformulating HOI Detection as Adaptive Set Prediction. Accepted by CVPR 2021.
