English General multi-modal audio/video dataset

High-Quality Multi-Modal Face Dataset to Empower Face Recognition Models

Professional-Grade Dataset with Multi-Angle Face Images, Videos and Audio

  • Leveraging years of expertise in image data processing, Surfing Tech provides AI practitioners with high-quality datasets tailored for training and testing facial recognition models. The dataset contains 300 subjects, each reciting 300 sentences. The recording setup includes one lapel mic for audio, one smartphone for audio, and a second smartphone for video; the recordings are manually synced.

  • The smartphone audio is sampled at 44.1 kHz with 16-bit depth. The video is recorded at 1440×1080 resolution and 60 FPS. The lapel mic audio is also sampled at 44.1 kHz with 16-bit depth. The three streams are manually aligned into synchronized audio-visual data.

  • Capturing speech and facial expressions across a range of emotions, this dataset enables building robust multi-angle face recognition models with improved adaptability. 

  • We welcome AI developers to use our face datasets to speed up iteration on and upgrades to their facial recognition capabilities.
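
The recording specifications above imply a simple relationship between the audio and video streams: at 44.1 kHz audio and 60 FPS video, each video frame spans 735 audio samples. The following is a minimal Python sketch of that arithmetic, together with a check that an audio file matches the stated sampling rate and bit depth. It assumes the audio is available as WAV files, which this listing does not state; the function names are hypothetical and not part of any Surfing Tech tooling.

```python
import wave

# Values taken from the specification above.
EXPECTED_RATE_HZ = 44_100        # 44.1 kHz sampling rate
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit depth = 2 bytes per sample
VIDEO_FPS = 60                   # video frame rate


def samples_per_video_frame(rate_hz=EXPECTED_RATE_HZ, fps=VIDEO_FPS):
    """Return how many audio samples correspond to one video frame."""
    return rate_hz / fps


def matches_audio_spec(path):
    """Return True if the WAV file at `path` is 44.1 kHz, 16-bit.

    Assumption: the audio is stored as WAV; other containers would need
    a different reader.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
        )
```

A check like `matches_audio_spec` can be run on each file before training, so that recordings with an unexpected format are caught early rather than silently resampled.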

Data Name: English General multi-modal audio/video dataset
IPR Ownership: Surfingtech
Quantity: 100 IDs
Ethnicity: English Speaker
Capture Device: smartphone, mic
Details (each person): Each subject speaks 300 sentences. Three devices are used: 1 smartphone recording video, 1 smartphone recording audio, and 1 mic recording audio; the recordings are aligned manually.