How Vaani works

From spoken voice to understood meaning — in under two seconds.

An overview for procurement, IT, and programme leaders who are evaluating multilingual meeting software for recruitment, skilling, and distributed operations across India, without getting lost in implementation detail.

Four moments from setup to insight

Vaani is not a bolt-on caption layer. It is an end-to-end pipeline — from room creation through real-time translation to stored transcripts your organisation owns.

01

Host creates the room

A signed-in host sets title, visibility, and whether live translation is enabled. A unique room code and link are generated instantly — no downloads for anyone.

Enterprise: SSO-ready authentication, role-based host controls, and usage reporting per organisation.

02

Participants choose their language

Each person enters a display name and selects their subtitle language before joining. Guests need no account. The room records which languages its participants have actually chosen.

Government: one trainer, participants across states — each reads subtitles in their scheduled language.

03

Speech becomes subtitles in real time

Each speaker's voice is captured, understood in context — including code-mixed Indian speech — translated only into languages present in the room, and delivered as live subtitles on each screen.

Designed for conversational pace: subtitles arrive while people are still speaking, typically within two seconds.

04

Meeting artifacts are retained

Transcripts, per-language translations, participant records, and usage summaries are stored in a secure environment your organisation controls. Exports and AI summaries are available on paid plans.

Configurable retention policies for enterprise and government compliance requirements.

Real-time pipeline

How speech becomes subtitles

Each active speaker's audio track is processed independently. The pipeline is streaming — subtitles appear while the speaker is still talking, not after they finish.

Latency budget

How we budget time across the pipeline so subtitles feel conversational, not delayed.

Stage                                              Target
Speech detection & segmentation                    200–400 ms
Speech recognition (code-mixed Indian speech)      300–600 ms
Machine translation (routed per room)              100–300 ms
Network delivery & on-screen render                100–200 ms
Total subtitle latency (design target)             < 2 s
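The arithmetic behind the design target can be checked directly. The sketch below sums the per-stage ranges from the table, assuming for illustration that the four stages run one after another for each speech segment. A streaming pipeline can overlap stages, so this is a conservative reading of the budget, not a measurement. The names `LATENCY_BUDGET_MS`, `TARGET_MS`, and `total_latency_ms` are hypothetical.

```python
# Stage targets from the latency budget table, as (min_ms, max_ms).
# Assumption: stages run one after another for a given speech segment,
# so end-to-end latency is the sum of the stage latencies.
LATENCY_BUDGET_MS = {
    "Speech detection & segmentation": (200, 400),
    "Speech recognition (code-mixed Indian speech)": (300, 600),
    "Machine translation (routed per room)": (100, 300),
    "Network delivery & on-screen render": (100, 200),
}

TARGET_MS = 2000  # design target: total subtitle latency under 2 s


def total_latency_ms(budget: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return (best_case, worst_case) end-to-end latency in milliseconds."""
    best = sum(low for low, _ in budget.values())
    worst = sum(high for _, high in budget.values())
    return best, worst


if __name__ == "__main__":
    best, worst = total_latency_ms(LATENCY_BUDGET_MS)
    print(f"best case: {best} ms, worst case: {worst} ms")
    print(f"headroom vs target: {TARGET_MS - worst} ms")
```

Even with every stage at the top of its range, the total is 1,500 ms, which leaves 500 ms of headroom under the 2 s target.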

Platform design

Secure media. Indian languages. Your data.

Vaani separates three concerns: real-time video and audio, a dedicated subtitle processing layer, and persistent storage for transcripts and accounts. Each layer can be deployed on India-based infrastructure that meets your residency and procurement requirements.

Enterprise deployments support dedicated environments, custom retention, and SSO. Detailed architecture documentation is available under NDA for RFP and security reviews.

Why procurement teams choose Vaani

India-hosted infrastructure

Media, application, and data layers run on India-based cloud infrastructure. Video and audio route through dedicated media servers — not default overseas relays used by generic conferencing tools.

Cost-controlled translation

Dynamic routing translates each utterance once per target language in the room, not once per participant. At 50 participants with 3 languages, you pay for 3 translation paths — not 50.
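The routing idea can be sketched as: group participants by their chosen subtitle language, translate once per group, then fan the result out to everyone in that group. This is a minimal illustration, not Vaani's implementation. `translate` is a hypothetical stub standing in for whatever translation service is used, and each call to it represents one translation path.

```python
from collections import defaultdict


# Hypothetical stand-in for a machine-translation call; each invocation
# represents one billable "translation path" for one utterance.
def translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"


def route_utterance(text: str, participants: dict[str, str]) -> tuple[dict[str, str], int]:
    """Translate once per distinct subtitle language present, then fan out.

    participants maps participant name -> chosen subtitle language.
    Returns (subtitle per participant, number of translation calls made).
    """
    by_lang: dict[str, list[str]] = defaultdict(list)
    for name, lang in participants.items():
        by_lang[lang].append(name)

    subtitles: dict[str, str] = {}
    calls = 0
    for lang, names in by_lang.items():
        translated = translate(text, lang)  # one call per language, not per person
        calls += 1
        for name in names:
            subtitles[name] = translated
    return subtitles, calls


if __name__ == "__main__":
    langs = ["hi", "ta", "mr"]  # Hindi, Tamil, Marathi
    room = {f"p{i}": langs[i % 3] for i in range(50)}
    subs, calls = route_utterance("Welcome to the session", room)
    print(len(subs), calls)  # 50 participants served by 3 translation calls
```

The number of translation calls depends only on how many distinct languages are present, not on head count.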

Code-mixed speech by design

Our speech models are tuned for how India actually talks — Hinglish, Tanglish, Benglish, and other everyday blends — not textbook monolingual speech.

Audit-ready data model

Every join, transcript segment, translation, and cost line is persisted with timestamps. Built for procurement teams that need traceability, not black-box AI.
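A timestamped, append-only record of these events is the kind of structure that makes this traceability possible. The sketch below is illustrative only: `EventKind`, `AuditEvent`, and `AuditLog` are hypothetical names, the store is in memory, and nothing here describes Vaani's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class EventKind(Enum):
    # The four record types named in the text; the names are hypothetical.
    JOIN = "join"
    TRANSCRIPT_SEGMENT = "transcript_segment"
    TRANSLATION = "translation"
    COST_LINE = "cost_line"


@dataclass(frozen=True)
class AuditEvent:
    room_code: str
    kind: EventKind
    detail: str
    # UTC timestamp captured when the event is created.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class AuditLog:
    """Append-only, timestamped event log (illustrative only)."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, room_code: str, kind: EventKind, detail: str) -> AuditEvent:
        # Events are only ever appended, never edited or deleted.
        event = AuditEvent(room_code, kind, detail)
        self._events.append(event)
        return event

    def for_room(self, room_code: str) -> list[AuditEvent]:
        # Returns one room's events in the order they were recorded.
        return [e for e in self._events if e.room_code == room_code]


if __name__ == "__main__":
    log = AuditLog()
    log.record("ABC123", EventKind.JOIN, "Guest joined with Tamil subtitles")
    log.record("ABC123", EventKind.COST_LINE, "Translation: 3 language paths")
    for event in log.for_room("ABC123"):
        print(event.timestamp.isoformat(), event.kind.value, event.detail)
```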

Government and skilling (सरकार और स्किलिंग)

Built for national programmes at scale

One session. Many states. Every participant follows in their own language — without multiplying trainers, travel, or content production.

Skill India & NSDC partners

A master trainer conducts one skilling session. Participants in Tamil Nadu, Uttar Pradesh, and Maharashtra each read live subtitles in Tamil, Hindi, and Marathi — without repeating the session three times.

Outcome: Higher completion rates, lower trainer travel cost, wider geographic reach per cohort.

State public service commissions

Panel members ask questions in English. Candidates respond in their strongest language. Subtitles let evaluators follow the substance of each answer without penalising a candidate's second-language vocabulary.

Outcome: Fairer assessment of skill, not fluency in a second language.

National banks & BFSI L&D

Regulatory training rolled out to 10,000 branch staff. One compliance module, eleven subtitle languages, consistent messaging without eleven separate recordings.

Outcome: Faster rollout, single source of truth, verifiable attendance and transcript records.

Vaani vs generic video tools

Capability                               Vaani                 Typical VC tool
Live subtitles in 11 Indian scripts      Yes                   Limited / add-on
Code-mixed speech (Hinglish etc.)        Yes
Translate only languages in room         Yes
Dedicated India-hosted media             Yes
Guest join without account               Yes
Per-org usage & cost reporting           Yes                   Varies
Data residency in India                  Pilot · Enterprise    Typically US/EU

Security & data posture

We are transparent about what is live today and what ships with Enterprise. Ask us for a security questionnaire — we respond to government and BFSI RFPs.

Read Privacy Policy →
  • Encrypted connections for all web traffic
  • Secure sign-in for hosts (enterprise SSO available on request)
  • Session-based authentication with protected cookies
  • Host-only controls: remove participant, toggle translation, room settings
  • Unlisted and private room visibility modes
  • Meeting records stored in an environment your organisation controls, not locked in a third-party cloud
  • Configurable data retention (Enterprise)
  • SAML SSO & audit log export (Enterprise roadmap)

Ready for a tailored walkthrough?

We will demo Vaani with your languages, your use case, and your compliance questions — not a generic product tour.