How Vaani works
An overview for procurement, IT, and programme leaders evaluating multilingual meetings for recruitment, skilling, and distributed operations across India — without getting lost in implementation detail.
Vaani is not a bolt-on caption layer. It is an end-to-end pipeline — from room creation through real-time translation to stored transcripts your organisation owns.
01
Host creates the room
A signed-in host sets the title, visibility, and whether live translation is enabled. A unique room code and link are generated instantly, and nobody needs to download anything.
Enterprise: SSO-ready auth, role-based host controls, usage reporting per organisation.
02
Participants choose their language
Each person enters a display name and selects their subtitle language before joining. Guests need no account. The room learns which languages are actually present.
Government: one trainer, participants across states — each reads subtitles in their scheduled language.
03
Speech becomes subtitles in real time
Each speaker's voice is captured, understood in context — including code-mixed Indian speech — translated only into languages present in the room, and delivered as live subtitles on each screen.
Designed for conversational pace: subtitles arrive while people are still speaking, with a design target of under two seconds from speech to screen.
04
Meeting artifacts are retained
Transcripts, per-language translations, participant records, and usage summaries are stored in a secure environment your organisation controls. Exports and AI summaries are available on paid plans.
Configurable retention policies for enterprise and government compliance requirements.
Real-time pipeline
Each active speaker's audio track is processed independently. The pipeline is streaming — subtitles appear while the speaker is still talking, not after they finish.
How we budget time across the pipeline so subtitles feel conversational, not delayed.
| Stage | Target latency |
|---|---|
| Speech detection & segmentation | 200–400ms |
| Speech recognition (code-mixed Indian speech) | 300–600ms |
| Machine translation (routed per room) | 100–300ms |
| Network delivery & on-screen render | 100–200ms |
| Total subtitle latency (design target) | < 2s |
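As a quick check on the budget above, the stage ranges add up to 700–1500 ms for one segment of speech, which leaves headroom under the 2 s target. The sketch below simply sums the table's best-case and worst-case figures stage by stage. Stage names are copied from the table; the figures are design targets, not measurements, and the helper name `total_budget` is illustrative.

```python
# Per-stage latency targets from the table above, as (low_ms, high_ms).
# These are design targets, not measured values.
STAGE_BUDGET_MS = {
    "Speech detection & segmentation": (200, 400),
    "Speech recognition (code-mixed Indian speech)": (300, 600),
    "Machine translation (routed per room)": (100, 300),
    "Network delivery & on-screen render": (100, 200),
}

TOTAL_TARGET_MS = 2000  # the "< 2s" design target


def total_budget(stages: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum best-case and worst-case stage targets for one speech segment."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


if __name__ == "__main__":
    best, worst = total_budget(STAGE_BUDGET_MS)
    print(f"End-to-end: {best}-{worst} ms")
    print(f"Worst-case headroom: {TOTAL_TARGET_MS - worst} ms")
```

Even if every stage hits the top of its range, the total stays about 500 ms inside the target.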
Platform design
Vaani separates three concerns: real-time video and audio, a dedicated subtitle processing layer, and persistent storage for transcripts and accounts. Each layer can be deployed on India-based infrastructure that meets your residency and procurement requirements.
Enterprise deployments support dedicated environments, custom retention, and SSO. Detailed architecture documentation is available under NDA for RFP and security reviews.
India-hosted infrastructure
Media, application, and data layers run on India-based cloud infrastructure. Video and audio route through dedicated media servers — not default overseas relays used by generic conferencing tools.
Cost-controlled translation
Dynamic routing translates each utterance once per target language in the room, not once per participant. With 50 participants reading in 3 languages, you pay for at most 3 translation paths, not 50.
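The once-per-language idea can be illustrated as a fan-out keyed by subtitle language rather than by participant. This is a minimal sketch under stated assumptions, not Vaani's implementation: `translate` is a hypothetical stub standing in for a real translation service, `route_utterance` is an illustrative name, language codes are examples, and readers who share the speaker's language are assumed to see the original transcript without a translation call.

```python
from collections import defaultdict


def translate(text: str, source: str, target: str) -> str:
    """Hypothetical stand-in for a machine-translation call."""
    return f"[{source}->{target}] {text}"


def route_utterance(
    text: str, source_lang: str, participants: dict[str, str]
) -> tuple[dict[str, str], int]:
    """Deliver one utterance to every participant in their chosen language.

    participants maps participant name -> subtitle language code.
    Returns (subtitle per participant, number of translation calls made).
    """
    # Group participants by subtitle language so each language is translated once.
    by_lang: dict[str, list[str]] = defaultdict(list)
    for name, lang in participants.items():
        by_lang[lang].append(name)

    subtitles: dict[str, str] = {}
    calls = 0
    for lang, names in by_lang.items():
        if lang == source_lang:
            rendered = text  # same language: show the transcript as-is
        else:
            rendered = translate(text, source_lang, lang)
            calls += 1
        # Every reader of this language receives the same rendered subtitle.
        for name in names:
            subtitles[name] = rendered
    return subtitles, calls


if __name__ == "__main__":
    # 50 participants spread across three languages (illustrative codes).
    langs = ["ta", "hi", "mr"]
    room = {f"p{i}": langs[i % 3] for i in range(50)}
    subs, calls = route_utterance("Let's begin the session.", "en", room)
    print(len(subs), calls)
```

Running it prints `50 3`: fifty subtitles delivered from three translation calls. If the speaker talks in one of the room's languages, such as Hindi, those readers get the original transcript and the count drops to two.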
Code-mixed speech by design
Our speech models are tuned for how India actually talks — Hinglish, Tanglish, Benglish, and other everyday blends — not textbook monolingual speech.
Audit-ready data model
Every join, transcript segment, translation, and cost line is persisted with timestamps. Built for procurement teams that need traceability, not black-box AI.
Government and skilling
One session. Many states. Every participant follows in their own language — without multiplying trainers, travel, or content production.
Skill India & NSDC partners
A master trainer conducts one skilling session. Participants in Tamil Nadu, Uttar Pradesh, and Maharashtra read live subtitles in Tamil, Hindi, and Marathi respectively, so the session does not have to be repeated three times.
Outcome: Higher completion rates, lower trainer travel cost, wider geographic reach per cohort.
State public service commissions
Panel members ask questions in English. Candidates respond in their strongest language. Subtitles let evaluators follow the substance of each answer without penalising candidates for limited English vocabulary.
Outcome: Fairer assessment of skill, not fluency in a second language.
National banks & BFSI L&D
Regulatory training rolled out to 10,000 branch staff. One compliance module, eleven subtitle languages, consistent messaging without eleven separate recordings.
Outcome: Faster rollout, single source of truth, verifiable attendance and transcript records.
| Capability | Vaani | Typical video-conferencing tool |
|---|---|---|
| Live subtitles in 11 Indian scripts | ✓ | Limited / add-on |
| Code-mixed speech (Hinglish etc.) | ✓ | — |
| Translate only languages in room | ✓ | — |
| Dedicated India-hosted media | ✓ | — |
| Guest join without account | ✓ | ✓ |
| Per-org usage & cost reporting | ✓ | Varies |
| Data residency in India | Pilot · Enterprise | Typically US/EU |
We are transparent about what is live today and what ships with Enterprise. Ask us for a security questionnaire — we respond to government and BFSI RFPs.
Read Privacy Policy →
We will demo Vaani with your languages, your use case, and your compliance questions, not a generic product tour.