Voice Order Entry in Urdu and Arabic: What Happens When You Stop Forcing Reps to Type in English

By Sufyan · 2026-06-09 · 4 min read

Last month I was sitting in a kiryana store in Gulberg, Lahore, watching a sales rep from a biscuit company try to punch in an order on his phone. The shopkeeper was rattling off SKUs in Urdu: "do carton chocolate wala, aik half-size, aur woh naya flavor jo pichli baar laye thay" (roughly, "two cartons of the chocolate one, one half-size, and that new flavor you brought last time"). The rep was scrolling, tapping, scrolling. It took him almost 4 minutes for an order that probably had 11 line items.

I counted. Because I'm annoying like that.

The whole interaction made me realize something I'd been ignoring for too long. We've been building field sales tools for years now assuming English is the default, with maybe some Arabic UI as a translation layer slapped on top. But the actual conversation happening between a rep and a shopkeeper in Karachi or Jeddah or Muscat? It's almost never in English. And forcing reps to mentally translate from Urdu or Arabic into English search terms while a busy shopkeeper waits — that's just bad design.

So we shipped voice order entry in Urdu and Arabic earlier this year. Honestly, I underestimated how much it would change things.

The translation tax nobody talks about

Here's what nobody puts in a pitch deck. When a rep speaks Urdu or Arabic natively but the app forces English-only input, there's a cognitive tax on every single order. The shopkeeper says "laal wala sauce, chhoti bottle." The rep has to translate that to "red chilli sauce 200ml" in his head, then type it, then deal with autocorrect mangling it, then scroll through search results.

Multiply that by 35 outlets a day. Multiply that by 60 reps. You're losing hours.

One of our customers in Sharjah (they distribute snacks and beverages across the Northern Emirates) measured this before and after. Average time per outlet visit dropped from 7 minutes 20 seconds to 3 minutes 50 seconds. That's three and a half minutes saved per visit, almost half. And the reps were happier because they weren't doing the translation gymnastics anymore.
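
For a rough sense of scale, here's a back-of-envelope calculation. It assumes, purely for illustration, that the Sharjah per-visit saving carries over to the 35-outlets-a-day, 60-rep team from the example above. Those two numbers are not measurements from that customer, so treat the result as an order of magnitude, not a promise.

```python
# Back-of-envelope estimate. The outlet and rep counts are illustrative
# assumptions, not figures measured for the Sharjah customer.
before_s = 7 * 60 + 20   # 7 min 20 s per outlet visit
after_s = 3 * 60 + 50    # 3 min 50 s per outlet visit

saved_per_visit_s = before_s - after_s       # seconds saved per visit
reduction = saved_per_visit_s / before_s     # fraction of visit time saved

outlets_per_rep = 35   # assumed outlets per rep per day
reps = 60              # assumed team size

rep_minutes_saved_per_day = saved_per_visit_s * outlets_per_rep / 60
team_hours_saved_per_day = saved_per_visit_s * outlets_per_rep * reps / 3600

print(f"Saved per visit: {saved_per_visit_s} s ({reduction:.0%})")
print(f"Per rep per day: {rep_minutes_saved_per_day:.1f} min")
print(f"Team per day: {team_hours_saved_per_day:.1f} hours")
```

Under those assumptions that works out to 210 seconds per visit, about two hours per rep per day, and roughly 120 rep-hours across the team every day.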

The other thing that happens: order accuracy goes up. When a rep is typing fast on a 6-inch screen between a fridge and a counter, they fat-finger SKUs constantly. Voice input — when it actually understands the dialect — gets it right more often than thumbs do.

Why generic voice tools don't cut it

I'll admit, when we first started building this, I thought we'd just plug into one of the big cloud speech APIs and call it a day. That didn't work. Here's why.

FMCG retail vocabulary is weird. Reps and shopkeepers use brand names, local slang, half-English half-Urdu mashups, regional product names. A general-purpose Urdu speech model trained on news broadcasts has no idea what "Tapal Danedar family pack" sounds like in a noisy shop with a fan running and a kid crying in the background. It just guesses, badly.

Same problem with Arabic, but worse — because the dialects matter so much. Gulf Arabic in Riyadh isn't the same as what you hear in Manama or Kuwait City. Egyptian sales reps working in Dubai will pronounce SKUs differently than Saudi nationals. A one-size-fits-all model trained on Modern Standard Arabic will fail in actual shops.

So we trained on FMCG-specific terminology. Lots of it. Brand names, pack sizes, flavors, regional product nicknames. And we let each distributor's catalog become part of the recognition layer so the system gets smarter about their SKUs specifically. An Urdu sales app or Arabic field sales tool that doesn't do this is just frustrating to use.
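
To make the catalog idea concrete, here's a minimal sketch of one simple way to do it: take whatever text the speech recognizer produced and fuzzy-match it against a per-distributor list of spoken aliases. This is an illustration, not the production system. It matches after recognition using Python's standard difflib instead of biasing the speech model itself, and the SKU codes, the alias list, and the 0.6 threshold are all made up for the example.

```python
from difflib import SequenceMatcher

# Hypothetical catalog: SKU code -> phrases a rep might actually say.
# Codes, pack sizes, and aliases are invented for illustration.
CATALOG = {
    "TPL-DND-FAM": ["tapal danedar family pack", "danedar family pack"],
    "SAU-RCH-200": ["red chilli sauce 200ml", "laal wala sauce chhoti bottle"],
    "SAU-RCH-800": ["red chilli sauce 800ml", "laal wala sauce bari bottle"],
}


def match_sku(transcript, catalog=CATALOG, threshold=0.6):
    """Return (sku, score) for the closest alias, or (None, score) if nothing is close enough."""
    # Normalize: lowercase, drop commas, collapse whitespace.
    text = " ".join(transcript.lower().replace(",", " ").split())
    best_sku, best_score = None, 0.0
    for sku, aliases in catalog.items():
        for alias in aliases:
            score = SequenceMatcher(None, text, alias).ratio()
            if score > best_score:
                best_sku, best_score = sku, score
    if best_score < threshold:
        return None, best_score
    return best_sku, best_score


print(match_sku("laal wala sauce, chhoti bottle"))  # exact alias after normalization
print(match_sku("tapal danedar family pak"))        # slightly misheard transcript
```

The first call resolves to the hypothetical 200ml SKU and the second still lands on the Tapal Danedar entry despite the misheard "pak". The design point is that the alias list belongs to the distributor, so a nickname only one region uses can be added without retraining anything.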

The other thing — and this took us a while to figure out — is that reps don't speak in clean sentences. They mumble, they interrupt themselves, they say "nahi nahi, dou nahi, teen carton" mid-order. The system has to handle corrections gracefully. If you have to start over every time you misspeak, you're back to typing.
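
Here's a toy version of what handling a correction can look like, using the example above ("nahi nahi, dou nahi, teen carton" is roughly "no no, not two, three cartons"). It is a sketch under a big simplifying assumption: a number followed by "nahi" has been retracted, and the last number left standing wins. The word list and the rule are hypothetical, and real speech is much messier than this.

```python
# Hypothetical romanized Urdu number words; a real system would need far more
# vocabulary and spelling variants than this.
NUMBERS = {"aik": 1, "ek": 1, "do": 2, "dou": 2, "teen": 3, "chaar": 4, "paanch": 5}
NEGATION = "nahi"


def resolve_quantity(utterance):
    """Return the quantity left standing after spoken self-corrections.

    Assumed rule for this sketch: a number immediately followed by "nahi"
    is retracted, and the last number that was not retracted wins.
    Returns None if no usable number was spoken.
    """
    tokens = utterance.lower().replace(",", " ").split()
    quantity = None
    for i, token in enumerate(tokens):
        if token not in NUMBERS:
            continue
        retracted = i + 1 < len(tokens) and tokens[i + 1] == NEGATION
        if not retracted:
            quantity = NUMBERS[token]
    return quantity


print(resolve_quantity("do carton, nahi nahi, dou nahi, teen carton"))
```

With that utterance the rep first says two, takes it back, and ends on three, so the function returns 3 without the rep having to start the line item over.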

What this actually changes on the ground

A few things I didn't fully predict:

Older reps adopted it faster than younger ones. I assumed it'd be the opposite. But the 45-year-old rep who's been doing this route for 12 years and knows every shopkeeper by name? He hates typing. Loves talking. Voice felt natural to him immediately. The 24-year-old who grew up on smartphones was actually a bit slower to switch over.

Literacy stopped being a barrier. We have customers in interior Sindh and rural Punjab where some reps aren't fully comfortable reading English product names on a screen. Voice solved that overnight. They speak the order the way the shopkeeper said it, and the system finds the SKU. No reading required.

Shopkeepers started trusting the process more. Sounds small but it matters. When a rep is staring at a phone for 4 minutes, the shopkeeper gets antsy and starts second-guessing. When the rep is having a conversation and the order gets entered as they talk, it feels normal. Like the old paper-and-pen days but faster.

And the data quality got better. Because reps are entering orders during the conversation instead of trying to remember and re-enter later, you get fewer "I'll fix this at the warehouse" corrections. Those corrections are where bad data lives.

Look, I'm biased — I run Zivni and we sell this. But even if you go with FieldAssist or BeatRoute or build something internal, please make sure voice input in the actual local language is on your list. Not a translation overlay. Not English-only with an Arabic menu. Real voice recognition in the language the rep and shopkeeper are already speaking.

The FMCG sales rep in Faisalabad or Dammam isn't going to switch languages for your software. Your software has to meet them where they already are.

Anyone else seeing the same shift on their teams? Curious what time savings you're measuring.