AI Shelf Photo Analysis: What Image Recognition Can (and Can't) Do Yet

By Sufyan · 2026-05-05 · 5 min read

Last March, a beverage distributor in Lahore sent me a photo from a kiryana store. Six brands stacked on a shelf, half of them obscured by a hanging snack rack, lighting that looked like a 90s music video. He asked: "Can your AI count my SKUs in this?"

Honest answer? Mostly yes. About 87% accuracy on that specific image. But the two SKUs it missed were the two he cared about most.

That's the thing nobody tells you about AI retail audit tools. The demo videos are gorgeous. The reality, in a sweaty general trade outlet in Karachi or Sharjah, is messier. So let me walk through what shelf photo recognition can actually do right now, where it falls flat, and how I'd use it if I were running a field team today.

What the tech can genuinely do well

Product detection on clean, well-lit shelves is largely a solved problem. If your rep takes a straight-on photo of a modern trade shelf — say, a Carrefour aisle in Dubai with proper lighting — a good model will hit 92-95% SKU recognition. It'll tell you facings, share of shelf, planogram compliance, and out-of-stock gaps. That part actually works.
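To make the share-of-shelf part concrete, here's a minimal sketch of turning detector output into a share-of-shelf report. The `(sku_id, confidence, bbox)` tuple format, the brand lookup, and the 0.6 confidence floor are all my illustrative assumptions, not any vendor's real API; actual detector outputs vary.

```python
from collections import Counter

def share_of_shelf(detections, brand_of, min_conf=0.6):
    """Share of shelf by facings count from one shelf photo.

    detections: list of (sku_id, confidence, bbox) tuples -- a hypothetical
    detector output format; real platforms each have their own schema.
    brand_of: maps sku_id -> brand name.
    """
    facings = Counter(brand_of[sku] for sku, conf, _ in detections
                      if conf >= min_conf)
    total = sum(facings.values())
    return {brand: count / total for brand, count in facings.items()}

# Example: 6 detected facings across two brands
dets = [("cola_500", 0.91, None)] * 4 + [("juice_1l", 0.88, None)] * 2
brands = {"cola_500": "Cola Co", "juice_1l": "Juice Co"}
print(share_of_shelf(dets, brands))  # roughly {'Cola Co': 0.67, 'Juice Co': 0.33}
```

Facings counts feed planogram-compliance and out-of-stock checks the same way: count what the model saw, compare against what should be there.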

This is the stuff I'm comfortable selling. It works. It's not magic, but it's a meaningful step up from clipboards and guesswork.

Where it still struggles (and will for a while)

Here's where I have to be straight with you, because I've seen too many vendors oversell this.

General trade in emerging markets is hard. Really hard. The shelves are crowded, the lighting is bad, products are stacked sideways, packaging gets faded from sun exposure, and sometimes there's literally a cat sleeping on the biscuits. (I'm not joking — we have a photo from Multan.)

A few specific failure modes I see weekly:

Similar SKU variants. A 250ml and 500ml bottle of the same brand, same color, same label design. Humans squint and figure it out. The model guesses. We've measured around 71% accuracy on close-variant differentiation in cluttered shelves. That's not good enough for sales reporting yet.
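One partial workaround we've seen is a post-processing tie-breaker: when the label looks identical, compare the facing's bounding-box height against the median facing height on that shelf row. This is a crude sketch of that idea; the 0.75 cutoff and the function shape are my assumptions, and it patches symptoms rather than fixing the model.

```python
def resolve_size_variant(bbox_height, shelf_median_height,
                         variants=("250ml", "500ml"), cutoff=0.75):
    """Crude size-variant tie-breaker (hypothetical heuristic).

    If a facing's bounding-box height is well below the median facing
    height on its shelf row, guess the smaller variant; otherwise the
    larger one. Only meaningful when the detector already matched the
    label and is unsure about pack size.
    """
    small, large = variants
    return small if bbox_height / shelf_median_height < cutoff else large

print(resolve_size_variant(120, 200))  # 0.60 of median height -> "250ml"
print(resolve_size_variant(190, 200))  # 0.95 of median height -> "500ml"
```

It helps on straight-on photos; on angled shots, perspective distorts the heights and the heuristic gets unreliable, which is part of why close variants stay hard.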

Partial occlusion. When 40% of a product is hidden behind another product or a price card, recognition drops fast. Some models will hallucinate a SKU that isn't there because the visible portion looks similar to a trained pattern.

New product launches. If you launched a new flavor last week, the model has never seen it. You need to retrain. With most platforms (ours included) that takes anywhere from a few days to two weeks depending on how many sample images you can provide. Not instant.

Photo quality variance. A rep with an iPhone 14 takes very different pictures than one with a 4-year-old Android. We've had to build aggressive image preprocessing just to deal with this. And even then, blurry photos at dusk in a poorly lit kiryana — we just reject those and ask for a retake.

Multi-language packaging. Urdu, Arabic, English on the same SKU, plus regional variants. Training data for English-only retail (the kind most global vendors built on) doesn't transfer cleanly. This is one reason FieldAssist or BeatRoute models trained on Indian data still need local tuning for Pakistan or UAE general trade.

How I'd actually use it in 2026

Look, if I were running a 200-rep FMCG sales team tomorrow, here's how I'd think about image recognition retail tools:

Use AI shelf analysis for the 70% it does well, and accept that the other 30% needs human review or a different method. Don't try to automate everything. That's how you end up with reports nobody trusts.
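In practice, that 70/30 split is a routing rule: auto-accept what the model is confident about, queue the rest for a human. A minimal sketch, assuming a hypothetical `(sku_id, confidence)` output format and an illustrative 0.85 cutoff (real cutoffs should be calibrated per SKU and per market):

```python
def triage(detections, auto_threshold=0.85):
    """Split detections into auto-accepted rows and a human-review queue.

    detections: list of (sku_id, confidence) pairs -- a made-up format
    for illustration. The threshold is where you encode how much of
    the report you're willing to automate.
    """
    auto, review = [], []
    for sku, conf in detections:
        (auto if conf >= auto_threshold else review).append((sku, conf))
    return auto, review

auto, review = triage([("cola_500", 0.96), ("cola_250", 0.61),
                       ("chips_xl", 0.90)])
# cola_250 at 0.61 goes to a human; the other two are auto-accepted
```

The point of the explicit queue is trust: everyone can see which numbers a human checked and which ones the model asserted on its own.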

Start with one use case. Pick share of shelf, or pick planogram compliance, or pick competitor tracking. Just one. Get it working at 90%+ accuracy in your specific outlets. Then add the next.

Keep your reps in the loop. The photo isn't replacing them — it's giving them ammunition. When a rep walks into a store and the AI flags that the brand's facings dropped from 6 to 3 since last visit, that rep now has a specific conversation to have with the shopkeeper. That's useful. Replacing the rep with a camera is not.
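That facings-drop flag is just a diff between two visits' counts. A sketch of the alert logic, with a made-up dict shape and an illustrative 50% drop threshold:

```python
def facings_alerts(previous, current, drop_ratio=0.5):
    """Flag brands whose facings fell sharply between two store visits.

    previous/current: dicts of brand -> facings count extracted from each
    visit's shelf photo. Shape and threshold are illustrative only.
    """
    alerts = []
    for brand, before in previous.items():
        now = current.get(brand, 0)
        if before > 0 and now <= before * drop_ratio:
            alerts.append(f"{brand}: facings dropped {before} -> {now}")
    return alerts

print(facings_alerts({"Cola Co": 6, "Juice Co": 4},
                     {"Cola Co": 3, "Juice Co": 4}))
# ['Cola Co: facings dropped 6 -> 3']
```

Pushed to the rep's phone before the visit, that one line turns a generic store call into a specific negotiation.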

And budget for retraining. New SKUs, new packaging, new markets — all of it requires the model to learn. If a vendor tells you their AI is "plug and play" for FMCG shelf audits across all your geographies, they're either lying or they haven't tried it in Sukkur yet.

We built the shelf analysis module in Zivni after about 14 months of getting it wrong in interesting ways. The first version recognized Coke as Pepsi roughly 1 in 8 times because our training set had too many low-light images. We fixed it. Then broke it again when we added Saudi market SKUs. Fixed that too. It's an ongoing thing, not a finished product.

Which is maybe the most honest thing I can tell you about AI in field sales right now — it's a tool that's getting better fast, but it still needs adults in the room. Anyone selling you full autonomy is selling you a 2028 pitch deck, not 2026 reality.

So before you sign that contract, ask the vendor to run a pilot in your three worst outlets. Not their three best demo stores. Yours. The real ones. With the cat.