Will a Voice send Bawarchi and Butler running for cover? This Bunny with smart skills has all the answers

Last Updated: Mon, Jul 16, 2018 19:06 hrs

#TheBestAssistantEver: Alfred (left), Batman's loyal aide; Bawarchi (right), with his cool demeanor and quips; or Iron Man's Jarvis.

"It is so simple to be happy, but so difficult to be simple," says Raghu, a learned Bawarchi cum conman from Hrishikeshda's classic 1972 movie. Forty-five years later, this Bawarchi, who learnt much from his experiences, seems to be under threat. Meanwhile another assistant, the trusted confidant of Gotham City's savior Batman, may be facing a similar threat.

The rising use of machines may not lead to an apocalypse anytime soon; nothing has been officially reported yet. But the rise of voice-based technologies is sure to add competition for Butler and Bawarchi.

In fact, any further advance in voice-interface technology could make the idea of hiring a human servant redundant. From checking your bank accounts, reading a book and setting a reminder to helping you with recipes, voice-based technologies can do it all without an error.

Unless you miss Bawarchi's dimpled chin, his winks and the nods of his head, there seems to be no reason to hire him. And that probably holds for Butler too.

Voice is the "in-thing". And in the words of Rajesh Khanna's Raghu, this technology may look deceptively simple, but there is an invisible difficulty in maintaining its simplicity.

Our host Sreeraman Thiagarajan, co-founder of Agrahyah Technologies, a startup bullish on voice, helps us decode voice-based technologies. The company's co-founder Rushabh Vasa adds to the conversation with interesting tidbits on security and on how voice took over as an assistant.

Agrahyah is a company that develops skills across platforms such as Alexa and Google Assistant.

In a conversation with Sairaj Iyer, the duo share some of their business plans, quirky ideas such as building an Almanac and a Calculator in Hindi, and what Voice as a technological assistant may look like in 2020. The duo also explain how their company, a bunny, feels about being in an ecosystem with large players: the Rhinoceros and the Dinosaur.

Here are excerpts from the interaction:

Give us a brief on what Agrahyah is and what you do.

We are a technology company that offers voice-based solutions. For an elite premium bank, Agrahyah can be a concierge at the premium customer's home, like a 24X7 relationship manager. For the mass banking customer, we are the easy helping hand, a voice command that can help him find his balance or transfer money.

For an insurer, we are a customer's easiest way to calculate premium, remind them about renewal dates, find the Insured Declared Value of a car, or even share various plans. Then, we are also building voice capabilities for a company that has created 800 rhymes.

We have been building notable use-cases across healthcare, retail and various sectors.

Agrahyah's founders Sreeraman Thiagarajan (extreme right), Uppal Shah (center) and Rushabh Vasa at a Web Summit in Portugal.

Can you detail some of your products?

We have built a handful of unique skills that enable users to communicate with a device using voice as an interface. The most useful skill we have built is called "better heart". It works in English and Hinglish. The skill, available on Alexa, explains to users the symptoms of a heart attack, myths, and better eating habits. In fact, it covers everything related to better cardio health. We have a skill called "running coach", formulated by an iron-man. Then there is one called Bollywood Trivia (English) and another called Bollywood Guru (Hindi), which has more than 1000 trivia items across 100 celebs.

Besides these, we have an in-house content portal called Hotfridaytalks.com, with categories in travel, business and lifestyle in Hindi and English. Then there is an Indian Calculator that works in 11 languages, including Urdu. So far, we have launched 5 more voice-based skill sets. Our latest product, 'India Panchang', is available on Alexa.

Why a Panchang (Almanac) and Calculator? Did you see any compelling use-case?

We wanted people to use our products in their daily lives. Moreover, we obviously can't wait for people to have smart homes, smart LEDs, smart motors, smart kitchens and so on, and only then think of bringing out our skills.

Is Voice a marketing fad?

  • Improvements in Speech recognition technology will enable smart vehicles, homes, and wearable devices.
  • The Chinese market has been pegged to reach $19.17 billion, while India's domestic market as of date has been pegged at Rs 100 crore.

Voice has been touted as the in-thing. Is this a marketing fad?

Thirty years ago, one typed lines of code to even draw a circle. Then came the mouse. Then, it was touch screen devices. From writing a few lines to touching the software, and now speaking with an interface, this is the evolution of communicating with computers.

Voice is not a fad. This is not a wearable device. This is not a fancy thing. Voice is the way humans are interacting with computers now. The things that you could touch and type can now be done via a voice command.

From Reliance to Google Home, everybody is busy marketing some use-case. Why that sudden interest now?

Remember Stephen Hawking and that voice-powered machine on his chair? He had been using it for ages. Voice technology, in fact, is older than even Hawking. What has changed now is the computing power, and also the capabilities in the areas of artificial intelligence and machine learning. Devices today can compute contextually thanks to AI and ML.

For a simple question such as "Do I need an umbrella?", Google and Alexa contextually connect that question to the weather. An umbrella is needed for rain, and maybe in sunny weather. There is never a mention of the word 'rain' in the question. The intention behind this question is known only to humans, and AI and ML are today enabling smart voice devices to work it out.
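To make the umbrella example concrete, here is a minimal sketch of how a question can be tied to an intent without the word 'rain' ever being spoken. The keyword table and the function name are hypothetical, chosen only for illustration; real assistants such as Alexa and Google Assistant learn these associations from data rather than from a hand-written list.

```python
# Hypothetical trigger words per intent (an assumption for illustration,
# not how any real assistant stores its knowledge).
INTENT_KEYWORDS = {
    "weather": {"umbrella", "raincoat", "sunscreen", "rain", "sunny"},
    "timer": {"timer", "reminder", "alarm"},
}


def guess_intent(utterance: str) -> str:
    """Return the first intent whose trigger words appear in the utterance."""
    # Lowercase each word and strip trailing punctuation such as "?".
    words = {w.strip("?.,!").lower() for w in utterance.split()}
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:  # any shared word is treated as a match
            return intent
    return "unknown"


print(guess_intent("Do I need an Umbrella?"))
```

The question is routed to the weather intent purely because 'umbrella' is associated with it, which is the contextual link described above, reduced to its simplest form.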

Amazon's Alexa, Google's Voice, Apple's Siri and Microsoft's Cortana are popular names. But expect more to come up in 2020.

How do you look at the eco-system today, considering this is still a growing market and who do you think could take the crown in 2020?

In terms of devices and voices, you have Amazon Alexa, Google's Assistant, Samsung and many more entrants. Apple's Siri came in 2009-10, but she was locked inside a room called iPhone, belonging only to iPhone users. The beauty of Google and Alexa is that they are out in the open. All the top brands are toying with something in Voice. You can book a cab or a movie ticket. In fact, home builders are integrating smart devices in their flats. Bengaluru-based Embassy homes announced that they are building 800 homes with Alexa pre-loaded. Imagine you move into the house and there is this 24X7 virtual assistant, like Jarvis from Iron Man, ready to take your commands.

It won't be one single person with the crown. My guess and expectation is for a combination driven by inter-operability. For example, if you are using Microsoft Cortana, you could operate the Alexa interface.

This inter-operability is similar to mobile roaming. That way, consumers have the choice.

To talk of innovation, Sundar Pichai confirmed at the recently concluded I/O summit that Google's Duplex will be the future. Google's Duplex can call a restaurant and have a conversation, either with a human or with another trained bot, to book a table for two.

An indication of where the market could head by 2020

  • Another study expects competition between the likes of Siri, Cortana, Alexa and Google Assistant to heat up, considering the market is poised to grow at a CAGR of nearly 20% between 2017 and 2023.
  • In 2017, the voice and speech recognition market in Asia Pacific alone was close to $400 million.
  • By 2020, 50% of searches will be voice-based.

Where do you see voice in the next three years?

It would be quite a mature and saturated market.

Why saturated?

Every new technology takes half the time of its predecessor to reach a billion users. TV took 50 years, the landline phone about 22 years, the Internet and the smartphone about 14 years each, and Facebook close to 7 years. Voice may require only half of that: 3.5 or 4 years.
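The halving rule can be checked with quick arithmetic. The sketch below uses only the figures quoted in the answer; the variable and function names are illustrative assumptions.

```python
# Years to reach a billion users, as quoted in the interview.
years_to_a_billion = [
    ("TV", 50),
    ("landline phone", 22),
    ("Internet and smartphone", 14),
    ("Facebook", 7),
]


def successive_ratios(milestones):
    """Ratio of each technology's time to that of the one listed before it."""
    return [
        round(later / earlier, 2)
        for (_, earlier), (_, later) in zip(milestones, milestones[1:])
    ]


ratios = successive_ratios(years_to_a_billion)

# Applying the halving rule to the last entry gives the voice estimate.
voice_estimate = years_to_a_billion[-1][1] / 2

print(ratios, voice_estimate)
```

On these numbers the step-to-step ratios hover around one half rather than matching it exactly, so the 3.5 to 4 year figure for voice is best read as a rule-of-thumb projection.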

Sreeraman with his team. He says hiring members who are smarter than the founders has been one of the challenging aspects of his business.

What prompted you to start Agrahyah?

Rushabh, Uppal (co-founders) and I came from a services background. We wanted to start a product company that was scalable, meaningful and impactful, and we were looking for a cause. We figured that India as a country had over 300 million internet users. That, in a country with 110 million English speakers until 2016. So we saw internet users growing, but not English speakers.

If the World Wide Web were a 100-page book, 51 pages would be in English, and 7 each in German, Russian and French. But Hindi content on the web is a measly 0.5%. Surprisingly, for a population of 420 million speakers, the web as a book could display only half a page.

We wanted to make the internet meaningful to those who had access. For a large Hindi-speaking population, the internet is entertainment. They would say it's the pocket-size 'TV'. The world's largest web companies are offering it free. And they are putting Wi-Fi at the railway stations. That is creating accessibility. But is it relevant? Is it useful?

We said, let's do it. So what if we can't boil the ocean and solve a problem overnight? We believe that even if we do a fair chunk of it over the next 15-20 years, we may have created some impact, and some value for ourselves too in the process.

Voice came in around November 2016. I am listed as a Google Developer Expert for Marketing, which gave me access to some of Google's early-stage technologies. That is how I had access to Google Home as early as November 2016. After initially toying with Google Voice, we realized that people who don't know English are not going to go back to school, come back and then start using the internet.

But even if you have some Hinglish content, it could work. For example, with 'train time bolo', the device can pick up the keywords and share train timings. So we started not only with voice, but also with the vernacularization and localization of voice. By the time Alexa came last November and Google in April, we had a head start.
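The 'train time bolo' idea can be sketched as simple keyword spotting: ignore the Hindi filler words and match whatever keywords remain. The intent names, the filler list and the function below are hypothetical and only illustrate the principle; they are not Agrahyah's or any platform's actual implementation.

```python
# Hypothetical keyword-to-intent table (illustrative assumption).
HINGLISH_INTENTS = {
    ("train", "time"): "train_schedule",
    ("mausam",): "weather",  # "mausam" is Hindi for weather
}

# Filler words a Hinglish speaker might wrap around the keywords (assumed list).
FILLERS = {"bolo", "batao", "kya", "hai", "ka"}


def match_hinglish(utterance):
    """Drop filler words, then return the first intent whose keywords all appear."""
    tokens = [t for t in utterance.lower().split() if t not in FILLERS]
    for keywords, intent in HINGLISH_INTENTS.items():
        if all(k in tokens for k in keywords):
            return intent
    return None


print(match_hinglish("train time bolo"))
```

Because only the keywords matter, the same table would handle 'train ka time batao' without requiring the speaker to form a full English sentence.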

A detailed study

PwC's 2018 study of 1000 Americans aged between 18 and 64:
  • 10% of surveyed respondents were unfamiliar with Voice, but 72% had used a Voice assistant.
  • 3 out of 4 consumers use a mobile Voice assistant.
Voice Assistants could be used for the following:
  • Normal web searching
  • A quick question
  • Check Weather/news
  • Play Music
  • Set timers/reminders
  • Send text/mail
  • Check traffic/navigation
  • Add items to shopping list
  • Buy or order something
  • Control other smart devices

Is this turning out to be a profitable venture, considering reports that Alexa coders may not be earning much?

There are thousands of developers on the Alexa platform, and you need to differentiate them into hobbyists, enthusiasts and professionals. We are in the third category: professionals making a living out of developing skills. In fact, we are one of three empanelled recommended agencies with Amazon India. The first is TCS, the second is mPhasis and the third is Agrahyah Technologies.

If you compare the three on size, you will end up with Dinosaur, Rhinoceros, and Rabbit. And this is just on the number of heads employed by each. All of us are professionals nonetheless.

Coding is a part of any development. But that alone is not sufficient for developing Voice skills.

You need skills in imagination, visualisation and understanding the customer's needs. Development for Alexa is about 60% VUI (Voice-UI). A single developer sitting in a coffee shop can surely build a skill, but that is a remote, 1% probability. To develop a skill you need to know VUI, UX, content and coding, and also produce something so awesome that it can click using only Voice.

The Amazon ecosystem is great to have, but remember there is differentiation too. Amazon's Hackathons have had some superb ideas built by one-person teams. But out of 10,000 developers, it's hardly 10 people who could reach that stage.

As professionals, getting the right VUI is key. There is no back button, no refresh button and no feedback either. Anything can happen, and we always have to ensure that the user is in control and not the bot. It therefore takes time to get the design right.

So what revenues does the rabbit generate?

We get to work with some of the greatest brands, and enterprises who are happy to work with us. Revenues are sufficient, and it helps brands work with a rabbit, because we are agile and nimble, and can do things really quickly.

Speaking of coding and developing skills, when do you think we can have a Voice tool that is completely ready for Indian languages?

All those software languages are pretty much the same. The commands remain the same, formulaic. There is no option to code in a vernacular language. It's like the printing press: you don't need an Urdu printer, because just sending the print command can help print across a number of languages.

As far as AI and ML are concerned, you can think of it as a child being born and brought up in China. This child doesn't understand anything but Mandarin. No way can this kid reply in Gujarati. Similarly, all machine languages are trained in English.

Thankfully we have a Hinglish tool at the moment. There has been talk of a completely Hindi engine. But we have no clue when we could have a fully functional engine for Hindi.

Speaking of voice, it becomes pertinent to ask about security and privacy. Since it is easier to mimic somebody's voice than a fingerprint, how is security being handled by players like Google or Amazon?

There are risks with any technology, from rocket science to a simple messaging app. At the moment, nobody uses voice as the only security parameter. Voice is always combined with a password.

Big players like Amazon and Google have their own sets of security protocols. Google has recently launched OAuth 3, a state-of-the-art protocol for login.

OAuth 2 has been around for a long time. The big players are setting it as the industry standard, in fact as a sort of compliance, like the PCI compliance we have for banking apps.