Powering voice-assistant technology through multilingual datasets

Discover how we helped a leading global technology company train its virtual assistant to better understand and respond to user queries in 35 additional languages.

  • 4M audio prompts
  • 5K contributors
  • 35 additional languages
  • 50 geographic regions

The challenge

Our client, a multinational technology company that focuses on ecommerce, cloud computing and a variety of hardware products, required vast amounts of data to train its interactive voice assistant.

Continually improving the assistant's performance is critical to keeping the user experience effortless, and to the client's ongoing market share growth. To enhance the voice assistant's abilities, the client needed a partner that could source multilingual audio data quickly, at scale and to precise specifications. The client wanted not only to increase the number of languages the assistant could engage with, but also to expand its understanding of colloquialisms within each of those languages.

The client turned to TELUS Digital to leverage our proven ability to deliver high-quality data sourced from numerous locales at scale, all within a very tight timeframe.

The TELUS Digital solution

Collecting data from local, native speakers was crucial to providing the highest-quality audio training data for the model. This required a unique and robust IP address tracking solution that validated contributors' local addresses, protecting the integrity of the model. It also meant sourcing candidates across numerous geographies and time zones to record audio contributions.

Our AI Data Solutions experts sourced and collected over 200,000 prompts in one calendar year via participants' devices. Using a combination of our internal tools and our client's tools, our team guided contributors to create a wide variety of straightforward utterances in multiple languages, often using colloquial speech. The collected data was annotated against specific classification guidelines created jointly by our team and the client. These annotations were then localized into multiple languages.

Quality assurance (QA) is crucial in large collection projects like this. Given the size of the project, multiple vendors were involved; however, TELUS Digital played a lead role in developing the quality guidelines and in performing QA on other vendors' work. Managing this process required highly effective project management and clear communication between partners.

The results

With TELUS Digital as its trusted data collection partner, the client was able to train its virtual assistant to better understand and respond to user queries in 35 additional languages. Over the span of five years, our team provided more than 4 million audio prompts and utterances from more than 5,000 contributors across 50 geographic regions, many of which were remote locations.

