Join us on the Alter Everything Podcast as we sit down with Olga Beregovaya, Vice President of AI at Smartling, to explore the evolving landscape of translation technology and AI model strategy. In this episode, Olga shares her 25+ years of experience in language technology, discusses the shift from rule-based to transformer models, and explains the importance of purpose-built AI models for translation.
Ep 196 AI Model Strategy
===
[00:00:00] Introduction and Guest Welcome
---
[00:00:00] Megan Bowers: Welcome to Alter Everything, a podcast about data science and analytics culture. I'm Megan Bowers, and today I am talking with Olga Beregovaya, Vice President of AI at Smartling. In this episode, we chat about translation technology, her approach to implementing AI models, and interesting takeaways from the translation industry in the age of AI.
Let's get started.
Hi, Olga. It's great to have you on our show today.
[00:00:36] Olga Beregovaya's Background in Language Technology
---
[00:00:36] Megan Bowers: Could you give a quick introduction to yourself for our listeners?
[00:00:39] Olga Beregovaya: Yeah. Hi, Megan. Thanks so much for having me. I run the AI practice and all of our AI development and initiatives at Smartling, Smartling being an AI-first translation platform. Me personally, I've been in the language technology industry for a good 25+ years now. Actually, if you count college, it's going to be more than that. I've pretty much done nothing but that, and I've obviously been a witness to and a participant in the growth trajectory and progression of language technology.
[00:01:11] Megan Bowers: Yeah. That's awesome. I'm excited to hear from you about just all of your experience.
[00:01:15] Transformative Shifts in Language Technology
---
[00:01:15] Megan Bowers: I think a good place to start would be hearing about some of the biggest transformative shifts you've seen in language technology over these past 20 years you've been in it.
[00:01:25] Olga Beregovaya: Language technology in general has been evolving in a very similar fashion to the way machine learning and intelligence technology overall have been evolving. If you think about it, we had rule-based tech, right? For years we had rule-based technologies such as rule-based machine translation and different rule-based libraries, and we had ontologies, but everything was pretty much hand-coded and hand-created. If you were to create an ontology, or a dictionary for machine translation 25 or 30 years ago, it basically involved manual coding and manually creating the parsers, and if you were running sentiment analysis, that would mean manually training the engines. So the first transformative shift was probably going from rule-based approaches to, at the time, statistical approaches, where language technology, alongside other areas of machine learning, actually learned how to obtain data and how to learn from corpora, or from any kind of massive data set. So going from rule-based to statistical was a big shift for language technology, but nothing compared to what happened when transformer models were introduced. When did that paper come out? "Attention Is All You Need" was 2017. That was the biggest shift and the main breakthrough when it comes to language technology, when suddenly things that were not possible before became possible. Even looking at the early generations of transformer models, I was extremely impressed, maybe not as much by GPT-1, but when GPT-2 came out, you could actually see that you have, give or take, reasoning, give or take a rationale, behind language technology. So I would say rule-based to statistical was a leap, but the major leap, the biggest one, was going into transformer models and deep learning.
[00:03:24] Megan Bowers: Totally. That makes sense. It has been a huge shift.
[00:03:27] Vertical Approach to AI Models for Translation
---
[00:03:27] Megan Bowers: And I know when we talked earlier, you mentioned having a vertical approach to AI models for translation. I was wondering if you could share a little bit more about what that means and how it might differ from general-purpose approaches.
[00:03:44] Olga Beregovaya: First of all, you totally can get translation from a generalized foundational model. There is no problem with them, and I think one of the bigger misconceptions of the modern day is that you can just grab a generalized model and get translation that would totally meet your goals, and I'm speaking mostly about enterprise goals here. Now, the reality is, think about a model that's built as a jack of all trades, a model that can equally design your dinner menu, write code, again with varying degrees of success, and translate for you. What are you dealing with? You are dealing with a lot of data, terabytes and terabytes of it, and this data comes from all sorts of sources. So when it comes to one specific task, common sense would suggest that a model trained on a clean corpus and for a specific task, a purpose-built model, has a much higher chance of providing better output than something built to do everything under the sun, whether that's in the translation industry or any other industry for that matter. So that would be my first immediate reaction, although you can get fairly decent translation for your personal needs from general-purpose, generalized models. Second, when it comes specifically to translation, with a purpose-built model, the way you design the model, the way you curate and pick your data for translation, would be dramatically different from the way you would approach it when the model just happens to be multilingual and happens to be able to perform translation alongside other tasks.
[00:05:27] Megan Bowers: Interesting. So how many different models do you use at Smartling for translation? What does that look like?
[00:05:37] Smartling's Model Portfolio and Translation Pipeline
---
[00:05:37] Olga Beregovaya: First things first, we still do not discard neural machine translation as one of the engines and one of the mechanisms to get reasonable translation. If you think about it, with a foundational model you train something by scraping all of the internet and grabbing everything under the sun, and usually that data is monolingual. Whereas neural machine translation, and the same goes for many purpose-built models, is trained on bitext, right? You have your source, you have your target, you have alignment. That's why with neural machine translation, as much as it has its constraints, you still get much higher predictability. When it comes to encoder-decoder-based neural machine translation models, we do still use them. I say "still" because we understand the shift is inevitable, and at some point neural machine translation as we know it is probably going to be a thing of the past. But as of right now, we do use neural machine translation models that we customize and train with our clients' data or domain-specific data, and this is where we still get the most predictable results. So neural machine translation models, i.e., the usual suspects, right? Amazon Translate, Google Translate, DeepL. They still have a place in our [pipeline].

Now, having said that, we do use various large language models for our translation tasks, and we make sure that we remain model agnostic, because different models perform differently for different languages and different tasks. At this point in time, our model portfolio includes the models hosted in Google Vertex AI, the models hosted in watsonx, models hosted in Bedrock, and models hosted in, what else would it be? Azure OpenAI. So basically all the models out there. We have a very robust mechanism for benchmarking what a model can do, and we make sure that we not only pick best in class, but build enough of our proprietary IP, such as fine-tuning, such as RAG, to make sure that we extract the best from those models.

If I were to count them, our principal engineer, who runs our R&D day to day, has a project he calls the 40 Prompts Project, which basically means that at any point in time there are roughly 40 different prompts for different models used at different stages of our translation pipeline, or global content generation pipeline. I don't know if you're familiar with the term Language Ops, LangOps; LangOps now lives alongside MLOps and has to do with multilingual content management and production. It's not an easy task to run an ecosystem that can invoke in-house models, fine-tuned generalized models, and RAG, and manage prompts for all of those. So it is a pretty sophisticated pipeline that we're very proud of, alongside, again, a very robust benchmarking mechanism.
[00:08:39] Megan Bowers: Wow. Yeah. Yeah. Very interesting.
[00:08:42] Olga Beregovaya: And also, when, say, Anthropic comes out with the next version of Sonnet, because of the way our system is designed, it's fairly easy to swap in the latest version of a specific model when that version actually makes more sense for a specific task.
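(A minimal sketch, for readers who want this model-agnostic setup pinned down in code. Everything here is an editorial illustration under assumed names: the providers, model identifiers, task keys, and prompt strings are hypothetical, not Smartling's actual configuration.)

```python
# Sketch of a model-agnostic routing layer -- hypothetical names throughout.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    provider: str         # e.g. "vertex", "bedrock", "azure-openai"
    model_id: str         # hypothetical identifier, swapped when a new version wins
    prompt_template: str  # one of the "40 prompts", keyed to task and language

# Benchmark results decide which model serves which (task, language) pair.
ROUTING_TABLE: dict[tuple[str, str], ModelConfig] = {
    ("translate", "de"): ModelConfig("vertex", "model-a-v2", "Translate into German:\n{src}"),
    ("translate", "ja"): ModelConfig("bedrock", "model-b-v1", "Translate into Japanese:\n{src}"),
    ("post_edit", "de"): ModelConfig("azure-openai", "model-c-v3", "Improve this draft:\n{draft}"),
}

def route(task: str, language: str) -> ModelConfig:
    """Pick the benchmarked-best model for a task/language pair.

    Swapping in a newly released model version is a one-entry change to
    ROUTING_TABLE, which is what makes a pipeline like this easy to keep current.
    """
    return ROUTING_TABLE[(task, language)]

print(route("translate", "de").model_id)  # model-a-v2
```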
[00:08:56] Megan Bowers: Gotcha. Yeah, it's super interesting. I had not heard of the Language Ops role, but it makes sense. If you have 40 prompts going throughout the pipeline, you need someone with a fine-tooth comb going through that and managing all of that.
[00:09:10] Olga Beregovaya: Honestly, if we go back to this misconception, or conception, of just grabbing an LLM API and plugging it into your ecosystem: think about scaling it, operationalizing it, how you're going to mitigate latency, and all those things people often don't think about when they assume plugging in a model is a trivial task, let alone fine-tuning it and designing the application layer and all of that. So yes, we remain model agnostic, and why not capitalize on the R&D of a huge enterprise when we can, and then build and fine-tune and train around it.
[00:09:47] Megan Bowers: Yeah, definitely. Yeah.
[00:09:49] Data Curation and Quality in Translation
---
[00:09:49] Megan Bowers: Thinking about our audience, we've talked a fair amount on this show about generative AI and the importance of quality input data, so I'd love to hear how that might be unique for translation and what your data curation looks like for training these models.
[00:10:08] Olga Beregovaya: In the translation industry, we're absolutely blessed, because what we've got at our disposal is human-curated, human-quality, quite often labeled data, right? If you think about the genesis of the translation industry, and you asked me about translation technology, I should have taken a step back and spoken about things like Levenshtein distance, which is where translation memory came about. Translation memory is basically a way of storing a human-curated corpus and then calculating fuzzy and exact matches against it. Then we go further into machine translation as a foundational step for translation, and then obviously human post-editing on top of machine translation. So we sit on a gold mine of curated, clean, domain-specific, vertical-specific, and quite often enterprise-specific data.

When we go out to academic conferences and speak to people from other industries, everybody's data-hungry, right? We thought unsupervised learning was going to be the silver bullet, and now you see data labeling companies being huge again. Look at companies like Scale AI or TELUS, and many other companies in that business: data labeling and annotation has become the thing again. Now, we in the translation industry inherently own it. We have it, because the output of our process is publishable, human-quality, curated content. We didn't do anything to design those data sets; they designed themselves, because of the way translated data is stored in translation platforms like Smartling. It is always parallel, and it always has some metadata to it. If nothing else, it has a domain and a date of creation. And, not as much now, but in the past there was an editorial layer to it: one person translates, the other person corrects. What does that give you? It gives you the correction delta, so right there you already have an edit distance. Furthermore, LQA, Language Quality Assurance, is a huge thing for us. Many companies, to make sure their deliverables meet their standards, run LQA programs using different quality evaluation metrics; more often than not, it would be the Multidimensional Quality Metrics, or MQM. What does that give you? Again, it gives you a data set labeled for quality.

So we're in that blessed space where our data sets are pre-curated for us. Now, obviously, human error is human error. For that, there are curation techniques such as deduplication and removing things that look semantically suspicious, that do not pass your semantic check, like when the target is suspiciously far away from the source meaning. And terminology is probably equally important for our industry as for any other, so we have techniques to make sure terminology is consistent with the domain and with the brand's tone and voice. I guess what I want to say is, our data sets are inherently 70% there, and the other 30% is curated through different error mitigation techniques.
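(To pin down the Levenshtein distance and fuzzy matching Olga mentions, here is a minimal sketch. The match-percentage formula and the length-ratio curation check are simplified assumptions, not any platform's production logic.)

```python
# Sketch: Levenshtein distance as the basis of translation-memory fuzzy
# matching, plus a crude curation heuristic. Thresholds are hypothetical.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def fuzzy_match_score(new_source: str, tm_source: str) -> float:
    """TM-style match percentage: 100% means an exact match."""
    dist = levenshtein(new_source, tm_source)
    return 100.0 * (1 - dist / max(len(new_source), len(tm_source), 1))

def looks_suspicious(source: str, target: str, ratio: float = 2.5) -> bool:
    """Crude curation check: flag pairs whose lengths diverge wildly.
    Real pipelines would add semantic-similarity and terminology checks."""
    longer = max(len(source), len(target))
    shorter = max(min(len(source), len(target)), 1)
    return longer / shorter > ratio

print(fuzzy_match_score("Click the Save button", "Click the Submit button"))  # ~78%
```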
[00:13:20] Megan Bowers: Gotcha. That's a nice place to be. I'm sure many listeners will be jealous when it comes to having just like really high-quality labeled data from the start.
[00:13:28] Addressing Model Bias and Translation Challenges
---
[00:13:28] Megan Bowers: Do you run into any issues with model bias or anything like that?
[00:13:33] Olga Beregovaya: Your data sets are good, but your data sets are not perfect. A model can be culturally biased, right? A model can be gender biased. Things that did not quite matter 10 years ago matter now and are getting much more attention, such as gender disambiguation. There is another term that I love, which is model toxicity, or translation toxicity, when the model produces content that's potentially toxic. So in the light of DEI initiatives there are a lot of things, stop lists, things you do not call today what you used to call them before. Obviously, we do run into that. One of our favorite cases: a government official who is female, but no matter what we did, just because the model didn't know any better, it would continuously translate her as a male into a morphological language, a language where gender is very explicit. So we obviously had to run some tweaks, and you cannot just tell the model, no, she's always a female, because what if it's a sentence where one person is a male and the other is a female? So we do come across model biases quite a bit. But again, going back to purpose-built or fine-tuned models, because our starting place is parallel data, we don't have the issue that generalized foundational models have, where a lot of the training data is disparate and monolingual. Does that make sense? I hope it does.
[00:15:07] Megan Bowers: Yeah. I was going to ask if you could break down that concept of the models being parallel a little bit more.
[00:15:13] Olga Beregovaya: Oh yeah. What is translation memory? It is basically previously created translation that can be reused. Inherently, it is a bitext: a source and a target. Compare that to the way generalized foundational models are trained. You have your source and your target aligned, usually on a sentence level, or it could be aligned on a segment level, but regardless, you have them conceptually aligned. So the odds of producing inaccurate, factually irrelevant, or biased output are reduced by the mere fact that you train on translation pairs. And we do know that most foundational models are, I [want] to be careful with percentages, but I think around 60% English, when you look at GPT or Gemini or Anthropic, just because the internet, and the sources, are predominantly English, right? So that would be the main difference between fetching data from the translation universe and fetching data from all over the world: statistically, you get less data for other languages, and subsequently you introduce factual errors and potential biases.
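(For readers who want the "parallel" idea made concrete, a minimal sketch of what a stored translation unit with its metadata might look like. The field names are hypothetical; platforms differ in exactly what they keep.)

```python
# Sketch of a translation-memory unit as a parallel record -- illustrative only.
from dataclasses import dataclass
from datetime import date

@dataclass
class BitextUnit:
    source: str                        # e.g. English segment
    target: str                        # aligned translation
    locale: str                        # target language, e.g. "de-DE"
    domain: str                        # e.g. "legal", "fintech"
    created: date
    edited_target: str | None = None   # post-editor's correction, if any
    mqm_issue_count: int | None = None # LQA label, if the segment was reviewed

    @property
    def correction_delta(self) -> bool:
        """True when an editorial pass changed the translation -- the
        'correction delta' that itself becomes quality training signal."""
        return self.edited_target is not None and self.edited_target != self.target

unit = BitextUnit("Save changes?", "Änderungen speichern?", "de-DE", "software",
                  date(2024, 5, 1), edited_target="Änderungen speichern?")
print(unit.correction_delta)  # False -- the editor kept the translation as-is
```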
[00:16:37] Purpose-Built Models in Various Industries
---
[00:16:37] Megan Bowers: Makes sense. That's super interesting. How do you think the purpose-built model strategy could transform other industries, or apply outside of translation? Where do you see the purpose-built model strategy winning?
[00:16:51] Olga Beregovaya: First, let's look at the multilingual areas where foundational models would shine. That would be general-domain content: for instance, opinion portals, or retail user-generated content, which is inherently noisy and can be about anything and everything under the sun. If you look at Reddit, for instance, I would imagine there is nothing better for Reddit than a generalized model, because God knows what people will be talking about tomorrow. You cannot fine-tune for that; there is just no way you can do it. Or look at retail, especially marketplaces driven by individual sellers: you cannot quite train for something you don't know is coming tomorrow. Or patents, for instance. Yes, you can do things like structurally breaking a patent down into pieces and taking different processing approaches for those, but at the end of the day, patents can deal with everything from Croc shoes to, I don't know, pharma. So there are areas so broad that foundational models will do just great there.

But I believe that there isn't an industry, outside of more general content and more general processes, that would not benefit from purpose-built models. Take construction. I actually have a friend who runs a startup that has to do with process engineering. You need to be able to parse out diagrams, CAD drawings, flow charts, workflows, and everything under the sun. If you have that data, and you are able to train your models on that data, the model is going to perform much better in that process-engineering space. There is another parameter to it: the way you design the model itself. You can make it smaller, you can make it faster, and you do not deal with latency. I'm just using process engineering as an example, but if you think about it, every industry would benefit, like life sciences. Let me put it this way: the more risk-averse an industry or a vertical is, the more it will benefit from purpose-built models, where the model design itself caters specifically to that domain, and the training data is very focused, curated, and narrow for that domain. The more general the content, the less space for purpose-built models; the more risk-averse, focused, and domain-specific, the more transformative purpose-built models will be.
[00:19:30] Megan Bowers: That makes a lot of sense. All industries have their own unique formats, like the drawings or the language, and if you're risk-averse, you want to make sure all of that is handled super specifically, up to your standards.
[00:19:47] Olga Beregovaya: Actually, there is another dimension to it. I mentioned CAD drawings, and then I thought about the legal space, for instance. In legal, especially in the e-discovery space, we still see a lot of need for OCR: reconstruction of something that could be a photograph or could have been handwritten. And just for entertainment's sake, I don't want to name any specific publicly available models, but grab any of them and run a simple test. Take a flat, non-parsable PDF, feed it in, try to get it understood, and even more so, try to get it reconstructed. You will see that a foundational model that does not have that as a specific task is extremely unlikely not just to shine, but to deliver. So there is one other thing: it's not just the context, not just the model design, not just the training data, but also the ability to handle formats that are specific to that particular domain. And they're dramatically different. Life sciences formulas, right? Clinical study reports. Fintech shareholder reports. They all come with their own can of worms that generalized models just cannot solve, and fine-tuning or other techniques would not solve either.
[00:21:06] Megan Bowers: Yeah, one hundred percent. That makes me think of an episode we did recently about clinical trials, where some of them were planned and prepared in Word documents, and there were unique challenges around copying and pasting those documents to build the trials, plus some unique input formats. So that makes a lot of sense, and it's really interesting to think about. My next question shifts gears a little bit, but I'm curious.
[00:21:37] Building Trust in AI-Driven Translation
---
[00:21:37] Megan Bowers: A lot of times there can be pushback on introducing AI into a traditionally human-centered industry. You mentioned you have all of that human-translated data, and that's great data to work from, but how do you build trust with your clients and manage that move from human-centered processes to using AI for them?
[00:22:00] Olga Beregovaya: I think actually the challenge is the complete opposite. How do you convince... oh, okay. No,
[00:22:04] Megan Bowers: I think
[00:22:05] Olga Beregovaya: Again, there are pockets, and we just spoke about life sciences, for instance, and regulated industries, where, indeed, building trust is, I wouldn't say a challenge, but a process, right? You need to run a certain number of pilots. You need to deliver satisfactory outcomes. You need to pick content types where you feel safe and where you don't introduce as much risk. So there will always be industries where you cannot just plug in AI without building up trust through testing, through benchmarking, through piloting. But the problem we're dealing with right now, and I don't think it's just translation, I think it's the world, is this belief that you can just plug AI in. I'm yet to meet a customer where there isn't an executive mandate to use AI in their processes, whether for innovation, cost cuts, or ROI; there are multiple reasons why companies are mandated to implement AI. So I think it's actually the other way around. It's really going in and showing that, yes, AI is fantastic, it's going to take you this far, but you need human-in-the-loop, at least for a certain portion of scenarios. We're not selling AI versus human; it's more, hey, let's remember that human-in-the-loop still has a role.

But otherwise, maybe we can talk about the workforce and translators for a little bit. I mentioned translation memories, right? With translation memory, a translator suddenly does not get paid for every new word; the translator gets paid only for a portion of them, because some of the content comes from translation memory. That got massive pushback, because translators believed, hey, you're cutting into our wages. The second wave of pushback from translators came when machine translation was introduced into the workflows, and the conversation was, hey, it's cutting into our creativity, again we're making less money, we feel that we're just an appendix to a machine. And then there is the third wave, now with AI, and I think there is much less pushback, because translators have started discovering, hey, I'm actually making as much money, if not more, because my productivity goes up, so suddenly I can fit much more content and much more work into my eight hours. So there are two stages to selling AI-driven innovation: first you sell it to the workforce, and then, once you're convinced you can deliver at the same quality level, you go and build trust with your enterprise customers. Trust both ways, right? AI only, or AI with human-in-the-loop.
[00:24:50] Staying Updated with New AI Models
---
[00:24:50] Megan Bowers: I think a good place to end would be to talk about how you stay up to date when new AI models are coming out constantly. I'm sure our audience can relate: the second you've integrated one model, the next version or a new one comes out. So how do you stay ahead in integrating new models?
[00:25:10] Olga Beregovaya: I think the real question is how do you stay sane, and I'm not sure I have; for all I know, I could have crossed that line long ago. No, but it is true, right? You wake up to a new model every day. First of all, it is very important to follow what's happening out there: be subscribed to newsletters, follow your LinkedIn feeds, follow the podcasts. How can you be in the know without being in the know? And again, that's where AI comes to your rescue: you can read summaries and digests of what came out that day. Second, I think it's important to understand the patterns, for instance, how much of a breakthrough each version was compared to the one before it, so you don't fall for the frenzy and you don't freak out just because a new model or a new approach or some new open-source initiative came about. At some point you just build an understanding of what kind of breakthrough you can realistically expect. Don't run after the next shiny object; say, hey, we're at a good place right now, let's read the model card, run a couple of experiments, and see whether it's even worth considering updating to or implementing, or whether it's not contributing much. Again, I don't want to throw any names out there, but recently there was a huge splash in the news about the latest model release, and when we tested it for translation purposes, well, there was no massive breakthrough; that model was built for other reasons. So, be in the know. And I'll go back to it: model testing frameworks are an industry of their own. If we go back to Smartling, what matters is a fast and robust way of testing, be it a new model, a new corpus, a tweak to an existing model, or a new way of fine-tuning. As long as you have a good, healthy, automated framework in place, it's going to take you a day to know how much you can or cannot get out of a particular approach or a particular model. So: don't fall for the frenzy, don't panic, don't freak out, be apprised, and have very healthy testing practices in place.
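(A minimal sketch of an automated benchmarking loop in the spirit Olga describes. The toy character-bigram metric, a stand-in for something like chrF, and the model interface are editorial simplifications, not Smartling's framework.)

```python
# Sketch of an automated model-benchmarking loop -- illustrative only.
from collections import Counter
from typing import Callable

def char_bigram_f1(hypothesis: str, reference: str) -> float:
    """Toy chrF-style score: F1 over character bigrams."""
    hyp = Counter(zip(hypothesis, hypothesis[1:]))
    ref = Counter(zip(reference, reference[1:]))
    overlap = sum((hyp & ref).values())
    if not overlap:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def benchmark(models: dict[str, Callable[[str], str]],
              test_set: list[tuple[str, str]]) -> dict[str, float]:
    """Score every candidate model on a held-out (source, reference) set,
    so a newly released model can be vetted in a day rather than a quarter."""
    return {
        name: sum(char_bigram_f1(translate(src), ref) for src, ref in test_set) / len(test_set)
        for name, translate in models.items()
    }

# Usage with stand-in "models" (real ones would call provider APIs):
scores = benchmark(
    {"model-a": lambda s: s.upper(), "model-b": lambda s: s},
    [("hello world", "hello world")],
)
print(scores)  # model-b wins on this toy set
```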
[00:27:22] Megan Bowers: Yeah. Yeah. And testing for your use cases, your industry maybe. Maybe it's a breakthrough for someone else, but it's not a breakthrough for you, so you don't get too caught up in the hype.
[00:27:31] Olga Beregovaya: Yeah, not getting caught up in it. And I think that's also where we help our customers, our enterprise partners, because you can have an executive who says, you know, I read a good thing about Google Gemini, or, I don't know, Claude 4.0. This is where companies like Smartling come in: hey, we'll take on the burden of the testing for you, and we'll tell you what's best, or whether it could make more sense for you to build your own or take some other approach.
[00:27:59] Megan Bowers: Makes sense.
[00:28:00] Conclusion and Farewell
---
[00:28:00] Megan Bowers: Well, thank you so much for coming on the show today and for sharing your experience. It's been really interesting for me to learn more about the field of translation and about model best practices, so thank you.
[00:28:11] Olga Beregovaya: Thanks so much for having me.
[00:28:13] Megan Bowers: Thanks for listening. To learn more about topics mentioned in today's episode, head over to our show notes on alteryx.com/podcast. If you like this episode, leave us a review. See you next time.