Translator Copilot is Unbabel’s new AI assistant constructed immediately into our CAT device. It leverages giant language fashions (LLMs) and Unbabel’s proprietary High quality Estimation (QE) know-how to behave as a sensible second pair of eyes for each translation. From checking whether or not buyer directions are adopted to flagging potential errors in actual time, Translator Copilot strengthens the connection between prospects and translators, making certain translations will not be solely correct however absolutely aligned with expectations.
Why We Constructed Translator Copilot
Translators at Unbabel obtain directions in two methods:
- Normal directions outlined on the workflow stage (e.g., formality or formatting preferences)
- Venture-specific directions that apply to explicit recordsdata or content material (e.g., “Don’t translate model names”)

These seem within the CAT device and are important for sustaining accuracy and model consistency. However underneath tight deadlines or with advanced steerage, it’s doable for these directions to be missed.
That’s the place Translator Copilot is available in. It was created to shut that hole by offering computerized, real-time assist. It checks compliance with directions and flags any points because the translator works. Along with instruction checks, it additionally highlights grammar points, omissions, or incorrect terminology, all as a part of a seamless workflow.
How Translator Copilot Helps
The characteristic is designed to ship worth in three core areas:
- Improved compliance: Reduces danger of missed directions
- Greater translation high quality: Flags potential points early
- Decreased price and rework: Minimizes the necessity for guide revisions
Collectively, these advantages make Translator Copilot a necessary device for quality-conscious translation groups.
From Thought to Integration: How We Constructed It
We started in a managed playground atmosphere, testing whether or not LLMs might reliably assess instruction compliance utilizing assorted prompts and fashions. As soon as we recognized the best-performing setup, we built-in it into Polyglot, our inner translator platform.
However figuring out a working setup was simply the beginning. We ran additional evaluations to know how the answer carried out throughout the precise translator expertise, amassing suggestions and refining the characteristic earlier than full rollout.
From there, we introduced all the pieces collectively: LLM-based instruction checks and QE-powered error detection have been merged right into a single, unified expertise in our CAT device.
What Translators See
Translator Copilot analyzes every phase and makes use of visible cues (small coloured dots) to point points. Clicking on a flagged phase reveals two varieties of suggestions:
- AI Solutions: LLM-powered compliance checks that spotlight deviations from buyer directions
- Attainable Errors: Flagged by QE fashions, together with grammar points, mistranslations, or omissions

To assist translator workflows and guarantee clean adoption, we added a number of usability options:
- One-click acceptance of strategies
- Skill to report false positives or incorrect strategies
- Fast navigation between flagged segments
- Finish-of-task suggestions assortment to assemble person insights
The Technical Challenges We Solved
Bringing Translator Copilot to life concerned fixing a number of powerful challenges:
Low preliminary success fee: In early checks, the LLM appropriately recognized instruction compliance solely 30% of the time. Via intensive immediate engineering and supplier experimentation, we raised that to 78% earlier than full rollout.
HTML formatting: Translator directions are written in HTML for readability. However this launched a brand new situation, HTML degraded LLM efficiency. We resolved this by stripping HTML earlier than sending directions to the mannequin, which required cautious immediate design to protect that means and construction.
Glossary alignment: One other early problem was that some mannequin strategies contradicted buyer glossaries. To repair this, we refined prompts to include glossary context, decreasing conflicts and boosting belief in AI strategies.
How We Measure Success
To judge Translator Copilot’s affect, we carried out a number of metrics:
- Error delta: Evaluating the variety of points flagged initially vs. the tip of every job. A optimistic error discount fee signifies that the translators are utilizing Copilot to enhance high quality.

- AI strategies versus Attainable Errors: AI Solutions led to a 66% error discount fee, versus 57% for Attainable Errors alone.

- Consumer habits: In 60% of duties, the variety of flagged points decreased. In 15%, there was no change, doubtless instances the place strategies have been ignored. We additionally monitor suggestion reviews to enhance mannequin habits.
An attention-grabbing perception emerged from our knowledge: LLM efficiency varies by language pair. For instance, error reporting is increased in German-English, Portuguese-Italian and Portuguese-German, and decrease in english supply language pairs akin to English-Spanish or English-Norwegian, an space we’re persevering with to research.

Trying Forward
Translator Copilot is a giant step ahead in combining GenAI and linguist workflows. It brings instruction compliance, error detection, and person suggestions into one cohesive expertise. Most significantly, it helps translators ship higher outcomes, sooner.
We’re excited by the early outcomes, and much more enthusiastic about what’s subsequent! That is just the start.