Terminology-aware machine translation

Science should speak every language.

TaMTAS translates life-science texts with their terminology intact, then adapts them for the people who need to understand them.

The challenge

English should not decide who can create or understand science.

The deeper view

The project develops a terminology-aware, document-level translation and augmentation system for the life sciences. It combines large reasoning models, quality estimation, automatic post-editing and audience adaptation across English, Spanish, Catalan, Estonian and Irish.

How it works

From specialist language to shared knowledge.

The technical workflow follows the dependencies between the work packages (WPs) described in the project proposal.

  1. Corpus & terminology

    Build the scientific foundation

    Multilingual corpora and enriched terminology databases establish the terms the system must preserve.

    WP2 · TBXTools · parallel and comparable corpora

  2. Reason & verify

    Translate, detect and correct

    A large reasoning model (LRM) translates whole documents rather than isolated sentences. Quality estimation (QE) detects terminology errors, and automatic post-editing (APE) corrects them and feeds the corrections back.

    WP3 + WP4 · LRM · QE · APE · DPO

  3. Adapt

    Make the result fit its reader

    The translated text can be simplified and supported with summaries, glossaries and explanations for different audiences.

    WP5 · text augmentation · audience adaptation
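
The "detect" part of step 2 can be illustrated with a deliberately simple check. This is a minimal sketch, not the project's quality-estimation method: it assumes a hypothetical term list mapping each source term to its required target term, and it flags every segment whose source contains the term while its translation lacks the required equivalent. The function name, data layout and example sentences are all assumptions made for illustration.

```python
import re

def find_terminology_errors(segments, term_pairs):
    """Flag segments whose translation misses a required target term.

    segments:   list of (source, translation) string pairs (hypothetical layout)
    term_pairs: dict mapping a source term to its required target term
    Returns a list of (segment_index, source_term, required_target_term).
    """
    errors = []
    for index, (source, translation) in enumerate(segments):
        for src_term, tgt_term in term_pairs.items():
            # Whole-word, case-insensitive matching; a real system would
            # also need to handle inflected and multi-word variants.
            in_source = re.search(rf"\b{re.escape(src_term)}\b", source, re.IGNORECASE)
            in_target = re.search(rf"\b{re.escape(tgt_term)}\b", translation, re.IGNORECASE)
            if in_source and not in_target:
                errors.append((index, src_term, tgt_term))
    return errors

# Illustrative English-to-Spanish segments (invented for this sketch).
segments = [
    ("The cell culture was incubated overnight.",
     "El cultivo celular se incubó durante la noche."),
    ("The culture was then diluted.",
     "La cultura se diluyó después."),
]
term_pairs = {"culture": "cultivo"}

print(find_terminology_errors(segments, term_pairs))
# → [(1, 'culture', 'cultivo')]
```

The second segment is fluent Spanish, which is why a fluency-only check would miss the error; only the term list reveals that the wrong concept was chosen.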

Ambiguity, made visible

A correct word can still be the wrong translation.

This conceptual demonstration shows why scientific translation needs document context and controlled terminology.

Source term: culture

Generic translation

Selects the common meaning of the isolated word, without knowing whether the document concerns society or a laboratory sample.

Terminology-aware approach

Uses the life-sciences domain, surrounding sentences and a terminology database to resolve the intended concept consistently across the document.
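
The contrast between the two approaches can be sketched in a few lines. This is an illustrative toy, not the project's system: the term base, the cue words and the `resolve_term` function are hypothetical, and counting cue words across the document is a crude stand-in for real document-level reasoning. It uses English to Spanish, where "culture" is "cultivo" in the laboratory sense and "cultura" in the social sense.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class TermEntry:
    source: str           # source-language term
    domain: str           # domain label for this sense
    target: str           # target-language equivalent
    cue_words: frozenset  # words that signal this sense in context

# Hypothetical two-entry term base for the ambiguous term "culture".
TERM_BASE = [
    TermEntry("culture", "life sciences", "cultivo",
              frozenset({"cell", "cells", "bacterial", "medium", "incubated", "agar"})),
    TermEntry("culture", "general", "cultura",
              frozenset({"society", "tradition", "art", "heritage"})),
]

def resolve_term(term, document, default_domain="general"):
    """Pick the sense whose cue words occur most often in the whole document.

    Ties, including documents with no cue words at all, fall back to the
    default domain, which mimics a generic system choosing the common meaning.
    """
    words = set(re.findall(r"[a-z]+", document.lower()))
    candidates = [entry for entry in TERM_BASE if entry.source == term]
    if not candidates:
        raise KeyError(f"no term-base entry for {term!r}")
    return max(
        candidates,
        key=lambda entry: (len(entry.cue_words & words), entry.domain == default_domain),
    )

lab_text = "The cells were incubated in a bacterial culture medium."
social_text = "Irish culture and tradition shape society."

print(resolve_term("culture", lab_text).target)     # → cultivo
print(resolve_term("culture", social_text).target)  # → cultura
```

Because the decision is made once per document rather than once per sentence, every occurrence of the term receives the same translation, which is the consistency the terminology-aware approach aims for.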

Real-world validation

Three settings where terminology has consequences.

Stakeholders contribute authentic documents and expert or public feedback to evaluate accuracy, clarity, fluency and usefulness.

03 · Ireland · Irish

Conradh na Gaeilge

Accessible bilingual scientific and healthcare communication for patients, clinicians, researchers and the public.

5 project languages
36 months
Initial domain: life sciences
Target validation: TRL 5–6