Introduction
In recent decades, remarkable advancements in technology have transformed the way we live, communicate, and work. Consequently, all aspects of translation have been profoundly affected. Examples include the emergence of new professional roles driven by the rise of machine translation and computer-assisted translation (CAT) tools, the growth of large-scale collaborative non-professional translation, and innovations in translation theory stemming from technological breakthroughs.
Furthermore, over the years, the demand for translation has increased as we live in a world where communication is becoming more multimodal and multilingual. Just as the Internet was beginning to spread widely, scholars agreed that the main goal of translation is to convert a text originally written in one language into its equivalent in another language, in a way that preserves the original’s meaning, formal features, and functional roles (cf. Bell, 1991).
The advent and development of new technologies such as Neural Machine Translation (NMT), generative AI, and Large Language Models (LLMs) could facilitate the work of linguists. These technologies offer significant support with their ability to simulate natural language—even when transmitting complex messages—and adapt texts to the target culture. After all, as early as 1969, Nida and Taber stated that translation is the reproduction of the closest natural equivalence of the source language in the target language.
This led translators to learn how to collaborate with such digital tools, as they can provide a significant advantage in their work, especially regarding time and collaboration. Quoting Valentina Piotto and Giuseppe Sofo (2023: 236):
L’aspetto particolarmente interessante di queste pratiche è che si basano su una concezione completamente diversa della tecnologia digitale nella traduzione; lungi dall’essere la minaccia “disumanizzante” che viene solitamente percepita, l’apporto tecnologico alla traduzione è visto in questo caso come uno “strumento di convivialità e uno strumento di intervento politico umano”, basato su “un allontanamento dal soggetto monadico della traduzione tradizionale nella direzione di una plurisoggettività dell’interazione” [Cronin, 2013: 102].
The most interesting aspect of these practices is that they are based on a completely different view of digital technology in translation. Instead of being seen as the “dehumanising” threat it is often perceived to be, technology’s role in translation is regarded as “a tool of conviviality and an instrument of human political intervention”, grounded in “a move away from the monadic subject of traditional translation to a plurisubjectivity of interaction” [Cronin, 2013: 102].
As Piquet (2009) already stated, the concepts and tools related to collaborative work are not new. However, over time, they have gained a whole new dimension due to the democratisation of information and communication technologies (ICT) in society and, consequently, in our organisations. When discussing collaborative work, it is essential to consider the substantial technological dimension and the fact that all available collaborative work tools are evolving rapidly in response to market and user needs. They are maturing from technical, economic, and social perspectives, becoming more user-friendly, easier to install, and more cost-effective in terms of software and performance (Laurenti & Villareale, 2023).
There are many useful tools available online for collaborative tasks at present, each with specific technical features. A workgroup planning to use them must consider several factors to ensure the chosen tool is effective and suitable for achieving the goal and does not disrupt the workflow.
Selecting the right tool for a specific task is essential in the collaborative process, as not all technologies suit every situation. Making an informed decision can significantly enhance the quality of the work. However, simply choosing the correct tools is not enough to guarantee that the task is performed correctly. It is crucial that all participants genuinely understand what it means to work collaboratively and are fully committed to this approach, rather than trying to stand out. In collaborative work, individual subjectivities merge to form a unified group identity.
People working on a collaborative translation task with digital tools, such as the many machine translation tools available on the market, must remember that all of these are imperfect technologies (O’Brien, 2022: 105). The risk of misunderstandings in written translation therefore remains high, especially because the people involved in the communication cannot rely on body language to help the translator understand the exact meaning of their message. Consequently, a thorough revision of the translated text continues to be essential.
For this reason, the study, carried out by the International Center for Research on Collaborative Translation¹ (hereinafter the Center) at IULM University in Milan, investigates how two groups of students collaborated with each other and with different digital tools. It aims to understand the benefits that translators can derive from using traditional computer-assisted translation (CAT) tools, machine translation, and LLMs, and to identify the most effective way to organise collaborative translation work. It also examines the most common challenges associated with using these tools, providing an initial guide for professionals and semi-professionals on how to approach translation tasks with them.
Methodology
For this research, the Center collaborated with the EMT-DGT (European Master’s in Translation – Directorate-General for Translation of the European Commission, Brussels)² and the Laboratorio di Redattologia e Traduttologia³ of the University of Udine. The 20 texts on which the 10 students from IULM University worked were sourced from the EMT blog, and the Italian translations were then published there. All the texts concerned translation experiences; the style and language were informal, as is typical of informative blog articles. Two students from the University of Udine helped the IULM students during the final revision step, so 12 students participated in the experiment in total.
Groups and work organisation
This study aimed to evaluate whether using NMT or LLM tools for translation is worthwhile compared to traditional CAT tools, considering that they tend to produce less accurate output requiring closer revision. Consequently, it was essential to assess the quality of the translation relative to the total time spent on the task.
Given the numerous NMT and LLM tools available online, it was necessary to select the most suitable one for this particular translation task. Therefore, the work began by first determining how to assess the tools’ performance and selecting the appropriate metric to use (see Section “Choosing the best AI-based tool for translation”).
Once this evaluation was completed, the researchers proceeded to divide the students into two groups of six people each (5 translators/reviewers, 1 editor): one group translating using a free collaborative CAT tool (Smartcat) and the other post-editing the raw output generated by the selected digital tool.
Students collaborated to produce a translation ready to be published on the EMT blog. In what follows, “translation” refers to the entire workflow a translator follows before delivering their translation to the reviewer, including the self-revision of their own work; as this was a collaborative project, self-revision involved more than one translator. The workflow was organised as follows:
Step 1: Translation

| Group that worked with CAT tool (Group A) | Group that worked on AI output (Group B) |
|---|---|
| | |

Step 2: Revision/editing
- In Drive, the two groups collaboratively revised the texts they did not initially translate, creating a copy of the file directly in the Drive folder.
- When the reviewers finished, translators accepted/rejected the changes made.
- When everyone agreed, they uploaded the final revised texts to a Drive folder shared with all students and coordinators.
- The students of the University of Udine revised all the texts one more time, keeping in touch with the IULM work group.
- Texts were ready for publication.
Appendix 1 shows precisely how each group proceeded with its work at every step of the workflow.
A qualitative analysis of the texts’ readability was then carried out, using comparison tables to examine more closely the changes made at each stage of the workflow.
Finally, researchers also used students’ reports to analyse the time spent on each stage of the translation process.
Data collection
As regards the calculation of timing, a proportion was adopted to adjust the results to the sample of characters considered, as the two groups worked on 75,336 and 83,510 characters, respectively. To collect significant data and fairly compare working times, researchers decided to analyse the translation of a text sample of 5,232 characters for each tool, so that both texts were of the same length. The sum of the time that all participants spent in each phase to translate and review the total number of characters was then scaled in proportion to the number of characters in the sample examined. For example, with Total characters = 83,510, Sample characters = 5,232, and Total time taken by all participants to translate the total characters = 710 min, the time attributed to the sample is x = 44 min, from the proportion 5,232 ÷ 83,510 = x ÷ 710. The result is not multiplied by the number of participants (five per group), because the 710 minutes already represent the combined time of all participants.
To view all the data and verify the proportions used in each phase, refer to Appendix 1.
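The proportion described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `proportioned_minutes` and its parameter names are assumptions introduced here, not part of the study's materials, while the figures come from the worked example in the text.

```python
def proportioned_minutes(total_minutes: float, total_chars: int, sample_chars: int) -> float:
    """Scale the time spent on the whole text down to a sample of that text."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    # Solves sample_chars / total_chars = x / total_minutes for x.
    return total_minutes * sample_chars / total_chars


# Figures from the worked example: 710 min for 83,510 characters, sample of 5,232 characters.
sample_time = proportioned_minutes(total_minutes=710, total_chars=83_510, sample_chars=5_232)
print(round(sample_time))  # 44
```

Because `total_minutes` is already the combined time of all participants, the result needs no further multiplication by group size.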
Study
Choosing the best AI-based tool for translation
The initial step involved assessing which AI-based tool should be used to generate the raw output for post-editing. Researchers analysed the translation of the same text provided by six of the most widely used online tools: Google Translate, Systran, Yandex, DeepL, Microsoft Translator (as NMT), and ChatGPT (as LLM).
Researchers evaluated the English-to-Italian translation of the article My Distance Learning,⁴ sourced from the EMT blog, using a manual, quantitative metric called SAE J2450.
This metric measures common errors made during translation, regardless of the source or target language, and whether humans or machines perform the translation. Developed by the Society of Automotive Engineers, the metric is described as “a score sheet that enables evaluators to capture error types and quantities of translation errors” (Woyde, 2001: 38). It provides a translation error scoring system (Translation Quality Score, or TQS) to assess translation quality. It includes seven main error categories (i.e., wrong terms, syntactic errors, omissions/additions, word structure and agreement errors, misspelling, punctuation errors, miscellaneous) and two error severity levels (minor or serious). When assessing translation quality, each error identified by the evaluator is classified into one of the seven categories; after determining its primary category, the evaluator decides whether it is a serious or minor error based on its severity. Both classification levels are subjective judgments by the evaluator. Once the primary and secondary categories are set, each error is weighted and summed. This total is then divided by the number of words in the translated text to produce the translation quality score (Woyde, 2001). A lower score indicates higher translation quality.
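The scoring procedure just described can be illustrated with a short Python sketch. Several things here are assumptions rather than details given in the text: the numeric weights are those commonly cited for SAE J2450, the category keys and function names are hypothetical, and the example annotations are invented. Table 1 appears to report the ratio multiplied by 100, so the sketch exposes that factor as a `scale` parameter.

```python
# Weights per category and severity. Assumed from the commonly cited SAE J2450
# scheme; the text itself does not list them.
WEIGHTS = {
    "wrong_term": {"serious": 5, "minor": 2},
    "syntactic_error": {"serious": 4, "minor": 2},
    "omission_addition": {"serious": 4, "minor": 2},
    "word_structure_agreement": {"serious": 4, "minor": 2},
    "misspelling": {"serious": 3, "minor": 1},
    "punctuation": {"serious": 2, "minor": 1},
    "miscellaneous": {"serious": 3, "minor": 1},
}


def weighted_total(errors):
    """Sum the weights of a list of (category, severity) annotations."""
    return sum(WEIGHTS[category][severity] for category, severity in errors)


def tqs(errors, word_count, scale=100):
    """Weighted error total divided by word count, multiplied by a scale factor."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return scale * weighted_total(errors) / word_count


# Hypothetical annotations for a 500-word translation (not data from the study).
example_errors = [
    ("wrong_term", "serious"),
    ("syntactic_error", "minor"),
    ("punctuation", "minor"),
]
print(weighted_total(example_errors))      # 8
print(round(tqs(example_errors, 500), 2))  # 1.6
```

Since both the category and the severity are assigned by the evaluator, two evaluators may feed different annotation lists into the same calculation and obtain different scores.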
Table 1. TQS of different tools

| | Google Translate | Systran | Yandex | Microsoft | DeepL | ChatGPT |
|---|---|---|---|---|---|---|
| TOTAL | 14 | 20 | 24 | 14 | 10 | 3 |
| Number of words | 799 | 787 | 805 | 808 | 794 | 756 |
| TQS | 1.75 | 2.54 | 2.98 | 1.73 | 1.26 | 0.40 |

Error counts by category. In the original table each tool has a Serious and a Minor column; empty cells are not preserved here, so the values are listed in their original left-to-right order without column assignment.

| Error category | Counts (left to right) |
|---|---|
| Wrong terms | 2, 1, 4, 4, 2, 1, 3, 1 |
| Syntactic errors | 4, 3, 1, 7, 1, 3, 1, 1, 2 |
| Omissions | 1 |
| Word structure and agreement errors | 1, 1, 1, 1, 1, 1 |
| Misspellings | none |
| Punctuation errors | none |
| Miscellaneous | 1 |

As shown in the table, ChatGPT obtained the lowest, and therefore best, TQS (0.40), while DeepL (1.26) performed best among the NMT tools. However, as these two types of technologies (NMT and LLMs) operate in very different ways, their translation outputs can differ considerably, and it is essential to consider the benefits and limitations that might arise from using each tool.
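As a quick check, the TQS row of Table 1 can be reproduced from the TOTAL and Number of words rows. The sketch below assumes that the reported score is the total divided by the word count and multiplied by 100, which is what the published figures are consistent with; the variable names are illustrative.

```python
# (error total, number of words) for each tool, as reported in Table 1.
TABLE_1 = {
    "Google Translate": (14, 799),
    "Systran": (20, 787),
    "Yandex": (24, 805),
    "Microsoft": (14, 808),
    "DeepL": (10, 794),
    "ChatGPT": (3, 756),
}

# Assumed formula: 100 * total / words, rounded to two decimals.
scores = {tool: round(100 * total / words, 2) for tool, (total, words) in TABLE_1.items()}

# A lower TQS means higher quality, so sort in ascending order.
ranking = sorted(scores, key=scores.get)

for tool in ranking:
    print(f"{tool}: {scores[tool]:.2f}")
```

Under this assumption the ranking from best to worst is ChatGPT, DeepL, Microsoft, Google Translate, Systran, Yandex.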
Machine translation, which utilises various algorithms, patterns, and large databases of existing translations, takes a source text, divides it into segments (words and phrases), and substitutes these with corresponding words and phrases in another language (the target) (Smartcat, 2022). In other words, MT is designed specifically for translating written texts.
Conversely, Large Language Models use deep learning techniques to understand, summarise, generate, and predict new content. Once trained, an LLM provides a basis for AI to be used for various practical purposes (Stryker, 2023). One such purpose is to generate and translate texts by taking an input prompt and leveraging the learned knowledge to predict the next word or phrase. The model iteratively produces the output, considering the context of the input and its previous predictions. Therefore, LLMs were not created solely for translation, but they can perform this task.
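The iterative next-word prediction described above can be illustrated with a deliberately tiny stand-in. The sketch below is not an LLM: it replaces the neural network with a table of word-pair counts built from one invented sentence, and it always picks the most frequent continuation. The corpus, the `generate` function, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Invented toy corpus; a real LLM is trained on vastly more text.
CORPUS = "the translator revises the text and the editor revises the text again".split()

# Count which word follows which. This table stands in for the learned knowledge.
followers = defaultdict(Counter)
for current, nxt in zip(CORPUS, CORPUS[1:]):
    followers[current][nxt] += 1


def generate(prompt, max_new_words=4):
    """Extend the prompt one word at a time, each choice conditioned on the last word."""
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = followers.get(words[-1])
        if not candidates:
            break  # no known continuation for this word
        # Most frequent follower; ties are broken alphabetically to stay deterministic.
        words.append(min(candidates, key=lambda w: (-candidates[w], w)))
    return " ".join(words)


print(generate("the editor"))  # the editor revises the text again
```

Each new word depends on what has been produced so far, which is the same loop structure, at a vastly smaller scale, as the iterative generation the paragraph describes.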
Considering all this, after evaluating the pros and cons of each tool, it was decided to utilise ChatGPT as the foundation for the translation since it has no typing restrictions, can be asked to edit sentences endlessly, and exhibits superior rephrasing abilities. Nonetheless, one must remain vigilant with the source text because, when translating from English into Italian, rephrasing can differ significantly from the original, altering the meaning of the translation; therefore, thorough post-editing is always necessary.
Translating with a CAT tool: Smartcat
To compare the benefits of collaborative translation work performed using a CAT tool versus an LLM, a decision was made to evaluate the texts produced based on changes made at each translation step and to analyse the time required by the translators.
To collect significant data and fairly compare working times, researchers decided to analyse the translation of a text sample of 5,232 characters for each tool, so that both texts were of the same length. This is the length of the source text translated with Smartcat, “Innovating for accessibility: Sign language at the University of Geneva’s FTI”.⁵ Researchers then compiled a table highlighting the steps of the working process, the changes made to the text during each of these steps, and, finally, the type of changes made, to understand the problems encountered that led to a variation in the following step.
Table 2. Workflow, time, and changes – Smartcat

| | Step 1: Original text > first translation | Step 2: First translation > internal revision within Group A (A1 on A1, A2 on A2, A1 on A2, A2 on A1) | Step 3: Internal revision > revision outside Group A (B on A, A on REV(B)) | Step 4: Editing of texts translated by Group A | Total time spent |
|---|---|---|---|---|---|
| Time spent | 86 min. | 54 min. | 57 min. | 46 min. | 243 min. (4 h) |
| Most significant changes | | | ✓ | ✓ | |
| Type of changes | | | Rendering in Italian | Rendering in Italian | |

See the appendix for the detailed timing by translator in total and proportioned to the number of characters in the portion of text examined.
As expected, the first translation produced on Smartcat took the participants about 1.5 hours, as they had to create a new text in another language from scratch. Although the first translation was already of medium-high quality, the revision steps took 2.6 hours, bringing the total effort to 4 hours. Most changes occurred during the external revision phase, which was carried out by other translators, and in the final editing process before publication. These two tasks required a total of 1.7 hours and mainly involved the Italian rendering.
The following table displays some of the final changes made to the first translation, focusing on improving fluency in Italian and selecting terms more appropriate for the context and target language.
Table 3. Examples of changes

| Error type | Original text | Step 1 | Step 2 | Step 3 | Step 4 |
|---|---|---|---|---|---|
| Rendering in Italian | Accessibility is about making sure that people with disabilities and/or with special needs have access to society on an equal basis. | Accessibilità significa assicurare alle persone con disabilità e/o con speciali esigenze di avere accesso alla collettività in maniera equa. | - | Accessibilità significa assicurare alle persone con disabilità e/o con bisogni speciali di avere accesso alla società in maniera equa. | - |
| | We had two main goals in setting up this programme: improving the inclusion of Deaf people by making the workplace more accessible, and making information more accessible to a wider public by training communication | Avevamo due obiettivi principali quando abbiamo messo a punto questo programma: da una parte l’inclusione delle persone sorde nei luoghi di lavoro rendendoli più accessibili, dall’altra parte far sì che le informazioni raggiungano un pubblico più ampio | - | Avevamo due obiettivi principali quando abbiamo messo a punto questo programma: da una parte l’inclusione delle persone sorde nei luoghi di lavoro rendendoli più accessibili, dall’altra far sì che le informazioni raggiungano un pubblico più ampio | Due erano gli obiettivi principali quando è stato messo a punto il programma: da una parte includere le persone sorde nei luoghi di lavoro rendendoli più accessibili, dall’altra convogliare le informazioni verso un pubblico più ampio |

Translating with ChatGPT
The same type of analysis was conducted on the edited text initially translated by ChatGPT. In this case, translators were asked to work on another text, also sourced from the EMT blog (‘Can you see me? Can you hear me?’ New teaching and learning environments and the new ‘normal’).⁶ For their analysis, researchers considered only the first part of the text (5,232 characters), so that working times and data could be compared fairly.
Table 4. Workflow, time, and changes – ChatGPT

| | Step 1: Original text > raw output | Step 2: Raw output > first individual MTPE of the translation | Step 3: Collaborative revision (video call on Microsoft Teams) | Step 4: Collaborative revision > revision outside Group B (A on B, B on REV(A)) | Step 5: Editing of texts translated by Group B | Total time spent |
|---|---|---|---|---|---|---|
| Time spent | 3 min. | 81 min. | 55 min. | 52 min. | 34 min. | 224 min. (3.75 hours) |
| Most significant changes | | ✓ | | | ✓ | |
| Type of changes | | Calques, grammar errors, comprehension | | | Rendering in Italian, gender neutrality | |

See the appendix for the detailed timing by translator in total and proportioned to the number of characters in the portion of text examined.
Again, as expected, although generating the raw translation took only 3 minutes, the post-editing and revision steps took considerably longer than the revision steps in the Smartcat workflow, bringing the total to 3.75 hours. In such cases, only the post-editing and revision time should be counted as workload, as the translation itself was generated almost instantly by the tool.
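The step times reported in Tables 2 and 4 can be split into first-draft time and later work with a short Python sketch. The dictionary keys and the `summarise` helper are illustrative names, not terminology from the study. Note that the ChatGPT steps sum to 225 minutes, one minute more than the reported total of 224; this presumably reflects rounding of the proportioned per-step values, which is an assumption here.

```python
# Minutes per step for the 5,232-character sample, as reported in Tables 2 and 4.
SMARTCAT = {
    "first translation": 86,
    "internal revision": 54,
    "external revision": 57,
    "editing": 46,
}
CHATGPT = {
    "raw output": 3,
    "individual MTPE": 81,
    "collaborative revision": 55,
    "external revision": 52,
    "editing": 34,
}


def summarise(steps):
    """Return (first-step minutes, minutes for all later steps, total minutes)."""
    minutes = list(steps.values())  # dicts keep insertion order
    return minutes[0], sum(minutes[1:]), sum(minutes)


for name, steps in (("Smartcat", SMARTCAT), ("ChatGPT", CHATGPT)):
    first, later, total = summarise(steps)
    print(f"{name}: first step {first} min, later steps {later} min, "
          f"total {total} min ({total / 60:.2f} h)")
```

Seen this way, the human workload after the first draft is 157 minutes in the Smartcat workflow and 222 minutes in the ChatGPT workflow, which is why the near-instant raw output does not translate into a much shorter overall delivery time.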
Most of the editing was done during the initial post-editing of the raw text to address calques, grammatical errors, and comprehension issues. This is the most important part of the workflow: although the raw translation is already good and understandable, the translator must examine it carefully, weighing every word and sentence to make the text as suitable as possible for the target market. The changes made during this step not only correct calques and grammatical or comprehension errors, but are also intended to make the Italian text as fluid and comprehensible as possible. However, an external revision is still needed, since a different person, reading the translated text for the first time and comparing it with the original, may notice errors or inaccuracies that escaped the translator, who is accustomed to their own translation and understanding. That is why Steps 4 and 5 were included.
In fact, the editor working in the final phase aimed to improve the Italian rendering and, in particular, to address gender-neutrality issues, a common problem in Italian when a text is directed towards an unspecified audience. The machine tends to render all adjectives and nouns referring to people of any gender with the generic masculine form in Italian. For reasons of inclusivity, however, it is preferable to use gender-neutral wording in Italian.
Table 5. Examples of changes

| Error type | Original text | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 |
|---|---|---|---|---|---|---|
| Gender neutrality | but the key is to be adaptive, flexible and patient. | ma la chiave è essere adattabili, flessibili e pazienti. | - | - | - | ma la chiave è sapersi adattare con flessibilità e pazienza. |
| Calques/Comprehension | Teaching distance learning works in the scenarios already mentioned and it is a solution whilst we minimise the risk of infection on campus and wait for the pandemic to abate. | L’insegnamento a distanza funziona negli scenari già menzionati ed è una soluzione mentre cerchiamo di minimizzare il rischio di infezione in campus e attendiamo che la pandemia diminuisca. | L’insegnamento a distanza funziona negli scenari già menzionati ed è una soluzione mentre riduciamo al minimo il rischio di infezione in presenza e attendiamo che la pandemia si attenui. | - | - | L’insegnamento a distanza funziona negli scenari già menzionati ed è una soluzione per ridurre al minimo il rischio di contagio in presenza mentre attendiamo che la pandemia si attenui. |
| | exercise lessons | lezioni di esercizio | lezioni di ginnastica | - | - | - |
| Rendering in Italian | Can you see me? Can you hear me? | ‘Riesci a vedermi? Riesci a sentirmi?’ | - | - | - | Mi vedete? Mi sentite? |

Conclusions
It must be noted that this is a pilot study involving only two workflows; future research could involve larger groups of translators and analyse workflows using different digital tools (e.g., NMT, CAT tools integrated with AI, etc.).
As demonstrated, although the time needed to generate the first translation from the original text in the LLM workflow is significantly reduced (3 vs 86 min.), the overall time for translators to deliver the final version has not decreased substantially (3.75 vs 4 h). This is because AI does not always understand the context and linguistic subtleties. It sometimes produces sentences that are hard to understand or plainly inaccurate, which are easier to identify, but also sentences that appear grammatically and syntactically correct yet do not truly reflect the meaning of the original text (“transparent errors”). The translator must therefore review the raw output very carefully, correcting errors and adapting the language to the specific context, as only an expert linguist can make the necessary adjustments. Clearly, this step takes time.
On the other hand, the use of CAT tools allows translators to immediately produce a more accurate version of the text in the target language, requiring less revision. However, this forces them to invest much more time in this step, especially if, as in this case, they do not have a Translation Memory available. In fact, students reported that “working from a shared Word file would have been much faster”.
Taking everything into account, it is impossible to say with certainty whether one workflow is more convenient for translators than the other. What is certain, however, is that these two types of work are different and require different skills, as the post-editors of AI output must have a very deep understanding of how machines work and what errors they most frequently make.
In this context, it would be beneficial for translators to receive specialised training on the most common machine errors in translation. This would enable them to identify these errors more swiftly and thus reduce the time spent on post-editing the translated text. Researchers from the Center conducted a study on the advantages for students of engaging with AI-based NMT and LLM systems, as well as receiving targeted training on how these tools operate and the types of errors they tend to generate. The findings of the research were presented at the PRIN International Conference, held in Bergamo on July 16-17, 2025.⁷
Ultimately, it is essential for translators, particularly when working with AI-translated texts, to retain their creativity, as some linguistic problems cannot be solved by machines alone. Only a human translator’s creativity, intuition, and skill can overcome these challenges. For example, as previously mentioned, when working with language pairs in which one language faces issues with gender neutrality, the translator’s resourcefulness can produce an accurate version that faithfully reflects the source text’s meaning while respecting the subtleties of the target language.
Another aspect we observed is that using a real-time communication tool (Microsoft Teams or WhatsApp, in this case) accelerates certain steps of the workflow and communication (Laurenti & Villareale, 2023), as reported by the students involved. By contrast, asynchronous communication via the comment feature of CAT tools undoubtedly gives translators more flexibility in managing their work and time, but it also extends the delivery time for the final text.
We can then reflect on the practice of collaborative translation. In their reports, students recognised the practical usefulness of collaborative translation, which helps translators improve both their translation skills and soft skills. Collaborative work promotes discussion and the exchange of ideas, enables translators to better understand the source text, and enhances their ability to identify and correct errors in a translated text. It also boosts their confidence in their own abilities and skills.
In light of what has been said so far, it can be concluded that collaborative translation, carried out with the aid of digital tools, not only improves the final quality of the translated text but also enhances translators’ working conditions (Laurenti & Villareale, 2023). It allows them to save time on mechanical tasks and invest it in creative tasks that require human input. Therefore, technology proves to be a valuable aid to translators, provided they know how to utilise it effectively.
