Imagine learning to translate between English and Japanese without ever seeing a single English-Japanese language guide. That's essentially what large language models (LLMs) like GPT-4 and Claude are doing, and scientists have been puzzled by this ability.
For decades, translation software needed extensive parallel examples (the same documents available in both languages) to learn how to translate. But modern AI systems can translate between languages they've never been explicitly taught to connect. How is this possible?
The Translation Mystery
Traditional translation systems are like students who memorize vocabulary lists and grammar rules. They need direct examples that show "this phrase in English equals that phrase in Japanese". These systems struggle with less common language pairs where such examples are scarce.
By contrast, today's AI models can often produce reasonable translations even between language pairs they weren't specifically trained to translate [1][2]. This ability emerges naturally as the models grow larger and train on more diverse text.

How It Works: The Shared Meaning Space
Scientists believe LLMs develop what they call a "shared semantic space", essentially a map of meanings that works across different languages.
Meaning Over Words
When an AI learns language, it's not just memorizing words; it's building a map of concepts and ideas [3]. Words or phrases with similar meanings get placed close together in this map, regardless of what language they're in.
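To make the "map of meanings" idea concrete, here is a minimal sketch that embeds an English sentence, its French translation, and an unrelated sentence, then checks that the translation pair sits much closer together in the shared space. It assumes the open-source sentence-transformers library and one of its multilingual models; the sentences are illustrative.

```python
# Minimal sketch: do sentences with the same meaning land close together in a
# shared embedding space, regardless of language? Assumes the
# sentence-transformers library and a multilingual model are installed.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

sentences = [
    "The cat sleeps on the sofa.",   # English
    "Le chat dort sur le canapé.",   # French, same meaning
    "The stock market fell today.",  # English, unrelated meaning
]
embeddings = model.encode(sentences)

def cosine(a, b):
    # Cosine similarity: values near 1.0 mean "pointing the same way" in the meaning space.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("EN vs FR (same meaning):   ", round(cosine(embeddings[0], embeddings[1]), 3))
print("EN vs EN (different topic):", round(cosine(embeddings[0], embeddings[2]), 3))
```

Run on a multilingual model like the one above, the translation pair should score noticeably higher than the unrelated pair, which is exactly the clustering behavior researchers report.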
Common Contexts
When the AI encounters similar content in different languages, such as news coverage of the same event, it learns that these texts are describing the same things, even without being told they're translations of each other.
Researchers have tested this theory by showing that words with similar meanings cluster together in the AI's internal representations, and translation quality improves as models get better at matching concepts across languages [2].
Real-World Results
Recent studies show just how far these abilities go. GPT-4, for example, can translate between language pairs it wasn't specifically trained to connect [1].
"LLMs can acquire translation ability in a resource-efficient way and generate moderate translation even on zero-resource languages."
— Zhu et al., Multilingual Machine Translation with Large Language Models
Researchers have found that AI systems can become decent translators with surprisingly little training, sometimes as few as 32 example sentence pairs [6]. This suggests the foundation for translation is already built during the AI's general language learning.
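One reason so few examples are needed is that the examples mostly tell the model what task to perform, not how to translate. Below is a minimal sketch of this few-shot setup; the prompt wording and example pairs are illustrative, not taken from the cited paper.

```python
# Minimal sketch: building a few-shot translation prompt from a handful of
# example pairs, in the spirit of the "as few as 32 examples" finding [6].
# The pairs and prompt format are illustrative assumptions.
example_pairs = [
    ("Good morning.", "Bonjour."),
    ("Where is the station?", "Où est la gare ?"),
    ("Thank you very much.", "Merci beaucoup."),
]

def build_prompt(pairs, source_sentence, src="English", tgt="French"):
    lines = [f"Translate from {src} to {tgt}."]
    for src_text, tgt_text in pairs:
        lines.append(f"{src}: {src_text}")
        lines.append(f"{tgt}: {tgt_text}")
    lines.append(f"{src}: {source_sentence}")
    lines.append(f"{tgt}:")
    return "\n".join(lines)

prompt = build_prompt(example_pairs, "The train leaves at noon.")
print(prompt)  # This prompt would then be sent to any general-purpose LLM.
```

The heavy lifting, knowing what French and English words mean and how they relate, was already done during pretraining; the few examples simply point the model at the translation task.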
What This Means for the Future
These findings have significant implications for translation technology. As AI systems improve their ability to translate without explicit training between language pairs, we could see dramatic changes in how translation tools work and who can access them.
More Languages Covered
Translation tools are already scaling to dozens of languages: for example, the open-access BLOOM model, a 176 billion-parameter multilingual Transformer, was trained on the ROOTS corpus covering 46 natural languages (plus 13 programming languages) and achieves competitive translation performance across a wide range of benchmarks [4]. This could help preserve endangered languages and give more people access to global information. Recent research demonstrates how LLMs can be adapted to support 100 languages through specialized training techniques [5].
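For readers who want to experiment, here is a minimal sketch of prompting an open multilingual model through the Hugging Face transformers library. It uses the small bloom-560m checkpoint rather than the full 176B model so it runs on modest hardware, and the prompt format is an illustration, not an official recipe.

```python
# Minimal sketch: prompting an open multilingual model for translation via the
# Hugging Face transformers library. Output quality from this small checkpoint
# is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # smaller sibling of the 176B BLOOM model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Translate English to Spanish.\nEnglish: The library opens at nine.\nSpanish:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```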

Beyond Words: Understanding Multiple Symbol Systems
The way AI learns to translate between languages may also explain how it understands other symbolic systems, like diagrams, handwriting, or mathematical notation.

Universal Symbol Processing
Whether it's languages, handwriting, or diagrams, AI systems may be developing a general ability to map between different ways of representing the same ideas [3]. In principle, a model that has learned such a shared conceptual space could interpret simple sketches or handwritten notes by projecting them into it, and perform tasks like describing a flowchart in prose or restating mathematical notation in plain language with little task-specific retraining. These cross-modal abilities remain an active research area, and in practice they generally require models trained on more than text alone.
Conclusion
The discovery that AI can translate between languages without explicit training is more than just a technological achievement; it gives us new insights into how knowledge is represented in AI systems. As these models continue to develop, we may see even more surprising abilities emerge.
These capabilities bring us closer to more accessible and universal translation tools that could help break down language barriers worldwide. While debates continue about exactly how these abilities develop, their practical impact is already beginning to change language technologies.
References
1 Zhu, W., Liu, H., Dong, Q., et al., "Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis," Findings of the Association for Computational Linguistics: NAACL, 2024, Online.
2 Li, J., Zhou, H., Huang, S., et al., "Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions," Transactions of the Association for Computational Linguistics, vol. 12, pp. 576-592, 2024, Online.
3 Conneau, A., Lample, G., "Cross-lingual Language Model Pretraining," Advances in Neural Information Processing Systems, 2019, Online.
4 BigScience Workshop, Le Scao, T., Fan, A., Akiki, C., Pavlick, E., Ilić, S., et al., "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model," arXiv preprint arXiv:2211.05100, 2022, Online.
5 Lai, W., et al., "LLMs Beyond English: Scaling the Multilingual Capability of LLMs with Cross-Lingual Feedback," arXiv, 2024, Online.
6 Lin, X., Wang, Y., Zhang, X., et al., "Few-shot Learning with Multilingual Language Models," EMNLP, 2022, Online.