Machine Learning Research Engineer in Natural Language Processing and Media Mining
École polytechnique fédérale de Lausanne, EPFL
EPFL, the Swiss Federal Institute of Technology in Lausanne, is one of the most dynamic university campuses in Europe
and ranks among the world’s top 20 in many academic rankings. EPFL employs more than 6,500 staff members who support
the School’s three main missions: education, research and innovation. The EPFL campus offers an exceptional working
environment at the heart of a community of more than 18,500 people, including nearly 14,000 students and 4,000
researchers from more than 120 nationalities.
Join the Impresso Project
The Impresso – Media Monitoring of the Past II project at the EPFL Digital Humanities Laboratory (DHLAB) is seeking a
machine learning research engineer for the final phase of the project. The successful candidate will integrate into an
active, collaborative development effort and contribute to the application and consolidation of large-scale text mining
pipelines for multilingual historical newspaper and radio archives, bridging research, engineering, and digital
humanities.
About the project
Impresso is an interdisciplinary research project that brings together computational linguists, computer scientists,
digital humanists, historians, and designers from EPFL, the University of Zurich, the University of Lausanne, and the
C2DH (Luxembourg), along with over 20 European partners. Funded by the Swiss National Science Foundation and the
Luxembourg National Research Fund (2023–2027), the project pioneers new methods for exploring digitized newspaper and
radio archives across languages, media, and borders through semantic enrichments and shared multilingual vector spaces.
Mission
You will conduct applied research and engineering in natural language processing and text mining on large-scale,
noisy, and multilingual historical texts. Working within an established and actively maintained codebase, you will help
advance and consolidate the project’s processing pipelines, bridging research and engineering in close collaboration
with an interdisciplinary team.
Main duties and responsibilities
- Apply and adapt existing NLP and computer vision models to large-scale, multilingual historical text and image data.
- Fine-tune or design models for additional text mining tasks, in particular media section classification.
- Support the creation of ground truth data by adapting the setup of web-based annotation tools, and assist in the
  management of annotation campaigns and data releases.
- Contribute to the maintenance and adaptation of web-serving setups for annotation models (TorchServe).
- Support the consolidation, validation, and documentation of existing data, pipeline components, and code modules.
Additional activities (optional / depending on profile)
- Collaborate on the design of the Impresso WebApp, Datalab and API
- Participate in the development and adoption of standards for the representation and exchange of historical data (raw
  material and annotations)
- Contribute to scientific publications and project workshops on media mining, semantic indexing, and sustainability
Your Profile
- Experience: 1-3 years as a machine learning engineer or NLP researcher/programmer
- Education: MSc or PhD in NLP, Computer Science, Data Science, or a related field, or equivalent professional
  experience in machine learning/NLP
- Technical skills:
  - Solid expertise in machine learning, with practical experience in deep learning architectures (transformers,
    language models) and information extraction tasks
  - Proficiency in Python, Unix-based systems, databases (SQL/NoSQL), cloud storage and computing (S3, Kubernetes,
    Run:AI), and scripting/automation
  - Familiarity with collaborative development and code/model management platforms (GitHub, Hugging Face, and related
    tools)
- Mindset: Curious, creative, rigorous, and attentive to detail; motivated by scientific research and cultural heritage
  applications, with a proactive and problem-solving attitude
- Strong sense of teamwork, communication, accountability, and production readiness
- Very good command of written and spoken English
Desirable skills
- Prior experience in an academic or research context
- Experience with historical or digitized documents and interdisciplinary collaboration
- Experience with image processing alongside text and language data is a plus
- Interest in student supervision and academic publication
- Knowledge of French or German
Practical details
- Employment duration: 12 months
- Employment rate: 100%
- Foreseen start of contract: 15 January 2026
- Application deadline: 8 December 2025
- Interviews: 15-18 December 2025
- Place of work: EPFL DHLAB, Lausanne, Switzerland
- How to apply: please upload your complete application (full CV, a 1-page cover letter and the contact information of
  2 to 3 referees) via the EPFL portal
- Contact: for any questions, please contact Marina Butyrskaya Moyer (marina.butyrskayamoyer[at]epfl.ch) and Maud
  Ehrmann (maud.ehrmann[at]epfl.ch)
Only candidates who have applied via the EPFL website or that of our partner Jobup will be considered. Applications
submitted by agencies without a mandate will not be considered.