Data Engineer

Company: Institute of Foundation Models
Location: Sunnyvale
Posted on: February 15, 2026

Job Description:

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy. As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

As a Data Engineer specializing in Natural Language Processing (NLP) and large-scale data processing, you will quickly and effectively gather, curate, and prepare high-quality datasets to support cutting-edge NLP research. Your role will be instrumental in enabling researchers by delivering essential data through efficient and scalable engineering practices, including web crawling, LLM-generated content refinement, and robust data pipelines, primarily leveraging Python and related technologies.

Key Responsibilities

* Rapidly collect, curate, and preprocess datasets based on detailed specifications provided by NLP researchers, delivering data within tight timelines.
* Develop and maintain efficient web crawling solutions, APIs, and automated workflows to continuously improve data collection processes.
* Refine and evaluate outputs from Large Language Models (LLMs) to generate structured datasets suitable for model training and benchmarking.
* Implement scalable data pipelines, ensuring efficient data processing, storage, retrieval, and distribution to research teams.
* Collaborate closely with researchers and engineers to ensure collected data meets specified quality and relevance criteria.
* Document data collection methodologies, dataset characteristics, and pipeline architecture clearly and effectively.
* Engage with peer teams and participate in technical reviews to uphold best practices and data quality standards.
* Represent MBZUAI at industry and research forums, showcasing technical capabilities in large-scale data processing and AI data infrastructure.

Academic Qualifications

* Bachelor's degree in Computer Science, Data Science, Engineering, or a related technical field required.
* Master’s degree or PhD, or equivalent experience, in Computer Science, Data Engineering, or related technical fields preferred.

Professional Experience - Required

* Extensive experience in data engineering, data processing, and automation using Python.
* Demonstrated proficiency in designing and deploying web crawling solutions, automated data extraction, and processing pipelines.
* Strong understanding of data structures, algorithms, databases, SQL, and performance optimization.
* Experience working with cloud infrastructure and distributed data processing frameworks (e.g., AWS, Spark, Kafka, Kubernetes).
* Excellent problem-solving abilities, attention to detail, and the capability to rapidly address technical challenges.
* Strong communication and collaboration skills with cross-functional teams.

Professional Experience - Preferred

* Proven track record of supporting NLP or AI research teams with rapid and reliable data delivery.
* Experience working with large language models, including evaluation, efficient inference, and prompt engineering.
* Experience with refining outputs from large-scale AI models, such as LLM-generated data.
* Contributions to open-source projects, coding competitions, or high visibility in coding communities (e.g., GitHub, Stack Overflow).
* Familiarity with the latest advancements in NLP data processing and large language model technologies.

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include

* Comprehensive medical, dental, and vision benefits
* Bonus
* 401K Plan
* Generous paid time off, sick leave and holidays
* Paid Parental Leave
* Employee Assistance Program
* Life insurance and disability

Keywords: Institute of Foundation Models, San Bruno, Data Engineer, Science, Research & Development, Sunnyvale, California

Other Science, Research & Development Jobs

Life Science Research Professional 2 (1-Year Fixed-Term)
Description: Life Science Research Professional 2 1-Year Fixed-Term at Stanford University summary: The Life Science Research Professional 2 at Stanford University's Ross Laboratory conducts complex experiments (more...)
Company: Stanford University
Location: Stanford
Posted on: 02/10/2026

(CW) Research Associate
Description: Who We Are BioMarin is a global biotechnology company that relentlessly pursues bold science to translate genetic discoveries into new medicines that advance the future of human health. Since our founding (more...)
Company: BioMarin Pharmaceutical Inc.
Location: San Rafael
Posted on: 02/10/2026

Radar Algorithms Engineer
Description: At Array Labs, we are building the world's most advanced radar imaging satellites to produce an accurate, continuously updated 3D map of the Earth, providing governments (more...)
Company: Array Labs
Location: Palo Alto
Posted on: 02/10/2026

Travel Clinical Lab Scientist (CLS) - $2,424 to $2,786 per week in Salinas, CA
Description: Clinical Lab Scientist Location: Salinas, CA Agency: Fusion Medical Staffing Pay: $2,424 to $2,786 per week Shift Information: 5 days x 8 hours Contract Duration: 13 Weeks Start Date: 2/23/2026 About (more...)
Company: Fusion Medical Staffing
Location: Salinas
Posted on: 02/11/2026

Travel Nurse RN - Cath Lab - $2,817 per week in Modesto, CA
Description: TravelNurseSource is working with Triage Staffing to find a qualified Cath Lab RN in Modesto, California, 95350 Pay Information: $2,817 per week About The Position Travel Nursing: Cath Lab Modesto Location: (more...)
Company: TravelNurseSource
Location: Modesto
Posted on: 02/09/2026

Travel Clinical Lab Scientist (CLS) - $2,405 per week in Fort Bragg, CA
Description: Clinical Lab Scientist Location: Fort Bragg, CA Agency:
Company: Care Career
Location: Fort Bragg
Posted on: 02/11/2026

Director, US Value, Access and Policy Analytics
Description: Who We Are BioMarin is a global biotechnology company that relentlessly pursues bold science to translate genetic discoveries into new medicines that advance the future of human health. Since our founding (more...)
Company: BioMarin Pharmaceutical Inc.
Location: San Rafael
Posted on: 02/11/2026

Travel Medical Technologist - $2,952 per week in Salinas, CA
Description: Medical Technologist Location: Salinas, CA Agency: Triage Staffing LLC
Company: Triage Staffing LLC
Location: Salinas
Posted on: 02/11/2026

Life Science Research Professional 1 (1 Year Fixed Term)
Description: Life Science Research Professional 1 1 Year Fixed Term at Stanford University summary: The Life Science Research Professional I at Stanford University's Department of Genetics conducts high-throughput (more...)
Company: Stanford University
Location: Stanford
Posted on: 02/10/2026

Clinical Lab Scientist (CLS)
Description: Tenet North Cal is seeking a Clinical Lab Scientist (CLS) for a job in Modesto, California. Job Description & Requirements - Specialty: Clinical Lab Scientist (CLS) - Discipline: Allied (more...)
Company: Tenet North Cal
Location: Modesto
Posted on: 02/11/2026
