Eclipse Medical B.V.
Data Scientist (AI Data & LLM Specialist)
Join the core team at Eclipse, where we’re building an AI agent-first marketplace that connects intelligence with real-world tasks, starting with data collection and labeling. We are seeking a Data Scientist to establish the foundation for how our data is labeled, processed, and prepared for consumption by next-generation Large Language Models (LLMs). Your work will be critical in transforming our raw data collections into valuable, AI-ready datasets.

Qualifications

Proven experience as a Data Scientist or Machine Learning Engineer with a focus on data quality and preparation.

Strong understanding of data labeling methodologies and hands-on experience with data annotation platforms and workflows.

Demonstrated experience preparing datasets for training and fine-tuning Large Language Models (LLMs), including knowledge of techniques like tokenization, embeddings, and NER.

Proficiency in Python and common data science libraries (e.g., Pandas, NumPy, Scikit-learn, spaCy, Hugging Face).

Experience using APIs/SDKs to automate data annotation and active learning loops.

Excellent communication skills, with an ability to create clear documentation for technical and non-technical audiences.

Responsibilities

Develop Data Labeling Strategies: Design and document a formal data annotation strategy, including clear, scalable, and efficient guidelines for labeling our data. Define and enforce quality metrics, including inter-annotator agreement.

Optimize for LLM Consumption: Resea