AI Quality and Evaluation Manager

Job Type: Contract/Temporary
Location: London
Salary: Negotiable
Job Ref: BBBH168886_1761145043
Date Added: October 22nd, 2025
Consultant: Alexandra Bainbridge

AI Quality & Evaluation Manager (Contract)

Location: Hybrid working - Blackfriars 3 days per week
Contract: 6 months, Outside IR35

Are you passionate about building the future of AI quality? Do you thrive in hands-on roles where you can shape frameworks from the ground up and make a real impact? We're looking for an experienced AI Quality & Evaluation Manager to join our team on a contract basis and lay the foundations for robust, reliable, and user-focused AI services across our business.

What You'll Do

  • Design and implement a comprehensive AI testing and evaluation framework for all AI solutions, including LLM-based tools, RAG systems, and third-party platforms.
  • Define and document quality standards for semantic accuracy, factual consistency, bias, tone, and relevance.
  • Develop reusable testing templates, data sets, and evaluation methods that can be scaled and maintained by internal teams.
  • Run hands-on testing of AI prototypes and production tools to assess technical performance and business value.
  • Collaborate with business users to guide practical testing and feedback processes.
  • Deliver training and upskilling materials to empower internal staff to sustain the framework after your contract ends.
  • Support vendor evaluations and POC assessments with robust test protocols.
  • Establish baseline metrics and dashboards to measure ongoing AI quality and relevance.
  • Work closely with engineering and product leads to embed testing into delivery workflows.
  • Champion responsible AI practices to ensure fairness, transparency, and user trust.

What You'll Bring

  • Strong hands-on experience in testing and evaluation of AI or software systems, ideally with NLP or LLM-based applications.
  • Understanding of prompt evaluation, semantic search, and LLM behaviour (accuracy, hallucination, bias, tone, etc.).
  • Familiarity with tools like TruLens, Humanloop, PromptLayer, or similar; experience designing QA approaches for GenAI environments.
  • Knowledge of modern AI architectures (RAG pipelines, embeddings, API integrations such as OpenAI, Azure OpenAI, Anthropic).
  • Experience designing and implementing structured test regimes in fast-evolving contexts.
  • Excellent communication and facilitation skills, engaging both technical and business audiences.
  • Proven ability to create sustainable frameworks, documentation, and training materials.

Who You Are

  • A builder who loves creating practical, scalable solutions.
  • Hands-on and analytical, balancing experimentation with process.
  • Collaborative and empathetic, bridging technical and non-technical teams.
  • User-focused, driven by delivering real value.
  • Committed to responsible AI, fairness, and transparency.

Ready to shape the future of AI quality with us?
Apply now and help us ensure our AI-enabled services are accurate, consistent, and trusted by all.

Carbon60, Lorien & SRG - The Impellam Group STEM Portfolio are acting as an Employment Business in relation to this vacancy.
