
Blog Post Submission: Tracking and Debugging AI Safety Evaluations with Inspect AI and MLflow #532

@debu-sinha


Acknowledgements

  • ack/guide I have read through the contributing guide

  • ack/readme I have configured my local development environment so that I can build a local instance of the MLflow website by following the development guide

  • ack/legal I have verified that there are no legal considerations associated with the nature of the blog post, its content, or references to organizations, ideas, or individuals contained within my post. JJ Allaire (Inspect AI lead) has publicly endorsed the package on the Inspect Community Slack and added it to the Inspect AI Extensions page and Scout documentation.

Proposed Title

Tracking and Debugging AI Safety Evaluations with Inspect AI and MLflow

Abstract

This post covers how to use MLflow's tracking and tracing capabilities with Inspect AI, the UK AI Security Institute's open-source evaluation framework (16M+ monthly PyPI downloads). The inspect-mlflow package provides auto-registering hooks that give users hierarchical experiment tracking, execution tracing with span-level visibility into model calls and tool usage, and a Scout import source for safety analysis. The package was built across four merged PRs to Inspect AI and was published to PyPI after JJ Allaire (Inspect AI lead, creator of RStudio) requested standalone distribution.

Blog Type

  • blog/deep-dive: An in-depth guide that covers a specific feature in MLflow

Topics Covered in Blog

  • topic/genai: Highlights MLflow's use in training, tuning, or deploying GenAI applications
  • topic/tracking: Covering the use of Model Tracking APIs and integrated Model Flavors
  • topic/advanced: Featuring guides on Custom Model Development or usage of the plugin architecture of MLflow
  • topic/ui: Covering features of the MLflow UI

Additional Context
