
Ground Truth Monitoring

We usually measure how well a model performs while training it. However, we tend to be less rigorous about measuring its performance in production. This is where Aligned's ground truth monitoring comes into play.

Load Predictions

The first step in measuring ground truth performance is to load the predictions made by a model. This can be done with the following code.

# Load the stored predictions for the given entities as a polars DataFrame
predictions = await store.model("titanic").predictions_for({
    "passenger_id": [10, 11, ...]
}).to_polars()
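
Note that predictions_for is awaited, so the call must run inside an async context. Here is a minimal sketch, assuming store is an already configured Aligned store (how you obtain it depends on your setup):

import asyncio

async def main() -> None:
    # `store` is assumed to be defined elsewhere, e.g. loaded from
    # your contract definitions.
    predictions = await store.model("titanic").predictions_for({
        "passenger_id": [10, 11],
    }).to_polars()
    print(predictions)

asyncio.run(main())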

This will load data from the predictions_source defined in the model_contract, as shown in the following example.

from aligned import FileSource, String, Int32, model_contract, EventTimestamp
from examples.credit_scoring.credit_history import CreditHistory
from examples.credit_scoring.zipcode import Zipcode
from examples.credit_scoring.loan import Loan


credit = CreditHistory()
zipcode = Zipcode()
loan = Loan()

@model_contract(
    name="credit_scoring",
    description="A model that does credit scoring",
    features=[
        credit.credit_card_due,
        credit.mortgage_due,
        ...
    ],
    predictions_source=FileSource.csv_at("taxi/predictions.csv")
)
class CreditScoring:

    # The model output, linked to the ground truth loan_status feature
    was_granted_loan = loan.loan_status.as_classification_label()

    # Entity and metadata columns stored alongside each prediction
    loan_id = String().as_entity()
    model_version = Int32().as_model_version()

    predicted_at = EventTimestamp()

Here, Aligned assumes that the CSV file at taxi/predictions.csv contains all predictions with the columns loan_id, was_granted_loan, predicted_at, and model_version.
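
If you want to sanity-check that a prediction source actually matches this format, here is a small sketch using polars (the check itself is not part of Aligned, and it assumes the file exists at the path above):

import polars as pl

# Read the prediction source directly to inspect its schema.
predictions_df = pl.read_csv("taxi/predictions.csv")

# The contract above expects these columns to be present.
expected = {"loan_id", "was_granted_loan", "predicted_at", "model_version"}
missing = expected - set(predictions_df.columns)
if missing:
    raise ValueError(f"Prediction source is missing columns: {missing}")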

Model Metrics

Furthermore, since the model contract knows whether the model is a classification or a regression model, Aligned can generate relevant metrics for you, such as confusion matrices, accuracy, and F1 score.
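
Aligned surfaces these metrics itself, but if you want to reproduce them by hand, here is a minimal sketch using polars and scikit-learn. It assumes predictions was loaded as shown above, and uses a hypothetical actuals DataFrame holding the observed loan_status per loan_id (the frame and its values are made up for illustration):

import polars as pl
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Hypothetical observed outcomes, one row per entity.
actuals = pl.DataFrame({
    "loan_id": ["a-1", "a-2", "a-3"],
    "loan_status": [1, 0, 1],
})

# Join each prediction with its observed outcome.
evaluation = predictions.join(actuals, on="loan_id", how="inner")

y_true = evaluation["loan_status"].to_list()
y_pred = evaluation["was_granted_loan"].to_list()

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1 score:", f1_score(y_true, y_pred))
print("Confusion matrix:")
print(confusion_matrix(y_true, y_pred))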

The metrics can then be displayed in the data catalog if desired.
