
Data Catalog

Arguably, one of the most essential components of this solution is the data catalog. The system compiles a large amount of metadata about its features, models, and data sources, so it makes sense to expose that information through a user-friendly interface.

The data catalog lets users browse all features and models and view the current state of the system. It therefore gives a comprehensive overview of the existing data: what it is, where it is stored, and how it can be used.

Figure (\ref{fig:data-cat-feature}) shows several features of the \texttt{taxi\_vendor} feature view, which is associated with the taxi model described in section (\ref{sec:use-cases}). The page makes clear where the historical data is stored, which source the real-time values are processed from, and how the features depend on each other.
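To make the information on such a page more concrete, the following Python sketch models a feature view's catalog entry as plain data and derives two things the page conveys: the order in which features must be computed, and which features are affected when another feature changes. The classes \texttt{FeatureDoc} and \texttt{FeatureViewDoc}, the feature names, and the source locations are hypothetical illustrations, not the catalog's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class FeatureDoc:
    """One feature in the catalog (hypothetical structure, not the real catalog API)."""

    name: str
    dtype: str
    depends_on: list[str] = field(default_factory=list)  # features this one is derived from


@dataclass
class FeatureViewDoc:
    """A feature view and the sources its values come from (hypothetical structure)."""

    name: str
    batch_source: str  # where historical values are stored
    stream_source: str | None  # where real-time values are processed from, if any
    features: list[FeatureDoc]

    def computation_order(self) -> list[str]:
        """Feature names ordered so every feature comes after the features it depends on."""
        graph = {f.name: set(f.depends_on) for f in self.features}
        return list(TopologicalSorter(graph).static_order())

    def dependants_of(self, name: str) -> set[str]:
        """Every feature that directly or transitively depends on `name`."""
        result: set[str] = set()
        frontier = [name]
        while frontier:
            current = frontier.pop()
            for f in self.features:
                if current in f.depends_on and f.name not in result:
                    result.add(f.name)
                    frontier.append(f.name)
        return result


# Hypothetical contents for the taxi_vendor view; feature names and
# source locations are illustrative only.
taxi_vendor = FeatureViewDoc(
    name="taxi_vendor",
    batch_source="taxi_vendor.parquet",
    stream_source="taxi_trips_stream",
    features=[
        FeatureDoc("trip_count", "int64"),
        FeatureDoc("total_distance", "float64"),
        FeatureDoc("mean_distance", "float64", depends_on=["total_distance", "trip_count"]),
        FeatureDoc("is_long_distance_vendor", "bool", depends_on=["mean_distance"]),
    ],
)

print(taxi_vendor.computation_order())
print(sorted(taxi_vendor.dependants_of("trip_count")))
# ['is_long_distance_vendor', 'mean_distance']
```

Using the standard library's \texttt{graphlib.TopologicalSorter} for the ordering also provides cycle detection: a circular feature definition raises \texttt{graphlib.CycleError} instead of producing an order, which a catalog could report as a documentation error.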

\begin{figure}[ht] \begin{center} \includegraphics[width=0.8\columnwidth]{figs/data-catalog-feature-documentation.png} \caption[Feature documentation]{Feature documentation} \label{fig:data-cat-feature} \end{center} \end{figure}

A similar page exists for the taxi model itself, as shown in figure (\ref{fig:data-cat-model}). This page displays the features the model uses, the entities it requires, and the data lineage of every feature. It also makes it possible to view feature values from a low-latency online source, if one is available, and real-time performance metrics if ground-truth values are monitored, as discussed in section (\ref{sec:model-monitoring}).
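The lineage shown on a model page can be viewed as a graph traversal. The Python sketch below assumes a hypothetical metadata layout (a mapping from each feature view to its entity keys, and a mapping from each node to the nodes it is derived from) and uses it to collect the entities a model requires and to walk each feature's upstream dependencies back to its raw sources. All names, including \texttt{taxi\_model}, \texttt{pickup\_density}, and the source identifiers, are illustrative assumptions rather than the catalog's real data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRef:
    """Reference to one feature in a feature view (hypothetical structure)."""

    view: str
    name: str

    @property
    def key(self) -> str:
        # Node identifier used in the lineage graph, e.g. "taxi_vendor:mean_distance".
        return f"{self.view}:{self.name}"


@dataclass
class ModelDoc:
    """A model and the features it consumes (hypothetical structure)."""

    name: str
    features: list[FeatureRef]


def required_entities(model: ModelDoc, view_entities: dict[str, set[str]]) -> set[str]:
    """Union of the entity keys needed to look up every feature the model uses."""
    entities: set[str] = set()
    for ref in model.features:
        entities |= view_entities[ref.view]
    return entities


def lineage(node: str, upstream: dict[str, set[str]]) -> set[str]:
    """Every node that `node` transitively depends on (iterative depth-first search)."""
    seen: set[str] = set()
    stack = list(upstream.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(upstream.get(current, ()))
    return seen


# Hypothetical catalog metadata for a taxi model.
view_entities = {"taxi_vendor": {"vendor_id"}, "location": {"location_id"}}
upstream = {
    "taxi_vendor:mean_distance": {"taxi_vendor:total_distance", "taxi_vendor:trip_count"},
    "taxi_vendor:total_distance": {"source:taxi_trips.parquet"},
    "taxi_vendor:trip_count": {"source:taxi_trips.parquet"},
    "location:pickup_density": {"source:pickups_stream"},
}
taxi_model = ModelDoc(
    name="taxi_model",
    features=[FeatureRef("taxi_vendor", "mean_distance"), FeatureRef("location", "pickup_density")],
)

print(sorted(required_entities(taxi_model, view_entities)))
# ['location_id', 'vendor_id']
for ref in taxi_model.features:
    print(ref.key, "<-", sorted(lineage(ref.key, upstream)))
```

Keeping the traversal iterative with a \texttt{seen} set means shared upstream sources, such as the single trips file feeding two features here, are visited only once, and deep lineage chains do not hit Python's recursion limit.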

\begin{figure}[ht] \begin{center} \includegraphics[width=0.8\columnwidth]{figs/data-catalog-model.png} \caption[ML model documentation]{Documentation of our ML models} \label{fig:data-cat-model} \end{center} \end{figure}
