gobbli.inspect.evaluate module

class gobbli.inspect.evaluate.ClassificationError(X, y_true, y_pred_proba)[source]

Bases: object

Describes a single classification error: the original text, the true label(s), and the model's predicted probability for each class.

Parameters
  • X (str) – The original text.

  • y_true (Union[str, List[str]]) – The true label(s).

  • y_pred_proba (Dict[str, float]) – The model's predicted probability for each class.

property y_pred
Return type

str

Returns

The class with the highest predicted probability for this observation.

y_pred_multilabel(threshold=0.5)[source]
Parameters

threshold (float) – The predicted probability threshold above which a label is considered predicted.

Return type

List[str]

Returns

The predicted labels for this observation: all labels whose predicted probability is greater than the given threshold.
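
A minimal usage sketch based on the constructor and accessors above (the text and probabilities are made-up illustration values):

    from gobbli.inspect.evaluate import ClassificationError

    # A single misclassified observation with hypothetical predicted probabilities
    error = ClassificationError(
        X="The plot was thin, but the cinematography was stunning.",
        y_true="positive",
        y_pred_proba={"positive": 0.35, "negative": 0.65},
    )

    print(error.y_pred)  # "negative" -- the highest-probability class
    print(error.y_pred_multilabel(threshold=0.3))  # all labels with probability above 0.3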

class gobbli.inspect.evaluate.ClassificationEvaluation(labels, X, y_true, y_pred_proba, metric_funcs=None)[source]

Bases: object

Provides several methods for evaluating the results from a classification problem.

Parameters
  • labels (List[str]) – The set of unique labels in the dataset.

  • X (List[str]) – The list of texts that were classified.

  • y_true (Union[List[str], List[List[str]]]) – The true labels for the dataset.

  • y_pred_proba (DataFrame) – A dataframe containing a row for each observation in X and a column for each label in the training data. Cells are predicted probabilities.

  • metric_funcs (Optional[Dict[str, MetricFunc]]) – Custom metric functions keyed by metric name (see MetricFunc); if omitted, DEFAULT_METRICS is used.
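
A minimal construction sketch, assuming a tiny toy dataset with hand-written predicted probabilities (in practice y_pred_proba would come from a trained model's output):

    import pandas as pd

    from gobbli.inspect.evaluate import ClassificationEvaluation

    labels = ["negative", "positive"]
    X = ["great movie", "terrible acting", "just okay"]
    y_true = ["positive", "negative", "negative"]

    # One row per observation in X, one column per label; cells are predicted probabilities
    y_pred_proba = pd.DataFrame(
        [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]],
        columns=labels,
    )

    evaluation = ClassificationEvaluation(
        labels=labels,
        X=X,
        y_true=y_true,
        y_pred_proba=y_pred_proba,
    )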

errors(k=10)[source]

Return the classifier's biggest mistakes for each class.

Parameters

k (int) – The number of results to return for each of false positives and false negatives.

Return type

Dict[str, Tuple[List[ClassificationError], List[ClassificationError]]]

Returns

A dictionary whose keys are label names and values are 2-tuples. The first element is a list of the top k false positives, and the second element is a list of the top k false negatives.
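
Continuing the construction sketch above, the returned structure can be iterated over directly (the exact ordering of errors is up to the library):

    errors_by_label = evaluation.errors(k=5)

    for label, (false_positives, false_negatives) in errors_by_label.items():
        print(f"{label}: {len(false_positives)} FP, {len(false_negatives)} FN")
        for err in false_positives:
            # Each entry is a ClassificationError carrying the original text and probabilities
            print("  FP:", err.X, err.y_pred_proba)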

errors_for_label(label, k=10)[source]

Return the classifier's biggest mistakes for the given class.

Parameters
  • label (str) – The label to return errors for.

  • k (int) – The number of results to return for each of false positives and false negatives.

Returns

A 2-tuple. The first element is a list of the top k false positives, and the second element is a list of the top k false negatives.
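
For example, continuing the sketch above:

    false_positives, false_negatives = evaluation.errors_for_label("positive", k=5)

    for err in false_negatives:
        # Observations whose true label is "positive" but which were predicted otherwise
        print(err.X, err.y_true, err.y_pred)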

errors_report(k=10)[source]
Parameters

k (int) – The number of results to return for each of false positives and false negatives.

Return type

str

Returns

A nicely formatted, human-readable report describing the biggest mistakes made by the classifier for each class.
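
Since the report is a plain string, it can simply be printed or written to a file, e.g.:

    report = evaluation.errors_report(k=3)
    print(report)

    # Persist it alongside other experiment output
    with open("errors_report.txt", "w") as f:
        f.write(report)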

metric_funcs = None
metrics()[source]
Return type

Dict[str, float]

Returns

A dictionary containing various metrics of model performance on the test dataset.
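
With the default metric functions, the returned dictionary is keyed by metric name; continuing the sketch above:

    for name, value in evaluation.metrics().items():
        print(f"{name}: {value:.3f}")

    # With the defaults, the keys are "Accuracy", "Weighted F1 Score",
    # "Weighted Precision Score", and "Weighted Recall Score"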

metrics_report()[source]
Return type

str

Returns

A nicely formatted human-readable report describing metrics of model performance on the test dataset.

plot(sample_size=None)[source]
Parameters

sample_size (Optional[int]) – Optional number of points to sample for the plot. Unsampled plots may be difficult to save due to their size.

Return type

Chart

Returns

An Altair chart plotting predicted probabilities against true classes, making it easy to see where errors are being made.
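
The return value is an ordinary Altair chart, so it can be displayed or saved with Altair's usual methods (sampling is advisable for large test sets), e.g.:

    chart = evaluation.plot(sample_size=1000)

    # Altair charts can be saved to HTML out of the box
    # (PNG/SVG export requires an additional save backend)
    chart.save("evaluation_plot.html")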

property y_pred_multiclass
Return type

List[str]

Returns

Predicted class for each observation (assuming multiclass context).

property y_pred_multilabel
Return type

DataFrame

Returns

An indicator dataframe with one row per observation and one column per label, containing 1 if the label was predicted for that observation and 0 otherwise.

property y_true_multiclass
Return type

List[str]

Returns

The true class for each observation (assuming multiclass context).

property y_true_multilabel
Return type

DataFrame

Returns

An indicator dataframe with one row per observation and one column per label, containing 1 if the label is a true label for that observation and 0 otherwise.
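
These properties make it straightforward to compute statistics beyond the built-in metrics, for example with scikit-learn (a sketch assuming scikit-learn is installed, continuing the example above):

    from sklearn.metrics import confusion_matrix

    cm = confusion_matrix(
        evaluation.y_true_multiclass,
        evaluation.y_pred_multiclass,
        labels=labels,
    )
    print(cm)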

gobbli.inspect.evaluate.DEFAULT_METRICS = {'Accuracy': <function <lambda>>, 'Weighted F1 Score': <function <lambda>>, 'Weighted Precision Score': <function <lambda>>, 'Weighted Recall Score': <function <lambda>>}

The default set of metrics to evaluate classification models with. Users may want to extend this.

gobbli.inspect.evaluate.MetricFunc = typing.Callable[[typing.Sequence[str], pandas.core.frame.DataFrame], float]

A function used to calculate some metric. It should accept a sequence of true labels (y_true) and a dataframe of shape (n_samples, n_classes) containing predicted probabilities; it should output a real number.
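
For example, a custom metric conforming to this signature could score the highest-probability class against the true labels; merging it into a copy of DEFAULT_METRICS and passing the result as metric_funcs is one plausible way to extend the defaults (a sketch continuing the construction example above; the exact shape expected for metric_funcs is an assumption here):

    from typing import Sequence

    import pandas as pd

    from gobbli.inspect.evaluate import DEFAULT_METRICS, ClassificationEvaluation


    def top_class_accuracy(y_true: Sequence[str], y_pred_proba: pd.DataFrame) -> float:
        """Fraction of observations whose highest-probability class matches the true label."""
        y_pred = y_pred_proba.idxmax(axis=1).reset_index(drop=True)
        return float((y_pred == pd.Series(list(y_true))).mean())


    # ASSUMPTION: metric_funcs accepts a dict shaped like DEFAULT_METRICS
    # (metric name -> MetricFunc)
    my_metrics = {**DEFAULT_METRICS, "Top-Class Accuracy": top_class_accuracy}

    evaluation = ClassificationEvaluation(
        labels=labels,
        X=X,
        y_true=y_true,
        y_pred_proba=y_pred_proba,
        metric_funcs=my_metrics,
    )
    print(evaluation.metrics())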