dabl.plot.plot_classification_continuous

dabl.plot.plot_classification_continuous(X, target_col, types=None, hue_order=None, scatter_alpha='auto', scatter_size='auto', univariate_plot='histogram', drop_outliers=True, plot_pairwise=True, top_k_interactions=10, random_state=None, **kwargs)

Plots for continuous features in classification.

Selects important continuous features according to F statistics. Creates univariate distribution plots for these, as well as scatter plots for selected pairs of features and for selected pairs of PCA directions. If there are more than 2 classes, scatter plots from Linear Discriminant Analysis are also shown. A scatter plot is deemed “interesting” if a decision tree on the two-dimensional projection performs well. The cross-validated macro-average recall of a decision tree is shown in the title of each scatter plot.
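The score shown in each scatter plot title is a macro-average recall. As an illustration of the metric only (not of how dabl computes it internally, which involves cross-validating a decision tree), the following minimal sketch averages per-class recall with equal weight per class. The helper name `macro_average_recall` and the toy labels are hypothetical.

```python
from collections import defaultdict


def macro_average_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (hypothetical helper)."""
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy, imbalanced labels: class "a" has 4 samples, class "b" has 2.
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "b", "b", "a"]

# Recall is 3/4 for "a" and 1/2 for "b", so the macro average is 0.625.
score = macro_average_recall(y_true, y_pred)
print(score)
```

Because every class contributes equally, a projection that separates only the majority class scores lower than plain accuracy would suggest (here accuracy is 4/6).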

Parameters
X : dataframe

Input data including features and target

target_col : str or int

Identifier of the target column in X

types : dataframe of types, optional

Output of detect_types on X. Can be used to avoid recomputing the types.

scatter_alpha : float, default='auto'

Alpha values for scatter plots. ‘auto’ picks a value heuristically.

scatter_size : float, default='auto'

Marker size for scatter plots. ‘auto’ picks a value heuristically.

univariate_plot : string, default="histogram"

Supported: ‘histogram’ and ‘kde’.

drop_outliers : bool, default=True

Whether to drop outliers when plotting.

plot_pairwise : bool, default=True

Whether to create pairwise plots. Can be a bit slow.

top_k_interactions : int, default=10

How many features to consider for pairwise interactions (ranked by univariate F scores). Runtime is quadratic in this value, but higher numbers might find more interesting interactions.
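To make the quadratic cost concrete, here is a minimal sketch that ranks features by a one-way ANOVA F statistic, keeps the top k, and enumerates all pairs among them. It assumes that every pair among the top-ranked features is a candidate. The helper `f_statistic`, the toy data, and the local variable names are hypothetical and do not reflect dabl's internal code.

```python
from itertools import combinations
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F statistic of one feature against class labels.

    Assumes at least two classes and non-zero within-class variance.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand_mean = mean(values)
    n, k = len(values), len(groups)
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return (between / (k - 1)) / (within / (n - k))


# Toy data: four continuous features, two classes.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "f1": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
    "f2": [1.0, 5.0, 9.0, 2.0, 6.0, 10.0],
    "f3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "f4": [3.0, 1.0, 2.0, 2.0, 3.0, 1.0],
}

# Rank features from most to least class-separating.
ranked = sorted(
    features, key=lambda name: f_statistic(features[name], labels), reverse=True
)

# Keep the top k and enumerate every unordered pair: k * (k - 1) / 2 of them.
top_k_interactions = 3
pairs = list(combinations(ranked[:top_k_interactions], 2))
print(ranked)
print(pairs)
```

Under the same assumption, the default of 10 gives 45 candidate pairs, and doubling it to 20 gives 190.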

random_state : int, None or numpy RandomState

Random state used for subsampling for determining pairwise features to show.

Notes

Important kwargs parameters are: scatter_size and scatter_alpha.