Classes
FeatureHeatmap
Clustered heatmap of feature intensities across samples or groups.
Visualises a sample-by-feature intensity matrix as an annotated heatmap with optional hierarchical clustering of both rows and columns. A group colour-bar can be overlaid on the sample axis when group labels are supplied.
Args:
df: DataFrame with rows = samples and columns = features.
label: Group labels aligned with the rows of *df*. Used to draw a colour-bar annotation on top of the heatmap. Accepts a ``pd.Series`` or any list-like of the same length as ``df``.
features: Explicit subset of column names to include. When ``None``, all columns are used (subject to ``n_features`` in :meth:`plot`).
scaling: Pre-processing applied to each feature column before display. One of ``"zscore"`` (zero mean, unit variance), ``"minmax"`` (scale to [0, 1]), or ``"none"`` (raw values).
Raises:
ValueError: If *scaling* is not one of the accepted values.
ValueError: If *label* length does not match the number of rows in *df*.
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> import plotly.graph_objects as go
>>> from metbit.viz.summary import FeatureHeatmap
>>> X = pd.DataFrame(
...     np.random.rand(40, 20),
...     columns=[f"f{i}" for i in range(20)],
... )
>>> label = pd.Series(["A"] * 20 + ["B"] * 20)
>>> hm = FeatureHeatmap(X, label=label, scaling="zscore")
>>> fig = hm.plot(n_features=20)
>>> isinstance(fig, go.Figure)
True
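The three *scaling* options can be illustrated with plain pandas. The sketch below is not the FeatureHeatmap implementation; ``scale_columns`` is a hypothetical helper, and it assumes the population standard deviation (``ddof=0``) for ``"zscore"``, which the library may or may not use.

```python
import numpy as np
import pandas as pd

# Accepted values, mirroring the documented ``scaling`` argument.
VALID_SCALING = {"zscore", "minmax", "none"}


def scale_columns(df: pd.DataFrame, scaling: str = "zscore") -> pd.DataFrame:
    """Scale each feature column independently (hypothetical helper)."""
    if scaling not in VALID_SCALING:
        raise ValueError(f"scaling must be one of {sorted(VALID_SCALING)}, got {scaling!r}")
    if scaling == "none":
        return df.copy()  # raw values, untouched
    if scaling == "zscore":
        # Zero mean, unit variance per column (assumed ddof=0);
        # a constant column becomes NaN because 0 / 0 is undefined.
        return (df - df.mean()) / df.std(ddof=0)
    # "minmax": map each column onto [0, 1] using its own min and max.
    col_min, col_max = df.min(), df.max()
    return (df - col_min) / (col_max - col_min)


X = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 40.0]})
print(scale_columns(X, "minmax"))
```

Because every column is scaled on its own, features measured on very different ranges end up comparable on a single colour scale.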
Methods
__init__(self, df: pd.DataFrame, label: Optional[Union[pd.Series, list]]=None, features: Optional[List[str]]=None, scaling: str='zscore')
get_top_features(self, n: int=50)
Return the top *n* features ranked by across-sample variance.
Args:
n: Number of features to return. Capped at the total number of available features.
Returns: DataFrame with columns ``feature`` and ``variance``, sorted descending by variance.
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> from metbit.viz.summary import FeatureHeatmap
>>> X = pd.DataFrame(np.random.rand(20, 10),
...                  columns=[f"f{i}" for i in range(10)])
>>> hm = FeatureHeatmap(X)
>>> top = hm.get_top_features(n=5)
>>> list(top.columns)
['feature', 'variance']
>>> len(top)
5
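Ranking by across-sample variance amounts to computing one variance per column and sorting. A minimal sketch with a hypothetical helper, ``top_features_by_variance``, producing the same ``feature`` / ``variance`` layout as documented above; pandas' default sample variance (``ddof=1``) is assumed, although the ranking itself does not depend on the ``ddof`` choice.

```python
import pandas as pd


def top_features_by_variance(df: pd.DataFrame, n: int = 50) -> pd.DataFrame:
    """Rank the columns of ``df`` by variance (hypothetical helper)."""
    n = min(n, df.shape[1])  # cap at the number of available features
    variances = df.var(axis=0)  # one variance per column, across samples
    top = variances.sort_values(ascending=False).head(n)
    # Turn the Series (index = column names) into a two-column table.
    return top.rename_axis("feature").reset_index(name="variance")


X = pd.DataFrame({
    "low": [1.0, 1.0, 1.0, 1.1],
    "high": [0.0, 10.0, 0.0, 10.0],
    "mid": [0.0, 1.0, 2.0, 3.0],
})
print(top_features_by_variance(X, n=2))
```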
plot(self, n_features: int=50, cluster_samples: bool=True, cluster_features: bool=True, colorscale: str='RdBu_r', fig_height: int=800, fig_width: int=1000, font_size: int=11, title: Optional[str]=None)
Render the clustered heatmap.
Args:
n_features: Maximum number of features to display. The top features by variance are selected automatically.
cluster_samples: Whether to reorder samples (rows) by hierarchical clustering.
cluster_features: Whether to reorder features (columns) by hierarchical clustering.
colorscale: Any Plotly-compatible colorscale name, e.g. a diverging scale such as ``"RdBu_r"`` or ``"RdYlGn"``, or a sequential scale such as ``"Viridis"``.
fig_height: Figure height in pixels.
fig_width: Figure width in pixels.
font_size: Base font size for axis labels and ticks.
title: Optional figure title. Defaults to ``"Feature Heatmap"``.
Returns:
A :class:`plotly.graph_objects.Figure` ready for display or export.
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> import plotly.graph_objects as go
>>> from metbit.viz.summary import FeatureHeatmap
>>> X = pd.DataFrame(np.random.rand(40, 20),
...                  columns=[f"f{i}" for i in range(20)])
>>> label = pd.Series(["A"] * 20 + ["B"] * 20)
>>> hm = FeatureHeatmap(X, label=label)
>>> fig = hm.plot(n_features=15)
>>> isinstance(fig, go.Figure)
True
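Reordering rows and columns "by hierarchical clustering" usually means building a dendrogram and reading off its leaf order. The sketch below does this with SciPy; the average linkage and Euclidean distance are assumptions, since the linkage settings used by ``plot`` are not documented here, and ``cluster_order`` and ``reorder`` are hypothetical helpers.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage


def cluster_order(values: np.ndarray, method: str = "average",
                  metric: str = "euclidean") -> np.ndarray:
    """Row indices in dendrogram leaf order (hypothetical helper)."""
    if values.shape[0] < 2:
        return np.arange(values.shape[0])  # nothing to cluster
    Z = linkage(values, method=method, metric=metric)
    return leaves_list(Z)


def reorder(df: pd.DataFrame, cluster_samples: bool = True,
            cluster_features: bool = True) -> pd.DataFrame:
    """Permute rows and/or columns so similar ones sit next to each other."""
    out = df
    if cluster_samples:
        out = out.iloc[cluster_order(out.to_numpy()), :]
    if cluster_features:
        # Cluster columns by treating each feature as an observation.
        out = out.iloc[:, cluster_order(out.to_numpy().T)]
    return out


rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((6, 4)), columns=list("abcd"))
print(reorder(X))
```

Clustering only permutes labels; the values attached to each sample and feature are unchanged, so the reordered matrix can be passed straight to a heatmap.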
CorrelationMatrix
Pairwise feature or sample correlation heatmap.
Computes a Pearson or Spearman correlation matrix and renders it as an interactive Plotly heatmap, with optional hierarchical clustering to group correlated entities together.
Args:
df: DataFrame with rows = samples and columns = features.
method: Correlation method. One of ``"pearson"`` or ``"spearman"``.
Raises:
ValueError: If *method* is not ``"pearson"`` or ``"spearman"``.
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> import plotly.graph_objects as go
>>> from metbit.viz.summary import CorrelationMatrix
>>> X = pd.DataFrame(np.random.rand(30, 15),
...                  columns=[f"f{i}" for i in range(15)])
>>> cm = CorrelationMatrix(X)
>>> fig = cm.plot_features(n_features=10)
>>> isinstance(fig, go.Figure)
True
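The two accepted *method* values differ in what they measure: Pearson captures linear association, while Spearman is Pearson correlation computed on ranks and therefore captures any monotone association. A small sketch using ``pandas.DataFrame.corr``; ``correlation`` is a hypothetical helper, and whether CorrelationMatrix delegates to pandas in exactly this way is an assumption.

```python
import numpy as np
import pandas as pd

VALID_METHODS = {"pearson", "spearman"}


def correlation(df: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Feature-by-feature correlation with input validation (hypothetical)."""
    if method not in VALID_METHODS:
        raise ValueError(f"method must be 'pearson' or 'spearman', got {method!r}")
    return df.corr(method=method)


# y = x**3 is monotone but not linear in x.
Y = pd.DataFrame({"x": np.arange(1.0, 11.0), "y": np.arange(1.0, 11.0) ** 3})
print(correlation(Y, "pearson").loc["x", "y"])   # below 1
print(correlation(Y, "spearman").loc["x", "y"])  # 1, ranks agree perfectly
```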
Methods
__init__(self, df: pd.DataFrame, method: str='pearson')
get_correlation_matrix(self, n_features: int=30)
Return the raw feature-feature correlation matrix.
Args:
n_features: Number of features (selected by highest variance) to include in the correlation matrix.
Returns: Square DataFrame of shape ``(n_features, n_features)`` containing pairwise correlation coefficients.
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> from metbit.viz.summary import CorrelationMatrix
>>> X = pd.DataFrame(np.random.rand(20, 10),
...                  columns=[f"f{i}" for i in range(10)])
>>> cm = CorrelationMatrix(X)
>>> corr = cm.get_correlation_matrix(n_features=5)
>>> corr.shape
(5, 5)
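Selecting the highest-variance features before correlating them keeps the matrix small and focused on the most variable signals. A minimal sketch of that two-step idea with a hypothetical helper, ``top_variance_correlation``; it is not claimed to be how ``get_correlation_matrix`` is written.

```python
import numpy as np
import pandas as pd


def top_variance_correlation(df: pd.DataFrame, n_features: int = 30,
                             method: str = "pearson") -> pd.DataFrame:
    """Correlate the ``n_features`` highest-variance columns (hypothetical)."""
    # nlargest returns every column when n_features exceeds the column count.
    top = df.var().nlargest(n_features).index
    return df[top].corr(method=method)


rng = np.random.default_rng(0)
# Multiply column i by (i + 1) so later columns tend to have larger variance.
X = pd.DataFrame(rng.random((20, 10)) * np.arange(1, 11),
                 columns=[f"f{i}" for i in range(10)])
print(top_variance_correlation(X, n_features=5).round(2))
```

The result is square and symmetric with ones on the diagonal, matching the ``(n_features, n_features)`` shape described above.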
plot_features(self, n_features: int=30, cluster: bool=True, colorscale: str='RdBu_r', fig_height: int=700, fig_width: int=750, font_size: int=10, title: Optional[str]=None)
Plot a feature-by-feature correlation heatmap.
The top *n_features* features by variance are selected and their pairwise correlations are displayed. The diagonal is masked in grey so that the trivial self-correlations of 1 do not distract from the off-diagonal structure.
Args:
n_features: Number of features to include.
cluster: Reorder features by hierarchical clustering when ``True``.
colorscale: Diverging Plotly colorscale for correlation values.
fig_height: Figure height in pixels.
fig_width: Figure width in pixels.
font_size: Base font size.
title: Optional figure title.
Returns: :class:`plotly.graph_objects.Figure`
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> import plotly.graph_objects as go
>>> from metbit.viz.summary import CorrelationMatrix
>>> X = pd.DataFrame(np.random.rand(30, 15),
...                  columns=[f"f{i}" for i in range(15)])
>>> cm = CorrelationMatrix(X)
>>> fig = cm.plot_features(n_features=10, cluster=False)
>>> isinstance(fig, go.Figure)
True
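One way to mask the diagonal is to replace it with NaN before plotting: Plotly heatmaps generally leave NaN cells unpainted, so a grey plot background would show through. This is a plausible approach rather than a description of ``plot_features`` internals, and ``mask_diagonal`` is a hypothetical helper that covers only the NumPy/pandas part.

```python
import numpy as np
import pandas as pd


def mask_diagonal(corr: pd.DataFrame) -> pd.DataFrame:
    """Copy of a square correlation matrix with its diagonal set to NaN."""
    values = corr.to_numpy(dtype=float, copy=True)  # never mutate the input
    np.fill_diagonal(values, np.nan)
    return pd.DataFrame(values, index=corr.index, columns=corr.columns)


c = pd.DataFrame([[1.0, 0.3], [0.3, 1.0]], index=["a", "b"], columns=["a", "b"])
print(mask_diagonal(c))
```

Masking also stops the diagonal from anchoring the top of the colour range, so the off-diagonal values use the full diverging scale.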
plot_samples(self, cluster: bool=True, label: Optional[Union[pd.Series, list]]=None, colorscale: str='RdBu_r', fig_height: int=700, fig_width: int=750, font_size: int=10, title: Optional[str]=None)
Plot a sample-by-sample correlation heatmap.
Args:
cluster: Reorder samples by hierarchical clustering when ``True``.
label: Optional group labels for samples. When provided, a colour-coded stripe is rendered above the heatmap.
colorscale: Diverging Plotly colorscale.
fig_height: Figure height in pixels.
fig_width: Figure width in pixels.
font_size: Base font size.
title: Optional figure title.
Returns: :class:`plotly.graph_objects.Figure`
Raises:
ValueError: If *label* length does not match the number of samples in *df*.
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> import plotly.graph_objects as go
>>> from metbit.viz.summary import CorrelationMatrix
>>> X = pd.DataFrame(np.random.rand(20, 10),
...                  columns=[f"f{i}" for i in range(10)])
>>> lbl = pd.Series(["A"] * 10 + ["B"] * 10)
>>> cm = CorrelationMatrix(X)
>>> fig = cm.plot_samples(label=lbl)
>>> isinstance(fig, go.Figure)
True
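A sample-by-sample matrix is the feature correlation applied to the transposed data, and a group stripe needs one integer code per sample so it can be drawn with a discrete colour map. The sketch below, built around a hypothetical ``sample_correlation`` helper, also shows the length check behind the documented ``ValueError``; the use of ``pd.factorize`` for the stripe is an assumption.

```python
import numpy as np
import pandas as pd


def sample_correlation(df: pd.DataFrame, label=None, method: str = "pearson"):
    """Sample-by-sample correlation plus group codes for a stripe (hypothetical)."""
    corr = df.T.corr(method=method)  # transpose so samples become columns
    codes = None
    if label is not None:
        label = pd.Series(label)
        if len(label) != df.shape[0]:
            raise ValueError(
                f"label has {len(label)} entries but df has {df.shape[0]} samples"
            )
        # Map group names to integers in order of first appearance.
        codes, _ = pd.factorize(label)
    return corr, codes


rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((4, 6)))
corr, codes = sample_correlation(X, label=["A", "B", "A", "B"])
print(corr.shape, codes)
```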
PValueTable
Visual table of per-feature group-comparison test results with significance stars.
Accepts a tidy DataFrame and performs univariate statistical tests for one or more numeric columns, comparing values across groups defined by *group_col*. The results are presented as a colour-coded Plotly table.
Args:
df: Tidy DataFrame containing at least *group_col* and one or more numeric value columns.
group_col: Name of the column that encodes group membership.
value_col: Single numeric column to test. When ``None``, every numeric column (excluding *group_col*) is tested.
test: Statistical test to apply. ``"auto"`` selects a t-test for two-group comparisons and a one-way ANOVA for three or more groups. Explicit choices: ``"ttest"``, ``"mannwhitney"``, ``"anova"``, ``"kruskal"``.
correct_p: Multiple-testing correction method accepted by :func:`statsmodels.stats.multitest.multipletests`, e.g. ``"fdr_bh"`` or ``"bonferroni"``, or ``None`` to skip correction.
p_threshold: Significance threshold used for colouring. Default ``0.05``.
Raises:
ValueError: If *group_col* is not present in *df*.
ValueError: If *test* is not one of the accepted values.
ValueError: If *value_col* is specified but not found in *df*.
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> from metbit.viz.summary import PValueTable
>>> rng = np.random.default_rng(0)
>>> df = pd.DataFrame({
...     "group": ["A"] * 20 + ["B"] * 20,
...     "glucose": rng.normal(5, 1, 40),
...     "lactate": rng.normal(2, 0.5, 40),
... })
>>> pv = PValueTable(df, group_col="group")
>>> tbl = pv.get_table()
>>> list(tbl.columns)
['feature', 'statistic', 'p_value', 'p_adj', 'stars']
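The ``"auto"`` rule and the four explicit tests map directly onto SciPy functions. The sketch below tests one column with a hypothetical ``run_test`` helper; it assumes Student's (equal-variance) t-test and a two-sided Mann-Whitney U test, and PValueTable may make different choices.

```python
import numpy as np
import pandas as pd
from scipy import stats


def run_test(df: pd.DataFrame, group_col: str, value_col: str, test: str = "auto"):
    """Return (statistic, p_value) for one numeric column (hypothetical helper)."""
    # One array of values per group, with missing values dropped.
    groups = [g.dropna().to_numpy() for _, g in df.groupby(group_col)[value_col]]
    if test == "auto":
        test = "ttest" if len(groups) == 2 else "anova"
    if test in ("ttest", "mannwhitney") and len(groups) != 2:
        raise ValueError(f"{test!r} needs exactly two groups, got {len(groups)}")
    if test == "ttest":
        res = stats.ttest_ind(groups[0], groups[1])  # Student's t-test
    elif test == "mannwhitney":
        res = stats.mannwhitneyu(groups[0], groups[1], alternative="two-sided")
    elif test == "anova":
        res = stats.f_oneway(*groups)  # one-way ANOVA, F statistic
    elif test == "kruskal":
        res = stats.kruskal(*groups)  # Kruskal-Wallis, H statistic
    else:
        raise ValueError(f"unknown test {test!r}")
    return float(res.statistic), float(res.pvalue)


rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": ["A"] * 20 + ["B"] * 20,
    "x": np.r_[rng.normal(0, 1, 20), rng.normal(3, 1, 20)],
})
print(run_test(df, "group", "x"))  # "auto" picks the t-test for two groups
```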
Methods
__init__(self, df: pd.DataFrame, group_col: str, value_col: Optional[str]=None, test: str='auto', correct_p: Optional[str]='fdr_bh', p_threshold: float=0.05)
get_table(self)
Return the statistical results as a tidy DataFrame.
Computes the test results lazily on first call and caches them for subsequent calls.
Returns: DataFrame with columns:
- ``feature``: name of the tested column
- ``statistic``: test statistic (t, U, F, or H, depending on the test)
- ``p_value``: raw p-value
- ``p_adj``: adjusted p-value (or the raw p-value if *correct_p* is ``None``)
- ``stars``: significance annotation (``***``, ``**``, ``*``, or ``ns``)
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> from metbit.viz.summary import PValueTable
>>> rng = np.random.default_rng(42)
>>> df = pd.DataFrame({
...     "group": ["A"] * 20 + ["B"] * 20,
...     "x": rng.normal(0, 1, 40),
... })
>>> pv = PValueTable(df, group_col="group")
>>> tbl = pv.get_table()
>>> "p_value" in tbl.columns
True
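The ``p_adj`` and ``stars`` columns can be reproduced by hand. The sketch below implements the Benjamini-Hochberg step-up adjustment (the procedure behind ``multipletests(method="fdr_bh")``) in NumPy so that it runs without statsmodels. The star cut-offs of 0.001, 0.01 and 0.05, and applying them to the adjusted p-value, are assumptions: the documentation lists the symbols but not the thresholds. ``bh_adjust`` and ``stars`` are hypothetical helpers.

```python
import numpy as np


def bh_adjust(p) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values for a 1-D array without NaNs."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i
    # Enforce monotonicity: take running minima from the largest p downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)  # undo the sort, cap at 1
    return adjusted


def stars(p_adj: float) -> str:
    """Significance annotation (assumed cut-offs)."""
    if p_adj < 0.001:
        return "***"
    if p_adj < 0.01:
        return "**"
    if p_adj < 0.05:
        return "*"
    return "ns"


raw = [0.01, 0.04, 0.03, 0.2]
adj = bh_adjust(raw)
print(adj, [stars(x) for x in adj])
```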
plot(self, fig_height: int=600, fig_width: int=900, font_size: int=12, title: Optional[str]=None)
Render a colour-coded significance table.
Each row corresponds to a tested feature. Cells in the adjusted p-value column are coloured green when the result is significant (``p_adj < p_threshold``) and grey otherwise. The full p-value and star annotation are shown in each row.
Args:
fig_height: Figure height in pixels.
fig_width: Figure width in pixels.
font_size: Font size for table cells and header.
title: Optional figure title. Defaults to ``"Statistical Test Results"``.
Returns: :class:`plotly.graph_objects.Figure`
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> import plotly.graph_objects as go
>>> from metbit.viz.summary import PValueTable
>>> rng = np.random.default_rng(0)
>>> df = pd.DataFrame({
...     "group": ["A"] * 15 + ["B"] * 15,
...     "alanine": rng.normal(3, 1, 30),
...     "valine": rng.normal(2, 1, 30),
... })
>>> pv = PValueTable(df, group_col="group")
>>> fig = pv.plot()
>>> isinstance(fig, go.Figure)
True
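The colouring rule for the adjusted p-value column reduces to one fill colour per row. A minimal sketch with a hypothetical helper, ``p_adj_cell_colours``; the colour names are placeholders. A list like this could then be supplied, one list per column, as the cell fill colours of a Plotly ``go.Table``, though exactly how ``plot`` assembles its table is not documented here.

```python
import pandas as pd


def p_adj_cell_colours(table: pd.DataFrame, p_threshold: float = 0.05,
                       sig_colour: str = "lightgreen",
                       ns_colour: str = "lightgrey") -> list:
    """One fill colour per row of the ``p_adj`` column (hypothetical helper)."""
    # Strict comparison, matching the documented rule p_adj < p_threshold;
    # a NaN p-value compares False and is therefore shown as not significant.
    return [sig_colour if p < p_threshold else ns_colour for p in table["p_adj"]]


tbl = pd.DataFrame({"feature": ["a", "b", "c"], "p_adj": [0.01, 0.05, 0.2]})
print(p_adj_cell_colours(tbl))
```

Note that a value exactly equal to the threshold (``0.05`` here) is treated as not significant.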