You are viewing the documentation for metbit 9.1.0.

metbit.viz.summary

Visualization and apps module in metbit 9.1.0.

import metbit.viz.summary

Classes

FeatureHeatmap

Clustered heatmap of feature intensities across samples or groups.

Visualises a sample-by-feature intensity matrix as an annotated heatmap with optional hierarchical clustering of both rows and columns. A group colour-bar can be overlaid on the sample axis when group labels are supplied.

Args:

df: DataFrame with rows=samples and columns=features.

label: Group labels aligned with the rows of *df*. Used to draw a colour-bar annotation on top of the heatmap. Accepts a ``pd.Series`` or any list-like of the same length as ``df``.

features: Explicit subset of column names to include. When ``None``, all columns are used (subject to ``n_features`` in :meth:`plot`).

scaling: Pre-processing applied to each feature column before display. One of ``"zscore"`` (zero mean, unit variance), ``"minmax"`` (scale to [0, 1]), or ``"none"`` (raw values).

Raises:

ValueError: If *scaling* is not one of the accepted values.

ValueError: If *label* length does not match the number of rows in *df*.
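The three ``scaling`` options can be sketched with plain pandas. This is only an illustration of the arithmetic, not metbit's implementation: the helper name ``scale_features`` is hypothetical, and the use of the population standard deviation (``ddof=0``) and the handling of constant columns are assumptions.

```python
import numpy as np
import pandas as pd


def scale_features(df: pd.DataFrame, scaling: str = "zscore") -> pd.DataFrame:
    """Scale each feature column (hypothetical helper, not part of metbit)."""
    if scaling == "zscore":
        # Assumption: population standard deviation (ddof=0).
        std = df.std(ddof=0).replace(0, np.nan)  # constant columns become NaN
        return (df - df.mean()) / std
    if scaling == "minmax":
        span = (df.max() - df.min()).replace(0, np.nan)
        return (df - df.min()) / span
    if scaling == "none":
        return df.copy()
    raise ValueError(
        f"scaling must be 'zscore', 'minmax', or 'none', got {scaling!r}"
    )


rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((40, 5)), columns=[f"f{i}" for i in range(5)])
Z = scale_features(X, "zscore")  # every column: mean 0, standard deviation 1
M = scale_features(X, "minmax")  # every column: minimum 0, maximum 1
```

Scaling is applied per column, so features with very different intensity ranges end up on a comparable colour scale.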

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> import plotly.graph_objects as go
>>> from metbit.viz.summary import FeatureHeatmap
>>> X = pd.DataFrame(
...     np.random.rand(40, 20),
...     columns=[f"f{i}" for i in range(20)],
... )
>>> label = pd.Series(["A"] * 20 + ["B"] * 20)
>>> hm = FeatureHeatmap(X, label=label, scaling="zscore")
>>> fig = hm.plot(n_features=20)
>>> isinstance(fig, go.Figure)
True

Methods

__init__(self, df: pd.DataFrame, label: Optional[Union[pd.Series, list]]=None, features: Optional[List[str]]=None, scaling: str='zscore')

get_top_features(self, n: int=50)

Return the top *n* features ranked by across-sample variance.

Args:

n: Number of features to return. Capped at the total number of available features.

Returns: DataFrame with columns ``feature`` and ``variance``, sorted descending by variance.
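As a rough illustration of the ranking described above, the following sketch computes per-feature variance with pandas and returns the same two columns. The helper name ``top_features_by_variance`` is hypothetical, and whether metbit uses the sample or population variance is not stated here; the ranking is the same either way.

```python
import pandas as pd


def top_features_by_variance(df: pd.DataFrame, n: int = 50) -> pd.DataFrame:
    """Rank columns by variance across samples (hypothetical helper)."""
    n = min(n, df.shape[1])  # cap at the number of available features
    variances = df.var(axis=0).sort_values(ascending=False).head(n)
    return pd.DataFrame(
        {"feature": variances.index, "variance": variances.to_numpy()}
    )


X = pd.DataFrame(
    {
        "flat": [1.0, 1.0, 1.0, 1.0],
        "mild": [1.0, 2.0, 1.0, 2.0],
        "wide": [0.0, 10.0, 0.0, 10.0],
    }
)
top = top_features_by_variance(X, n=2)  # "wide" first, then "mild"
```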

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> from metbit.viz.summary import FeatureHeatmap
>>> X = pd.DataFrame(np.random.rand(20, 10),
...                  columns=[f"f{i}" for i in range(10)])
>>> hm = FeatureHeatmap(X)
>>> top = hm.get_top_features(n=5)
>>> list(top.columns)
['feature', 'variance']
>>> len(top)
5

plot(self, n_features: int=50, cluster_samples: bool=True, cluster_features: bool=True, colorscale: str='RdBu_r', fig_height: int=800, fig_width: int=1000, font_size: int=11, title: Optional[str]=None)

Render the clustered heatmap.

Args:

n_features: Maximum number of features to display. The top features by variance are selected automatically.

cluster_samples: Whether to reorder samples (rows) by hierarchical clustering.

cluster_features: Whether to reorder features (columns) by hierarchical clustering.

colorscale: Any Plotly-compatible colorscale name, e.g. ``"RdBu_r"`` or ``"RdYlGn"`` (diverging), or ``"Viridis"`` (sequential).

fig_height: Figure height in pixels.

fig_width: Figure width in pixels.

font_size: Base font size for axis labels and ticks.

title: Optional figure title. Defaults to ``"Feature Heatmap"``.

Returns:

A :class:`plotly.graph_objects.Figure` ready for display or export.
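Reordering by hierarchical clustering, as controlled by ``cluster_samples`` and ``cluster_features``, can be sketched with SciPy. The helper ``clustered_order`` is hypothetical, and the linkage method (average) and distance metric (Euclidean) are assumptions; the reference above does not say which ones metbit uses.

```python
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage


def clustered_order(matrix: pd.DataFrame, axis: int = 0) -> list:
    """Return row (axis=0) or column (axis=1) labels in dendrogram leaf order."""
    data = matrix.to_numpy(dtype=float)
    if axis == 1:
        data = data.T  # cluster columns by treating them as observations
    # Assumption: average linkage on Euclidean distances.
    tree = linkage(data, method="average", metric="euclidean")
    labels = matrix.index if axis == 0 else matrix.columns
    return [labels[i] for i in leaves_list(tree)]


X = pd.DataFrame(
    [[0.0, 0.0], [10.0, 10.0], [0.1, 0.1], [10.1, 10.1]],
    index=["s0", "s1", "s2", "s3"],
    columns=["f0", "f1"],
)
row_order = clustered_order(X, axis=0)
X_reordered = X.loc[row_order]  # similar samples are now adjacent rows
```

Applying the same function with ``axis=1`` gives a column order, so rows and columns can be reordered independently.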

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> import plotly.graph_objects as go
>>> from metbit.viz.summary import FeatureHeatmap
>>> X = pd.DataFrame(np.random.rand(40, 20),
...                  columns=[f"f{i}" for i in range(20)])
>>> label = pd.Series(["A"] * 20 + ["B"] * 20)
>>> hm = FeatureHeatmap(X, label=label)
>>> fig = hm.plot(n_features=15)
>>> isinstance(fig, go.Figure)
True

CorrelationMatrix

Pairwise feature or sample correlation heatmap.

Computes a Pearson or Spearman correlation matrix and renders it as an interactive Plotly heatmap, with optional hierarchical clustering to group correlated entities together.

Args:

df: DataFrame with rows=samples and columns=features.

method: Correlation method. One of ``"pearson"`` or ``"spearman"``.

Raises:

ValueError: If *method* is not ``"pearson"`` or ``"spearman"``.

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> import plotly.graph_objects as go
>>> from metbit.viz.summary import CorrelationMatrix
>>> X = pd.DataFrame(np.random.rand(30, 15),
...                  columns=[f"f{i}" for i in range(15)])
>>> cm = CorrelationMatrix(X)
>>> fig = cm.plot_features(n_features=10)
>>> isinstance(fig, go.Figure)
True

Methods

__init__(self, df: pd.DataFrame, method: str='pearson')

get_correlation_matrix(self, n_features: int=30)

Return the raw feature-feature correlation matrix.

Args:

n_features: Number of features (selected by highest variance) to include in the correlation matrix.

Returns: Square DataFrame of shape ``(n_features, n_features)`` containing pairwise correlation coefficients.
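The selection-then-correlation step can be approximated with pandas alone. This is a sketch under assumptions, not metbit's code: ``feature_correlation`` is a hypothetical name, and capping ``n_features`` at the number of columns is assumed by analogy with ``get_top_features``.

```python
import numpy as np
import pandas as pd


def feature_correlation(
    df: pd.DataFrame, n_features: int = 30, method: str = "pearson"
) -> pd.DataFrame:
    """Correlate the highest-variance features (hypothetical helper)."""
    if method not in ("pearson", "spearman"):
        raise ValueError(f"method must be 'pearson' or 'spearman', got {method!r}")
    k = min(n_features, df.shape[1])  # assumed cap at available features
    top = df.var().nlargest(k).index  # column names with the largest variance
    return df[top].corr(method=method)


rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((20, 10)), columns=[f"f{i}" for i in range(10)])
corr = feature_correlation(X, n_features=5)  # 5 x 5, symmetric, ones on diagonal
```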

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> from metbit.viz.summary import CorrelationMatrix
>>> X = pd.DataFrame(np.random.rand(20, 10),
...                  columns=[f"f{i}" for i in range(10)])
>>> cm = CorrelationMatrix(X)
>>> corr = cm.get_correlation_matrix(n_features=5)
>>> corr.shape
(5, 5)

plot_features(self, n_features: int=30, cluster: bool=True, colorscale: str='RdBu_r', fig_height: int=700, fig_width: int=750, font_size: int=10, title: Optional[str]=None)

Plot a feature-by-feature correlation heatmap.

The top *n_features* features by variance are selected and their pairwise correlations displayed. The diagonal is masked to grey to avoid visual distraction from the trivial self-correlation of 1.

Args:

n_features: Number of features to include.

cluster: Reorder features by hierarchical clustering when ``True``.

colorscale: Diverging Plotly colorscale for correlation values.

fig_height: Figure height in pixels.

fig_width: Figure width in pixels.

font_size: Base font size.

title: Optional figure title.

Returns: :class:`plotly.graph_objects.Figure`
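One common way to take the trivial self-correlations out of a heatmap is to replace the diagonal with NaN before plotting. The sketch below shows only that masking step; whether metbit masks the diagonal this way is an assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((20, 4)), columns=["f0", "f1", "f2", "f3"])
corr = X.corr()

# Boolean identity matrix: True on the diagonal, False elsewhere.
diagonal = np.eye(len(corr), dtype=bool)

# DataFrame.mask replaces entries where the condition is True with NaN.
masked = corr.mask(diagonal)
```

Off-diagonal coefficients are left untouched, so the colour range is driven only by correlations between distinct features.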

plot_samples(self, cluster: bool=True, label: Optional[Union[pd.Series, list]]=None, colorscale: str='RdBu_r', fig_height: int=700, fig_width: int=750, font_size: int=10, title: Optional[str]=None)

Plot a sample-by-sample correlation heatmap.

Args:

cluster: Reorder samples by hierarchical clustering when ``True``.

label: Optional group labels for samples. When provided, a colour-coded stripe is rendered above the heatmap.

colorscale: Diverging Plotly colorscale.

fig_height: Figure height in pixels.

fig_width: Figure width in pixels.

font_size: Base font size.

title: Optional figure title.

Returns: :class:`plotly.graph_objects.Figure`

Raises:

ValueError: If *label* length does not match the number of samples in *df*.

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> import plotly.graph_objects as go
>>> from metbit.viz.summary import CorrelationMatrix
>>> X = pd.DataFrame(np.random.rand(20, 10),
...                  columns=[f"f{i}" for i in range(10)])
>>> lbl = pd.Series(["A"] * 10 + ["B"] * 10)
>>> cm = CorrelationMatrix(X)
>>> fig = cm.plot_samples(label=lbl)
>>> isinstance(fig, go.Figure)
True

PValueTable

Visual table of pairwise statistical test results with significance stars.

Accepts a tidy DataFrame and performs univariate statistical tests for one or more numeric columns, comparing values across groups defined by *group_col*. The results are presented as a colour-coded Plotly table.

Args:

df: Tidy DataFrame containing at least *group_col* and one or more numeric value columns.

group_col: Name of the column that encodes group membership.

value_col: Single numeric column to test. When ``None``, every numeric column (excluding *group_col*) is tested.

test: Statistical test to apply. ``"auto"`` selects a t-test for two-group comparisons and one-way ANOVA for three or more. Explicit choices: ``"ttest"``, ``"mannwhitney"``, ``"anova"``, ``"kruskal"``.

correct_p: Multiple-testing correction method accepted by :func:`statsmodels.stats.multitest.multipletests`, e.g. ``"fdr_bh"``, ``"bonferroni"``, or ``None`` to skip correction.

p_threshold: Significance threshold for colouring. Default ``0.05``.

Raises:

ValueError: If *group_col* is not present in *df*.

ValueError: If *test* is not one of the accepted values.

ValueError: If *value_col* is specified but not found in *df*.
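The ``"auto"`` rule for *test* amounts to a small dispatch on the number of distinct groups. The sketch below is illustrative only: ``resolve_test`` is a hypothetical helper, and raising an error for fewer than two groups is an assumption that the reference above does not cover.

```python
VALID_TESTS = ("auto", "ttest", "mannwhitney", "anova", "kruskal")


def resolve_test(groups, test: str = "auto") -> str:
    """Pick a concrete test name from the group labels (hypothetical helper)."""
    if test not in VALID_TESTS:
        raise ValueError(f"test must be one of {VALID_TESTS}, got {test!r}")
    if test != "auto":
        return test  # explicit choices are passed through unchanged
    n_groups = len(set(groups))
    if n_groups < 2:
        # Assumption: a comparison needs at least two groups.
        raise ValueError("at least two groups are required")
    return "ttest" if n_groups == 2 else "anova"


two_groups = resolve_test(["A"] * 20 + ["B"] * 20)
three_groups = resolve_test(["A", "B", "C", "A", "B", "C"])
explicit = resolve_test(["A", "B"], test="mannwhitney")
```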

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> from metbit.viz.summary import PValueTable
>>> rng = np.random.default_rng(0)
>>> df = pd.DataFrame({
...     "group": ["A"] * 20 + ["B"] * 20,
...     "glucose": rng.normal(5, 1, 40),
...     "lactate": rng.normal(2, 0.5, 40),
... })
>>> pv = PValueTable(df, group_col="group")
>>> tbl = pv.get_table()
>>> list(tbl.columns)
['feature', 'statistic', 'p_value', 'p_adj', 'stars']

Methods

__init__(self, df: pd.DataFrame, group_col: str, value_col: Optional[str]=None, test: str='auto', correct_p: Optional[str]='fdr_bh', p_threshold: float=0.05)

get_table(self)

Return the statistical results as a tidy DataFrame.

Computes the test results lazily on first call and caches them for subsequent calls.

Returns: DataFrame with columns:

- ``feature``: name of the tested column
- ``statistic``: test statistic (t, U, F, or H, depending on the test)
- ``p_value``: raw p-value
- ``p_adj``: adjusted p-value (or raw if *correct_p* is ``None``)
- ``stars``: significance annotation (``***``, ``**``, ``*``, or ``ns``)
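To make the ``p_adj`` and ``stars`` columns concrete, the sketch below applies a hand-written Benjamini-Hochberg adjustment (the procedure statsmodels calls ``"fdr_bh"``) and maps the result to star annotations. Both helpers are hypothetical, and the star cut-offs of 0.001, 0.01, and 0.05 are a common convention assumed here; the reference above does not state which thresholds metbit uses.

```python
import numpy as np


def bh_adjust(p_values) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def stars(p: float) -> str:
    """Map a p-value to a star annotation (cut-offs are an assumption)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


p_raw = [0.01, 0.04, 0.03, 0.005]
p_adj = bh_adjust(p_raw)              # [0.02, 0.04, 0.04, 0.02]
labels = [stars(p) for p in p_adj]    # ["*", "*", "*", "*"]
```

Note that the raw value 0.005 would earn ``**`` on its own but drops to ``*`` after adjustment, which is why the table reports both columns.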

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> from metbit.viz.summary import PValueTable
>>> rng = np.random.default_rng(42)
>>> df = pd.DataFrame({
...     "group": ["A"] * 20 + ["B"] * 20,
...     "x": rng.normal(0, 1, 40),
... })
>>> pv = PValueTable(df, group_col="group")
>>> tbl = pv.get_table()
>>> "p_value" in tbl.columns
True

plot(self, fig_height: int=600, fig_width: int=900, font_size: int=12, title: Optional[str]=None)

Render a colour-coded significance table.

Each row corresponds to a tested feature. Cells in the adjusted p-value column are coloured green when the result is significant (``p_adj < p_threshold``) and grey otherwise. The full p-value and star annotation are shown in each row.

Args:

fig_height: Figure height in pixels.

fig_width: Figure width in pixels.

font_size: Font size for table cells and header.

title: Optional figure title. Defaults to ``"Statistical Test Results"``.

Returns: :class:`plotly.graph_objects.Figure`
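The colouring rule for the adjusted p-value column can be written as a one-line comparison against ``p_threshold``. The helper ``cell_colours`` and the specific colour names are hypothetical; only the green/grey rule and the strict ``p_adj < p_threshold`` comparison come from the description above.

```python
def cell_colours(
    p_adj,
    p_threshold: float = 0.05,
    significant: str = "lightgreen",  # assumed colour name
    other: str = "lightgrey",         # assumed colour name
) -> list:
    """Return one fill colour per row of the adjusted p-value column."""
    return [significant if p < p_threshold else other for p in p_adj]


colours = cell_colours([0.01, 0.05, 0.2])
# -> ["lightgreen", "lightgrey", "lightgrey"]
```

Because the comparison is strict, a value exactly equal to the threshold is not coloured as significant.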

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> import plotly.graph_objects as go
>>> from metbit.viz.summary import PValueTable
>>> rng = np.random.default_rng(0)
>>> df = pd.DataFrame({
...     "group": ["A"] * 15 + ["B"] * 15,
...     "alanine": rng.normal(3, 1, 30),
...     "valine": rng.normal(2, 1, 30),
... })
>>> pv = PValueTable(df, group_col="group")
>>> fig = pv.plot()
>>> isinstance(fig, go.Figure)
True

Source

metbit/viz/summary.py at v9.1.0
