
You are viewing the documentation for metbit 9.1.0.

metbit.stats.multitest

Statistics and utilities module in metbit 9.1.0.

import metbit.stats.multitest

Classes

VolcanoPlot

Volcano plot for two-group differential analysis.

Computes per-feature log2 fold change and p-values (Welch's t-test), optionally applies multiple testing correction, classifies each feature as Up / Down / NS, and renders an interactive Plotly volcano plot.
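As a rough illustration of the per-feature statistics described above, the following standard-library sketch computes a log2 fold change and Welch's t statistic (with Welch-Satterthwaite degrees of freedom) for a single feature. The helper names ``welch_t`` and ``log2_fold_change``, the ratio-of-means definition of fold change, and the treatment-minus-control sign convention are assumptions made for illustration, not metbit's confirmed implementation. The final p-value step (a t-distribution tail probability) is omitted.

```python
import math
from statistics import mean, variance


def welch_t(control, treatment):
    """Return (t, df) for Welch's unequal-variance t-test.

    Sign convention (assumed): treatment minus control.
    """
    n_c, n_t = len(control), len(treatment)
    # Squared standard errors of each group mean (sample variance / n).
    se2_c = variance(control) / n_c
    se2_t = variance(treatment) / n_t
    t = (mean(treatment) - mean(control)) / math.sqrt(se2_c + se2_t)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_c + se2_t) ** 2 / (
        se2_c ** 2 / (n_c - 1) + se2_t ** 2 / (n_t - 1)
    )
    return t, df


def log2_fold_change(control, treatment):
    """log2 of the ratio of group means (assumed definition)."""
    return math.log2(mean(treatment) / mean(control))


control = [4.0, 5.0, 6.0]
treatment = [9.0, 10.0, 11.0]
t_stat, dof = welch_t(control, treatment)
log2fc = log2_fold_change(control, treatment)
print(round(t_stat, 4), round(dof, 4), log2fc)
```

With equal variances and equal group sizes, as in this toy input, the Welch degrees of freedom reduce to the pooled value ``n_c + n_t - 2``.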

Parameters

df : pd.DataFrame

Tidy DataFrame containing group labels and numeric feature columns.

group_col : str

Column name containing group labels. Must have exactly two unique values.

value_cols : list of str, optional

Subset of numeric columns to analyse. If ``None``, all numeric columns (excluding ``group_col``) are used.

group_a : str, optional

Label of the reference group ("control"). If ``None``, the lexicographically first unique value in ``group_col`` is used.

group_b : str, optional

Label of the comparison group ("treatment"). If ``None``, the lexicographically second unique value in ``group_col`` is used.

p_value_threshold : float, default=0.05

Significance threshold applied to the (corrected) p-value.

fc_threshold : float, default=1.0

|log2FC| threshold for calling a feature "changed".

correct_p : str or None, default="fdr_bh"

Multiple testing correction method passed to ``statsmodels.stats.multitest.multipletests``. Common values: ``"fdr_bh"``, ``"bonferroni"``. Pass ``None`` to skip correction.
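To make the interplay of ``p_value_threshold``, ``fc_threshold`` and ``correct_p`` more concrete, here is a minimal standard-library sketch of a Benjamini-Hochberg adjustment followed by an Up / Down / NS call. The class itself delegates correction to ``statsmodels``; the hand-rolled ``bh_adjust`` and ``classify`` helpers below are hypothetical, and the strict inequalities used for the thresholds are an assumption rather than documented behaviour.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify(log2fc, p_adj, p_value_threshold=0.05, fc_threshold=1.0):
    """Label one feature as 'Up', 'Down' or 'NS' (assumed strict cut-offs)."""
    if p_adj < p_value_threshold and log2fc > fc_threshold:
        return "Up"
    if p_adj < p_value_threshold and log2fc < -fc_threshold:
        return "Down"
    return "NS"


raw_p = [0.01, 0.04, 0.03, 0.005]
log2fcs = [1.8, 0.4, -1.6, 2.1]
adj_p = bh_adjust(raw_p)
labels = [classify(fc, p) for fc, p in zip(log2fcs, adj_p)]
print(labels)  # ['Up', 'NS', 'Down', 'Up']
```

A feature needs to pass both cut-offs to be called changed: the second feature above is significant after adjustment but falls short of the fold-change threshold, so it stays ``NS``.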

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import VolcanoPlot
>>> np.random.seed(42)
>>> n = 30
>>> bins = [f"bin_{i:.2f}" for i in np.linspace(0.5, 10.0, 50)]
>>> ctrl = pd.DataFrame(np.random.normal(5, 1, (n, 50)), columns=bins)
>>> treat = pd.DataFrame(np.random.normal(6, 1, (n, 50)), columns=bins)
>>> ctrl["group"] = "Control"
>>> treat["group"] = "Treatment"
>>> df = pd.concat([ctrl, treat], ignore_index=True)
>>> vp = VolcanoPlot(df, group_col="group", correct_p="fdr_bh")
>>> fig = vp.plot(title="NMR Metabolomics Volcano Plot")
>>> table = vp.get_table()
>>> print(table.head())

Methods

__init__(self, df: pd.DataFrame, group_col: str, value_cols: Optional[List[str]]=None, group_a: Optional[str]=None, group_b: Optional[str]=None, p_value_threshold: float=0.05, fc_threshold: float=1.0, correct_p: Optional[str]='fdr_bh')

get_table(self)

Return the per-feature statistical results.

Returns

pd.DataFrame

DataFrame with columns ``feature``, ``log2FC``, ``p_value``, ``p_adj``, ``neg_log10_p``, ``label``.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import VolcanoPlot
>>> np.random.seed(0)
>>> bins = [f"bin_{i:.2f}" for i in np.linspace(0.5, 10.0, 20)]
>>> ctrl = pd.DataFrame(np.random.normal(5, 1, (20, 20)), columns=bins)
>>> treat = pd.DataFrame(np.random.normal(6, 1, (20, 20)), columns=bins)
>>> ctrl["group"] = "Control"
>>> treat["group"] = "Treatment"
>>> df = pd.concat([ctrl, treat], ignore_index=True)
>>> vp = VolcanoPlot(df, group_col="group")
>>> tbl = vp.get_table()
>>> print(tbl.columns.tolist())
['feature', 'log2FC', 'p_value', 'p_adj', 'neg_log10_p', 'label']

plot(self, title: Optional[str]=None, fig_width: int=900, fig_height: int=700, font_size: int=14, label_top_n: int=10)

Render the volcano plot.

Parameters

title : str, optional

Plot title. Defaults to a generated title including group names.

fig_width : int, default=900

Figure width in pixels.

fig_height : int, default=700

Figure height in pixels.

font_size : int, default=14

Base font size for axis labels and tick marks.

label_top_n : int, default=10

Number of top significant features to label by name.

Returns

go.Figure

Interactive Plotly volcano plot.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import VolcanoPlot
>>> np.random.seed(1)
>>> bins = [f"bin_{i:.2f}" for i in np.linspace(0.5, 10.0, 40)]
>>> ctrl = pd.DataFrame(np.random.normal(5, 1, (25, 40)), columns=bins)
>>> treat = pd.DataFrame(np.random.normal(5.8, 1, (25, 40)), columns=bins)
>>> ctrl["group"] = "Control"
>>> treat["group"] = "Treatment"
>>> df = pd.concat([ctrl, treat], ignore_index=True)
>>> vp = VolcanoPlot(df, group_col="group")
>>> fig = vp.plot(title="NMR Differential Analysis")
>>> fig.show()  # doctest: +SKIP

ANOVAStats

One-way ANOVA with Tukey HSD post-hoc for multi-group comparisons.

Fits a one-way ANOVA across all groups in ``x_col`` for the numeric response ``y_col``, then runs pairwise Tukey HSD comparisons. Results can be visualised as annotated box or violin plots.
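For readers who want to see what the overall test measures, the sketch below computes the one-way ANOVA F statistic from between-group and within-group sums of squares using only the standard library. The function name ``one_way_f`` is hypothetical and this is not necessarily how the class computes it internally. The p-value (an F-distribution tail probability) and the Tukey HSD step are left out.

```python
from statistics import mean


def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat = one_way_f(samples)
print(f_stat)  # 3.0
```

A large F means the group means differ by more than the within-group scatter would suggest. Which pairs differ is the question the post-hoc step answers.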

Parameters

df : pd.DataFrame

Tidy DataFrame containing group labels and the response variable.

x_col : str

Column name for the grouping variable (categorical).

y_col : str

Column name for the numeric response variable.

group_order : list of str, optional

Display order of groups. If ``None``, groups are sorted alphabetically.

p_value_threshold : float, default=0.05

Significance threshold for bracket annotations.

correct_p : str or None, default="fdr_bh"

Multiple testing correction applied to Tukey HSD p-values. Note: Tukey HSD already controls the family-wise error rate (FWER); this parameter allows additional FDR correction if desired.

fig_height : int, default=600

Figure height in pixels.

fig_width : int, default=800

Figure width in pixels.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import ANOVAStats
>>> np.random.seed(42)
>>> n = 20
>>> bins = "bin_3.50"
>>> groups = (
...     ["Control"] * n + ["Low_Dose"] * n + ["High_Dose"] * n
... )
>>> values = np.concatenate([
...     np.random.normal(5.0, 0.8, n),
...     np.random.normal(5.8, 0.8, n),
...     np.random.normal(7.2, 0.8, n),
... ])
>>> df = pd.DataFrame({"group": groups, "intensity": values})
>>> an = ANOVAStats(df, x_col="group", y_col="intensity")
>>> an.fit()
ANOVAStats(x_col='group', y_col='intensity')
>>> fig = an.plot(title="NMR Bin 3.50 ppm")
>>> print(an.get_anova_table())
>>> print(an.get_posthoc_table())

Methods

__init__(self, df: pd.DataFrame, x_col: str, y_col: str, group_order: Optional[List[str]]=None, p_value_threshold: float=0.05, correct_p: Optional[str]='fdr_bh', fig_height: int=600, fig_width: int=800)

fit(self)

Run one-way ANOVA and Tukey HSD post-hoc test.

Returns

ANOVAStats

Returns ``self`` to allow method chaining.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import ANOVAStats
>>> np.random.seed(0)
>>> df = pd.DataFrame({
...     "group": ["A"] * 15 + ["B"] * 15 + ["C"] * 15,
...     "val": np.concatenate([
...         np.random.normal(4, 1, 15),
...         np.random.normal(6, 1, 15),
...         np.random.normal(5, 1, 15),
...     ])
... })
>>> an = ANOVAStats(df, x_col="group", y_col="val").fit()
>>> print(an.get_anova_table())

get_anova_table(self)

Return overall ANOVA F-statistic and p-value.

Returns

pd.DataFrame

Single-row DataFrame with columns ``F_statistic``, ``p_value``.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import ANOVAStats
>>> np.random.seed(7)
>>> df = pd.DataFrame({
...     "group": ["A"] * 20 + ["B"] * 20 + ["C"] * 20,
...     "val": np.concatenate([
...         np.random.normal(3, 1, 20),
...         np.random.normal(5, 1, 20),
...         np.random.normal(4, 1, 20),
...     ])
... })
>>> an = ANOVAStats(df, x_col="group", y_col="val").fit()
>>> print(an.get_anova_table())

get_posthoc_table(self)

Return pairwise Tukey HSD results.

Returns

pd.DataFrame

DataFrame with columns ``group1``, ``group2``, ``meandiff``, ``p_adj``, ``reject``.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import ANOVAStats
>>> np.random.seed(3)
>>> df = pd.DataFrame({
...     "group": ["A"] * 15 + ["B"] * 15 + ["C"] * 15,
...     "val": np.concatenate([
...         np.random.normal(2, 1, 15),
...         np.random.normal(5, 1, 15),
...         np.random.normal(3, 1, 15),
...     ])
... })
>>> an = ANOVAStats(df, x_col="group", y_col="val").fit()
>>> print(an.get_posthoc_table())

plot(self, plot_type: str='box', font_size: int=14, title: Optional[str]=None, custom_colors: Optional[Dict[str, str]]=None)

Render an annotated box or violin plot with Tukey significance brackets.

Parameters

plot_type : str, default="box"

Either ``"box"`` or ``"violin"``.

font_size : int, default=14

Base font size.

title : str, optional

Plot title. Defaults to ``y_col``.

custom_colors : dict of str -> str, optional

Mapping from group name to hex color string.

Returns

go.Figure

Annotated Plotly figure.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import ANOVAStats
>>> np.random.seed(5)
>>> df = pd.DataFrame({
...     "group": ["A"] * 20 + ["B"] * 20 + ["C"] * 20,
...     "intensity": np.concatenate([
...         np.random.normal(4, 0.8, 20),
...         np.random.normal(6, 0.8, 20),
...         np.random.normal(5, 0.8, 20),
...     ])
... })
>>> fig = ANOVAStats(df, x_col="group", y_col="intensity").fit().plot()
>>> fig.show()  # doctest: +SKIP

KruskalStats

Kruskal-Wallis test with Dunn post-hoc for non-parametric multi-group comparisons.

A non-parametric alternative to :class:`ANOVAStats`. Uses ``scipy.stats.kruskal`` for the overall test and implements Dunn's test manually (rank-sum z-scores) for pairwise comparisons.
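The overall Kruskal-Wallis statistic can be sketched in a few lines of standard-library Python, which may help when interpreting the ``H_statistic`` value. The helper ``kruskal_h`` below is hypothetical and deliberately omits the tie correction that ``scipy.stats.kruskal`` applies, so the two agree only when the data contain no tied values.

```python
def kruskal_h(groups):
    """Kruskal-Wallis H for a dict of {group name: values}, no tie correction."""
    pooled = [v for values in groups.values() for v in values]
    n_total = len(pooled)

    def midrank(v):
        # 1-based rank in the pooled sample; ties share the average rank.
        less = sum(x < v for x in pooled)
        equal = sum(x == v for x in pooled)
        return less + (equal + 1) / 2

    # Sum over groups of (rank sum squared) / (group size).
    rank_term = sum(
        sum(midrank(v) for v in values) ** 2 / len(values)
        for values in groups.values()
    )
    return 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)


data = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
h_stat = kruskal_h(data)
print(round(h_stat, 4))  # 7.2
```

Because only ranks enter the calculation, any monotone transformation of the response (a log transform, for example) leaves H unchanged.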

Parameters

df : pd.DataFrame

Tidy DataFrame containing group labels and the response variable.

x_col : str

Column name for the grouping variable (categorical).

y_col : str

Column name for the numeric response variable.

group_order : list of str, optional

Display order of groups. If ``None``, groups are sorted alphabetically.

p_value_threshold : float, default=0.05

Significance threshold for bracket annotations.

correct_p : str or None, default="fdr_bh"

Multiple testing correction applied to Dunn post-hoc p-values. Common values: ``"fdr_bh"``, ``"bonferroni"``. Pass ``None`` to skip.

fig_height : int, default=600

Figure height in pixels.

fig_width : int, default=800

Figure width in pixels.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import KruskalStats
>>> np.random.seed(42)
>>> n = 20
>>> groups = ["Control"] * n + ["Low_Dose"] * n + ["High_Dose"] * n
>>> values = np.concatenate([
...     np.random.exponential(2, n),
...     np.random.exponential(4, n),
...     np.random.exponential(7, n),
... ])
>>> df = pd.DataFrame({"group": groups, "intensity": values})
>>> kr = KruskalStats(df, x_col="group", y_col="intensity")
>>> kr.fit()
KruskalStats(x_col='group', y_col='intensity')
>>> fig = kr.plot(title="NMR Bin Kruskal-Wallis")
>>> print(kr.get_kruskal_table())
>>> print(kr.get_posthoc_table())

Methods

__init__(self, df: pd.DataFrame, x_col: str, y_col: str, group_order: Optional[List[str]]=None, p_value_threshold: float=0.05, correct_p: Optional[str]='fdr_bh', fig_height: int=600, fig_width: int=800)

fit(self)

Run Kruskal-Wallis test and Dunn post-hoc pairwise comparisons.

Dunn's test ranks all observations jointly, then computes a z-score for each pair ``(i, j)``:

.. math::

z = \frac{\bar{R}_i - \bar{R}_j} {\sqrt{\frac{N(N+1)}{12} \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}

The two-sided p-value follows from the standard normal distribution. Multiple testing correction is applied if ``correct_p`` is set.
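The z-score formula above translates fairly directly into code. The following standard-library sketch ranks all observations jointly, then evaluates z and the two-sided normal p-value for every pair. The function name ``dunn_pairs`` and its dict-based interface are hypothetical, and, like the formula as written, it applies no tie correction to the variance term and no multiple testing correction to the p-values.

```python
import math
from itertools import combinations
from statistics import NormalDist


def dunn_pairs(groups):
    """Return {(g1, g2): (z, p)} for a dict of {group name: values}."""
    pooled = [v for values in groups.values() for v in values]
    n_total = len(pooled)

    def midrank(v):
        # 1-based rank in the pooled sample; ties share the average rank.
        less = sum(x < v for x in pooled)
        equal = sum(x == v for x in pooled)
        return less + (equal + 1) / 2

    mean_rank = {
        name: sum(midrank(v) for v in values) / len(values)
        for name, values in groups.items()
    }
    results = {}
    for g1, g2 in combinations(groups, 2):
        # Denominator of the formula: sqrt(N(N+1)/12 * (1/n_i + 1/n_j)).
        se = math.sqrt(
            n_total * (n_total + 1) / 12
            * (1 / len(groups[g1]) + 1 / len(groups[g2]))
        )
        z = (mean_rank[g1] - mean_rank[g2]) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        results[(g1, g2)] = (z, p)
    return results


data = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
pairs = dunn_pairs(data)
for pair, (z, p) in pairs.items():
    print(pair, round(z, 3), round(p, 4))
```

Note that N is the total number of observations across all groups, not just the two being compared, which is what distinguishes Dunn's test from running separate two-sample rank tests.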

Returns

KruskalStats

Returns ``self`` to allow method chaining.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import KruskalStats
>>> np.random.seed(0)
>>> df = pd.DataFrame({
...     "group": ["A"] * 15 + ["B"] * 15 + ["C"] * 15,
...     "val": np.concatenate([
...         np.random.exponential(2, 15),
...         np.random.exponential(5, 15),
...         np.random.exponential(3, 15),
...     ])
... })
>>> kr = KruskalStats(df, x_col="group", y_col="val").fit()
>>> print(kr.get_kruskal_table())

get_kruskal_table(self)

Return overall Kruskal-Wallis H-statistic and p-value.

Returns

pd.DataFrame

Single-row DataFrame with columns ``H_statistic``, ``p_value``.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import KruskalStats
>>> np.random.seed(9)
>>> df = pd.DataFrame({
...     "group": ["A"] * 20 + ["B"] * 20 + ["C"] * 20,
...     "val": np.concatenate([
...         np.random.exponential(1, 20),
...         np.random.exponential(3, 20),
...         np.random.exponential(2, 20),
...     ])
... })
>>> kr = KruskalStats(df, x_col="group", y_col="val").fit()
>>> print(kr.get_kruskal_table())

get_posthoc_table(self)

Return pairwise Dunn post-hoc results.

Returns

pd.DataFrame

DataFrame with columns ``group1``, ``group2``, ``z_score``, ``p_value``, ``p_adj``, ``reject``.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import KruskalStats
>>> np.random.seed(11)
>>> df = pd.DataFrame({
...     "group": ["A"] * 15 + ["B"] * 15 + ["C"] * 15,
...     "val": np.concatenate([
...         np.random.exponential(1, 15),
...         np.random.exponential(4, 15),
...         np.random.exponential(2, 15),
...     ])
... })
>>> kr = KruskalStats(df, x_col="group", y_col="val").fit()
>>> print(kr.get_posthoc_table())

plot(self, plot_type: str='box', font_size: int=14, title: Optional[str]=None, custom_colors: Optional[Dict[str, str]]=None)

Render an annotated box or violin plot with Dunn significance brackets.

Parameters

plot_type : str, default="box"

Either ``"box"`` or ``"violin"``.

font_size : int, default=14

Base font size.

title : str, optional

Plot title. Defaults to ``y_col``.

custom_colors : dict of str -> str, optional

Mapping from group name to hex color string.

Returns

go.Figure

Annotated Plotly figure.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import KruskalStats
>>> np.random.seed(6)
>>> df = pd.DataFrame({
...     "group": ["A"] * 20 + ["B"] * 20 + ["C"] * 20,
...     "intensity": np.concatenate([
...         np.random.exponential(2, 20),
...         np.random.exponential(6, 20),
...         np.random.exponential(4, 20),
...     ])
... })
>>> fig = KruskalStats(df, x_col="group", y_col="intensity").fit().plot()
>>> fig.show()  # doctest: +SKIP

Source

metbit/stats/multitest.py at v9.1.0
