Classes
VolcanoPlot
Volcano plot for two-group differential analysis.
Computes per-feature log2 fold change and p-values (Welch's t-test), optionally applies multiple testing correction, classifies each feature as Up / Down / NS, and renders an interactive Plotly volcano plot.
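The per-feature computation described above can be sketched with NumPy and SciPy. This is a minimal sketch under stated assumptions, not the metbit implementation:

- log2FC is taken as log2 of the ratio of group means (``group_b`` over ``group_a``), which requires positive intensities.
- Labels are assigned from the raw Welch p-value, before any correction.
- The helper name ``volcano_stats`` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def volcano_stats(df, group_col, p_thr=0.05, fc_thr=1.0):
    """Hypothetical helper: per-feature log2FC, Welch p-value and label."""
    # Reference / comparison groups: first two labels in sorted order
    group_a, group_b = sorted(df[group_col].unique())[:2]
    features = df.drop(columns=group_col).select_dtypes("number").columns
    a = df.loc[df[group_col] == group_a, features].to_numpy()
    b = df.loc[df[group_col] == group_b, features].to_numpy()

    # Assumption: log2 of the ratio of group means (needs positive values)
    log2fc = np.log2(b.mean(axis=0) / a.mean(axis=0))
    # Welch's t-test: equal_var=False drops the equal-variance assumption
    p = stats.ttest_ind(b, a, equal_var=False).pvalue

    # Assumption: p strictly below p_thr and |log2FC| at or above fc_thr
    sig = p < p_thr
    label = np.where(sig & (log2fc >= fc_thr), "Up",
                     np.where(sig & (log2fc <= -fc_thr), "Down", "NS"))
    return pd.DataFrame({
        "feature": features,
        "log2FC": log2fc,
        "p_value": p,
        "neg_log10_p": -np.log10(p),
        "label": label,
    })
```

In this sketch a feature needs both a small p-value and a large enough effect to be called Up or Down. A large, consistent shift therefore passes, while a tiny but highly reproducible shift stays NS.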
Parameters
df : pd.DataFrame
    Tidy DataFrame containing group labels and numeric feature columns.
group_col : str
    Column name containing group labels. Must have exactly two unique values.
value_cols : list of str, optional
    Subset of numeric columns to analyse. If ``None``, all numeric columns (excluding ``group_col``) are used.
group_a : str, optional
    Label of the reference group ("control"). If ``None``, the lexicographically first unique value in ``group_col`` is used.
group_b : str, optional
    Label of the comparison group ("treatment"). If ``None``, the lexicographically second unique value in ``group_col`` is used.
p_value_threshold : float, default=0.05
    Significance threshold applied to the (corrected) p-value.
fc_threshold : float, default=1.0
    |log2FC| threshold for calling a feature "changed".
correct_p : str or None, default="fdr_bh"
    Multiple testing correction method passed to ``statsmodels.stats.multitest.multipletests``.
    Common values: ``"fdr_bh"``, ``"bonferroni"``. Pass ``None`` to skip correction.
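For reference, ``"fdr_bh"`` selects the Benjamini-Hochberg step-up procedure. The NumPy sketch below should reproduce the adjusted p-values that ``multipletests(pvals, method="fdr_bh")`` returns as the second element of its result tuple. The function name ``bh_adjust`` is hypothetical.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards, cap at 1
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.minimum(adjusted, 1.0)
    # Put the adjusted values back in the caller's original order
    out = np.empty(m)
    out[order] = adjusted
    return out
```

By comparison, ``"bonferroni"`` is simply ``min(p * m, 1)`` for each p-value. It controls the family-wise error rate and is more conservative than the false discovery rate control of ``"fdr_bh"``.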
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import VolcanoPlot
>>> np.random.seed(42)
>>> n = 30
>>> bins = [f"bin_{i:.2f}" for i in np.linspace(0.5, 10.0, 50)]
>>> ctrl = pd.DataFrame(np.random.normal(5, 1, (n, 50)), columns=bins)
>>> treat = pd.DataFrame(np.random.normal(6, 1, (n, 50)), columns=bins)
>>> ctrl["group"] = "Control"
>>> treat["group"] = "Treatment"
>>> df = pd.concat([ctrl, treat], ignore_index=True)
>>> vp = VolcanoPlot(df, group_col="group", correct_p="fdr_bh")
>>> fig = vp.plot(title="NMR Metabolomics Volcano Plot")
>>> table = vp.get_table()
>>> print(table.head())
Methods
__init__(self, df: pd.DataFrame, group_col: str, value_cols: Optional[List[str]]=None, group_a: Optional[str]=None, group_b: Optional[str]=None, p_value_threshold: float=0.05, fc_threshold: float=1.0, correct_p: Optional[str]='fdr_bh')
get_table(self)
Return the per-feature statistical results.
Returns
pd.DataFrame
    DataFrame with columns ``feature``, ``log2FC``, ``p_value``, ``p_adj``, ``neg_log10_p``, ``label``.
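The result is an ordinary DataFrame, so standard pandas selection applies. The sketch below uses a small hand-made table with the documented column names. Its values are illustrative, not real output. It keeps the significant features and sorts them by adjusted p-value.

```python
import pandas as pd

# Illustrative table with the documented get_table() columns (made-up values)
tbl = pd.DataFrame({
    "feature": ["bin_1.20", "bin_3.50", "bin_7.80"],
    "log2FC": [1.4, -0.2, -1.1],
    "p_value": [0.001, 0.40, 0.004],
    "p_adj": [0.003, 0.40, 0.006],
    "neg_log10_p": [2.52, 0.40, 2.22],
    "label": ["Up", "NS", "Down"],
})

# Keep features called Up or Down, most significant first
hits = tbl[tbl["label"] != "NS"].sort_values("p_adj")
up_features = hits.loc[hits["label"] == "Up", "feature"].tolist()
```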
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import VolcanoPlot
>>> np.random.seed(0)
>>> bins = [f"bin_{i:.2f}" for i in np.linspace(0.5, 10.0, 20)]
>>> ctrl = pd.DataFrame(np.random.normal(5, 1, (20, 20)), columns=bins)
>>> treat = pd.DataFrame(np.random.normal(6, 1, (20, 20)), columns=bins)
>>> ctrl["group"] = "Control"
>>> treat["group"] = "Treatment"
>>> df = pd.concat([ctrl, treat], ignore_index=True)
>>> vp = VolcanoPlot(df, group_col="group")
>>> tbl = vp.get_table()
>>> print(tbl.columns.tolist())
['feature', 'log2FC', 'p_value', 'p_adj', 'neg_log10_p', 'label']
plot(self, title: Optional[str]=None, fig_width: int=900, fig_height: int=700, font_size: int=14, label_top_n: int=10)
Render the volcano plot.
Parameters
title : str, optional
    Plot title. Defaults to a generated title including group names.
fig_width : int, default=900
    Figure width in pixels.
fig_height : int, default=700
    Figure height in pixels.
font_size : int, default=14
    Base font size for axis labels and tick marks.
label_top_n : int, default=10
    Number of top significant features to label by name.
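One plausible way to choose which points get text labels is to rank the significant features and keep the first ``label_top_n``. The docstring does not specify the ranking criterion. The sketch assumes ranking by adjusted p-value, with ties broken by larger |log2FC|. ``top_n_labels`` is a hypothetical helper that works on a table shaped like the output of ``get_table()``.

```python
import pandas as pd


def top_n_labels(table: pd.DataFrame, label_top_n: int = 10) -> list:
    """Hypothetical: names of the most significant non-NS features."""
    sig = table[table["label"] != "NS"]
    # Assumption: rank by adjusted p-value, break ties by larger |log2FC|
    ranked = sig.assign(abs_fc=sig["log2FC"].abs()).sort_values(
        ["p_adj", "abs_fc"], ascending=[True, False]
    )
    return ranked["feature"].head(label_top_n).tolist()
```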
Returns
go.Figure
    Interactive Plotly volcano plot.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import VolcanoPlot
>>> np.random.seed(1)
>>> bins = [f"bin_{i:.2f}" for i in np.linspace(0.5, 10.0, 40)]
>>> ctrl = pd.DataFrame(np.random.normal(5, 1, (25, 40)), columns=bins)
>>> treat = pd.DataFrame(np.random.normal(5.8, 1, (25, 40)), columns=bins)
>>> ctrl["group"] = "Control"
>>> treat["group"] = "Treatment"
>>> df = pd.concat([ctrl, treat], ignore_index=True)
>>> vp = VolcanoPlot(df, group_col="group")
>>> fig = vp.plot(title="NMR Differential Analysis")
>>> fig.show()  # doctest: +SKIP
ANOVAStats
One-way ANOVA with Tukey HSD post-hoc for multi-group comparisons.
Fits a one-way ANOVA across all groups in ``x_col`` for the numeric response ``y_col``, then runs pairwise Tukey HSD comparisons. Results can be visualised as annotated box or violin plots.
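The overall test can be written out from sums of squares. The sketch below computes the one-way ANOVA F-statistic by hand and takes the p-value from the upper tail of the F distribution. Its result should agree with ``scipy.stats.f_oneway``, which makes a convenient cross-check. The docstring does not say which library metbit uses internally, and ``one_way_anova`` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def one_way_anova(*groups):
    """One-way ANOVA F-statistic and p-value from sums of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(g.size for g in groups)
    grand_mean = np.concatenate(groups).mean()
    # Between-group and within-group sums of squares
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper-tail probability of the F(df_between, df_within) distribution
    p_value = stats.f.sf(f_stat, df_between, df_within)
    return f_stat, p_value
```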
Parameters
df : pd.DataFrame
    Tidy DataFrame containing group labels and the response variable.
x_col : str
    Column name for the grouping variable (categorical).
y_col : str
    Column name for the numeric response variable.
group_order : list of str, optional
    Display order of groups. If ``None``, groups are sorted alphabetically.
p_value_threshold : float, default=0.05
    Significance threshold for bracket annotations.
correct_p : str or None, default="fdr_bh"
    Multiple testing correction applied to Tukey HSD p-values.
    Note: Tukey HSD already controls the family-wise error rate (FWER); this parameter allows additional FDR correction if desired.
fig_height : int, default=600
    Figure height in pixels.
fig_width : int, default=800
    Figure width in pixels.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import ANOVAStats
>>> np.random.seed(42)
>>> n = 20
>>> bins = "bin_3.50"
>>> groups = (
...     ["Control"] * n + ["Low_Dose"] * n + ["High_Dose"] * n
... )
>>> values = np.concatenate([
...     np.random.normal(5.0, 0.8, n),
...     np.random.normal(5.8, 0.8, n),
...     np.random.normal(7.2, 0.8, n),
... ])
>>> df = pd.DataFrame({"group": groups, "intensity": values})
>>> an = ANOVAStats(df, x_col="group", y_col="intensity")
>>> an.fit()
ANOVAStats(x_col='group', y_col='intensity')
>>> fig = an.plot(title="NMR Bin 3.50 ppm")
>>> print(an.get_anova_table())
>>> print(an.get_posthoc_table())
Methods
__init__(self, df: pd.DataFrame, x_col: str, y_col: str, group_order: Optional[List[str]]=None, p_value_threshold: float=0.05, correct_p: Optional[str]='fdr_bh', fig_height: int=600, fig_width: int=800)
fit(self)
Run one-way ANOVA and Tukey HSD post-hoc test.
Returns
ANOVAStats
    Returns ``self`` to allow method chaining.
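The chained calls in the examples, such as ``ANOVAStats(...).fit().plot()``, work because ``fit`` stores its results on the instance and returns it. The pattern is illustrated below with a hypothetical toy class, ``MeanByGroup``, which is not part of metbit. Its ``__repr__`` mimics the ``ClassName(x_col=..., y_col=...)`` form shown in the examples.

```python
import pandas as pd


class MeanByGroup:
    """Hypothetical toy estimator illustrating the fit-returns-self pattern."""

    def __init__(self, df: pd.DataFrame, x_col: str, y_col: str):
        self.df, self.x_col, self.y_col = df, x_col, y_col
        self._means = None

    def fit(self) -> "MeanByGroup":
        # Store results on the instance, then return it for chaining
        self._means = self.df.groupby(self.x_col)[self.y_col].mean()
        return self

    def get_table(self) -> pd.DataFrame:
        if self._means is None:
            raise RuntimeError("call fit() first")
        return self._means.reset_index()

    def __repr__(self) -> str:
        return f"MeanByGroup(x_col={self.x_col!r}, y_col={self.y_col!r})"
```

Raising in ``get_table`` when ``fit`` has not run turns a silent ``None`` into an explicit error.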
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import ANOVAStats
>>> np.random.seed(0)
>>> df = pd.DataFrame({
...     "group": ["A"] * 15 + ["B"] * 15 + ["C"] * 15,
...     "val": np.concatenate([
...         np.random.normal(4, 1, 15),
...         np.random.normal(6, 1, 15),
...         np.random.normal(5, 1, 15),
...     ])
... })
>>> an = ANOVAStats(df, x_col="group", y_col="val").fit()
>>> print(an.get_anova_table())
get_anova_table(self)
Return overall ANOVA F-statistic and p-value.
Returns
pd.DataFrame
    Single-row DataFrame with columns ``F_statistic``, ``p_value``.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import ANOVAStats
>>> np.random.seed(7)
>>> df = pd.DataFrame({
...     "group": ["A"] * 20 + ["B"] * 20 + ["C"] * 20,
...     "val": np.concatenate([
...         np.random.normal(3, 1, 20),
...         np.random.normal(5, 1, 20),
...         np.random.normal(4, 1, 20),
...     ])
... })
>>> an = ANOVAStats(df, x_col="group", y_col="val").fit()
>>> print(an.get_anova_table())
get_posthoc_table(self)
Return pairwise Tukey HSD results.
Returns
pd.DataFrame
    DataFrame with columns ``group1``, ``group2``, ``meandiff``, ``p_adj``, ``reject``.
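A table of this shape can be reproduced with ``scipy.stats.tukey_hsd`` (SciPy 1.8 or later). This is a swapped-in implementation for illustration only; the docstring does not say which Tukey implementation metbit uses.

- Column names mirror the ones documented above.
- ``meandiff`` is computed as mean(``group2``) minus mean(``group1``). This sign convention is an assumption.
- ``tukey_table`` is a hypothetical helper.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats


def tukey_table(df, x_col, y_col, alpha=0.05):
    """Hypothetical helper: pairwise Tukey HSD as a tidy DataFrame."""
    names = sorted(df[x_col].unique())
    samples = [df.loc[df[x_col] == g, y_col].to_numpy() for g in names]
    res = stats.tukey_hsd(*samples)
    rows = []
    for i, j in combinations(range(len(names)), 2):
        rows.append({
            "group1": names[i],
            "group2": names[j],
            # res.statistic[i, j] is mean(i) - mean(j); flip to group2 - group1
            "meandiff": -res.statistic[i, j],
            "p_adj": res.pvalue[i, j],
            "reject": bool(res.pvalue[i, j] < alpha),
        })
    return pd.DataFrame(rows)
```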
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import ANOVAStats
>>> np.random.seed(3)
>>> df = pd.DataFrame({
...     "group": ["A"] * 15 + ["B"] * 15 + ["C"] * 15,
...     "val": np.concatenate([
...         np.random.normal(2, 1, 15),
...         np.random.normal(5, 1, 15),
...         np.random.normal(3, 1, 15),
...     ])
... })
>>> an = ANOVAStats(df, x_col="group", y_col="val").fit()
>>> print(an.get_posthoc_table())
plot(self, plot_type: str='box', font_size: int=14, title: Optional[str]=None, custom_colors: Optional[Dict[str, str]]=None)
Render an annotated box or violin plot with Tukey significance brackets.
Parameters
plot_type : str, default="box"
    Either ``"box"`` or ``"violin"``.
font_size : int, default=14
    Base font size.
title : str, optional
    Plot title. Defaults to ``y_col``.
custom_colors : dict of str -> str, optional
    Mapping from group name to hex color string.
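Significance brackets are usually drawn only for pairs below ``p_value_threshold`` and stacked so they do not overlap. The sketch below shows one simple layout rule. The helper name, the fixed vertical ``step`` and the ``"p = ..."`` label format are hypothetical choices, not the rendering metbit performs.

```python
def bracket_layout(posthoc, y_max, threshold=0.05, step=0.08, p_col="p_adj"):
    """Hypothetical: place one bracket per significant pair above the data.

    posthoc: iterable of dicts with keys "group1", "group2" and p_col.
    Returns dicts holding the pair, the bracket height and a p-value label.
    """
    significant = [r for r in posthoc if r[p_col] < threshold]
    # Draw the most significant comparisons closest to the data
    significant.sort(key=lambda r: r[p_col])
    brackets = []
    for level, r in enumerate(significant, start=1):
        brackets.append({
            "pair": (r["group1"], r["group2"]),
            # Each bracket sits a further fraction of y_max above the last one
            "y": y_max * (1 + step * level),
            "text": f"p = {r[p_col]:.3g}",
        })
    return brackets
```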
Returns
go.Figure
    Annotated Plotly figure.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import ANOVAStats
>>> np.random.seed(5)
>>> df = pd.DataFrame({
...     "group": ["A"] * 20 + ["B"] * 20 + ["C"] * 20,
...     "intensity": np.concatenate([
...         np.random.normal(4, 0.8, 20),
...         np.random.normal(6, 0.8, 20),
...         np.random.normal(5, 0.8, 20),
...     ])
... })
>>> fig = ANOVAStats(df, x_col="group", y_col="intensity").fit().plot()
>>> fig.show()  # doctest: +SKIP
KruskalStats
Kruskal-Wallis test with Dunn post-hoc for non-parametric multi-group comparisons.
A non-parametric alternative to :class:`ANOVAStats`. Uses ``scipy.stats.kruskal`` for the overall test and implements Dunn's test manually (z-scores computed from mean ranks) for pairwise comparisons.
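The Kruskal-Wallis H-statistic can be computed directly from the pooled ranks: H = 12 / (N(N+1)) · Σ R_i² / n_i − 3(N+1), where R_i is the rank sum of group i. ``scipy.stats.kruskal`` additionally divides H by a tie-correction factor. That factor equals 1 when there are no ties, so this sketch matches SciPy only on tie-free data. ``kruskal_h`` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def kruskal_h(*groups):
    """Kruskal-Wallis H without tie correction (sketch)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    n_total = pooled.size
    # Rank all observations jointly (average ranks for ties)
    ranks = stats.rankdata(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = ranks[start:start + g.size].sum()
        total += rank_sum ** 2 / g.size
        start += g.size
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    # Under H0, H is approximately chi-square with k - 1 degrees of freedom
    p_value = stats.chi2.sf(h, len(groups) - 1)
    return h, p_value
```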
Parameters
df : pd.DataFrame
    Tidy DataFrame containing group labels and the response variable.
x_col : str
    Column name for the grouping variable (categorical).
y_col : str
    Column name for the numeric response variable.
group_order : list of str, optional
    Display order of groups. If ``None``, groups are sorted alphabetically.
p_value_threshold : float, default=0.05
    Significance threshold for bracket annotations.
correct_p : str or None, default="fdr_bh"
    Multiple testing correction applied to Dunn post-hoc p-values.
    Common values: ``"fdr_bh"``, ``"bonferroni"``. Pass ``None`` to skip.
fig_height : int, default=600
    Figure height in pixels.
fig_width : int, default=800
    Figure width in pixels.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import KruskalStats
>>> np.random.seed(42)
>>> n = 20
>>> groups = ["Control"] * n + ["Low_Dose"] * n + ["High_Dose"] * n
>>> values = np.concatenate([
...     np.random.exponential(2, n),
...     np.random.exponential(4, n),
...     np.random.exponential(7, n),
... ])
>>> df = pd.DataFrame({"group": groups, "intensity": values})
>>> kr = KruskalStats(df, x_col="group", y_col="intensity")
>>> kr.fit()
KruskalStats(x_col='group', y_col='intensity')
>>> fig = kr.plot(title="NMR Bin Kruskal-Wallis")
>>> print(kr.get_kruskal_table())
>>> print(kr.get_posthoc_table())
Methods
__init__(self, df: pd.DataFrame, x_col: str, y_col: str, group_order: Optional[List[str]]=None, p_value_threshold: float=0.05, correct_p: Optional[str]='fdr_bh', fig_height: int=600, fig_width: int=800)
fit(self)
Run Kruskal-Wallis test and Dunn post-hoc pairwise comparisons.
Dunn's test ranks all observations jointly, then computes a z-score for each pair ``(i, j)``:

.. math::

    z = \frac{\bar{R}_i - \bar{R}_j}{\sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}

where :math:`\bar{R}_i` is the mean rank of group :math:`i`, :math:`n_i` is its size, and :math:`N` is the total number of observations. The two-sided p-value follows from the standard normal distribution, :math:`p = 2\,[1 - \Phi(|z|)]`. Multiple testing correction is applied if ``correct_p`` is set.
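The z-score formula above translates almost line for line into NumPy and SciPy. This sketch omits any tie correction and leaves multiple testing correction to a later step on the returned p-values. ``dunn_pairs`` and its dict-of-arrays input are hypothetical and are not the metbit internals.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def dunn_pairs(groups: dict):
    """Dunn z-scores and two-sided p-values for every pair (no tie correction).

    groups: mapping of group name -> 1-D sequence of observations.
    Returns a list of (group_i, group_j, z, p) tuples.
    """
    names = list(groups)
    pooled = np.concatenate([np.asarray(groups[g], dtype=float) for g in names])
    n_total = pooled.size
    # Joint ranking of all observations, then mean rank per group
    ranks = stats.rankdata(pooled)
    mean_rank, size, start = {}, {}, 0
    for g in names:
        n = len(groups[g])
        mean_rank[g] = ranks[start:start + n].mean()
        size[g] = n
        start += n
    rows = []
    for a, b in combinations(names, 2):
        se = np.sqrt(n_total * (n_total + 1) / 12.0 * (1 / size[a] + 1 / size[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = 2 * stats.norm.sf(abs(z))  # two-sided, standard normal
        rows.append((a, b, z, p))
    return rows
```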
Returns
KruskalStats
    Returns ``self`` to allow method chaining.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import KruskalStats
>>> np.random.seed(0)
>>> df = pd.DataFrame({
...     "group": ["A"] * 15 + ["B"] * 15 + ["C"] * 15,
...     "val": np.concatenate([
...         np.random.exponential(2, 15),
...         np.random.exponential(5, 15),
...         np.random.exponential(3, 15),
...     ])
... })
>>> kr = KruskalStats(df, x_col="group", y_col="val").fit()
>>> print(kr.get_kruskal_table())
get_kruskal_table(self)
Return overall Kruskal-Wallis H-statistic and p-value.
Returns
pd.DataFrame
    Single-row DataFrame with columns ``H_statistic``, ``p_value``.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import KruskalStats
>>> np.random.seed(9)
>>> df = pd.DataFrame({
...     "group": ["A"] * 20 + ["B"] * 20 + ["C"] * 20,
...     "val": np.concatenate([
...         np.random.exponential(1, 20),
...         np.random.exponential(3, 20),
...         np.random.exponential(2, 20),
...     ])
... })
>>> kr = KruskalStats(df, x_col="group", y_col="val").fit()
>>> print(kr.get_kruskal_table())
get_posthoc_table(self)
Return pairwise Dunn post-hoc results.
Returns
pd.DataFrame
    DataFrame with columns ``group1``, ``group2``, ``z_score``, ``p_value``, ``p_adj``, ``reject``.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import KruskalStats
>>> np.random.seed(11)
>>> df = pd.DataFrame({
...     "group": ["A"] * 15 + ["B"] * 15 + ["C"] * 15,
...     "val": np.concatenate([
...         np.random.exponential(1, 15),
...         np.random.exponential(4, 15),
...         np.random.exponential(2, 15),
...     ])
... })
>>> kr = KruskalStats(df, x_col="group", y_col="val").fit()
>>> print(kr.get_posthoc_table())
plot(self, plot_type: str='box', font_size: int=14, title: Optional[str]=None, custom_colors: Optional[Dict[str, str]]=None)
Render an annotated box or violin plot with Dunn significance brackets.
Parameters
plot_type : str, default="box"
    Either ``"box"`` or ``"violin"``.
font_size : int, default=14
    Base font size.
title : str, optional
    Plot title. Defaults to ``y_col``.
custom_colors : dict of str -> str, optional
    Mapping from group name to hex color string.
Returns
go.Figure
    Annotated Plotly figure.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from metbit.stats.multitest import KruskalStats
>>> np.random.seed(6)
>>> df = pd.DataFrame({
...     "group": ["A"] * 20 + ["B"] * 20 + ["C"] * 20,
...     "intensity": np.concatenate([
...         np.random.exponential(2, 20),
...         np.random.exponential(6, 20),
...         np.random.exponential(4, 20),
...     ])
... })
>>> fig = KruskalStats(df, x_col="group", y_col="intensity").fit().plot()
>>> fig.show()  # doctest: +SKIP