
You are viewing the documentation for metbit 8.1.0.

metbit.utility

Statistics and utilities module in metbit 8.1.0.

import metbit.utility

Classes

lazypair

Methods

__init__(self, dataset, column_name)

get_index(self)

get_name(self)

get_meta(self)

get_column_name(self)

get_dataset(self)

gen_page

Methods

__init__(self, data_path)

This function takes in the path to the data folder and returns the HTML files for the OPLS-DA plots.

Parameters

data_path : str

The path to the data folder.

Usage: gen_page(data_path).get_files()

get_files(self)

oplsda_path

Methods

__init__(self, data_path)

make_path(self)

get_path(self)

Normality_distribution

Methods

__init__(self, data: pd.DataFrame)

plot_distribution(self, feature)

pca_distributions(self)

Normalise

Methods

__init__(self, data: pd.DataFrame, compute_missing: bool=True)

pqn_normalise(self, ref_index: list=None, plot: bool=True)
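The reference does not describe what pqn_normalise computes. As a rough orientation only, the textbook probabilistic quotient normalisation can be sketched as below. This is a minimal sketch, not metbit's implementation: the helper name pqn_sketch is hypothetical, rows are assumed to be samples and columns features, the preliminary total-intensity scaling that some PQN variants apply is omitted, and no plot is produced.

```python
import pandas as pd


def pqn_sketch(data: pd.DataFrame, ref_index=None) -> pd.DataFrame:
    """Textbook PQN; rows are samples, columns are features (assumed layout)."""
    # Reference profile: feature-wise median over the reference samples
    # (all samples when ref_index is None).
    ref_rows = data if ref_index is None else data.loc[ref_index]
    reference = ref_rows.median(axis=0)
    # Quotient of every value against the reference profile.
    quotients = data.div(reference, axis=1)
    # One dilution factor per sample: the median quotient across features.
    dilution = quotients.median(axis=1)
    return data.div(dilution, axis=0)


# Three made-up samples that differ only by a dilution factor.
samples = pd.DataFrame(
    {"f1": [1.0, 2.0, 4.0], "f2": [2.0, 4.0, 8.0], "f3": [3.0, 6.0, 12.0]},
    index=["s1", "s2", "s3"],
)
normalised = pqn_sketch(samples)
```

Because the three samples are scaled copies of one profile, the sketch maps all of them onto the same row.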

decimal_place_normalisation(self, decimals: int=2)

This function returns the dataframe with values rounded to a specified number of decimal places.

Parameters

decimals : int

The number of decimal places to round to.
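The rounding described above corresponds to what pandas' own DataFrame.round does. The snippet below is a sketch with made-up data and does not call metbit.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"a": [1.23456, 2.5], "b": [0.98765, 10.0]})

# Round every value to two decimal places.
rounded = df.round(2)
```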

z_score_normalisation(self)

This function returns the dataframe normalized using Z-Score.
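A column-wise z-score can be sketched directly in pandas. The snippet assumes the sample standard deviation (ddof=1, the pandas default); whether metbit uses the sample or the population form is not stated here.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]})

# Subtract each column's mean, then divide by its standard deviation.
z_scored = (df - df.mean()) / df.std()
```

After this step every column has mean 0 and standard deviation 1.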

linear_normalisation(self)

This function returns the dataframe normalized using Min-Max (linear normalization).
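Min-max scaling maps each column onto the interval [0, 1]. The sketch below assumes column-wise scaling and uses made-up data; it does not call metbit.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]})

# Shift each column so its minimum is 0, then divide by its range.
scaled = (df - df.min()) / (df.max() - df.min())
```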

normalize_to_100(self)

This function returns the dataframe with values normalized to 100.
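The description leaves open what "normalized to 100" is taken over. One common reading, assumed here and not confirmed by this reference, is that each row (sample) is rescaled so its values sum to 100.

```python
import pandas as pd

# Made-up data for illustration; rows are assumed to be samples.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 8.0]})

# Assumption: scale each row so that its values sum to 100.
row_totals = df.sum(axis=1)
to_100 = df.div(row_totals, axis=0) * 100
```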

clipping_normalisation(self, lower: float, upper: float)

This function returns the dataframe with values clipped to the specified range.

Parameters

lower : float

The lower bound for clipping.

upper : float

The upper bound for clipping.
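Clipping to a range matches the behaviour of pandas' DataFrame.clip, shown here on made-up data as a sketch rather than as metbit's code.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"a": [-5.0, 0.5, 7.0], "b": [1.0, 2.0, 3.0]})

# Values below 0.0 become 0.0 and values above 2.0 become 2.0.
clipped = df.clip(lower=0.0, upper=2.0)
```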

standard_deviation_normalisation(self)

This function returns the dataframe normalized using Standard Deviation.
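Unlike the z-score, scaling by the standard deviation is usually understood as dividing without centring first. The sketch below assumes that reading and the sample standard deviation (ddof=1); neither is confirmed by this reference.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# Assumption: divide each column by its own standard deviation, no centring.
sd_scaled = df / df.std()
```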

univar_stats

A class for generating univariate box or violin plots with statistical annotations using Plotly. Supports group-wise comparisons using t-tests, ANOVA, or nonparametric tests, along with effect size calculation and multiple testing correction.

Parameters

df : pandas.DataFrame

The input dataframe containing numeric and grouping columns.

x_col : str

Column name used for group/category (x-axis).

y_col : str

Column name used for values (y-axis).

group_order : list of str, optional

Custom ordering of groups on the x-axis. Defaults to the order in the dataframe.

custom_colors : dict, optional

Dictionary mapping group names to Plotly color codes.

stats_options : list of str, optional

Statistical tests to perform. Choices:
- 't-test' : independent two-sample t-test
- 'anova' : one-way ANOVA
- 'nonparametric' : Mann-Whitney U test
- 'effect-size' : computes Cohen's d

p_value_threshold : float, default=0.05

Threshold for marking comparisons as significant.

annotate_style : str, default="value"

How to display p-values. Options:
- 'value' : show exact p-values (e.g., p=0.0031)
- 'symbol' : show significance level (*, **, ***) or 'ns'

y_offset_factor : float, default=0.35

Controls spacing between stacked annotation lines (relative to y-axis range).

show_non_significant : bool, default=True

If False, non-significant comparisons are hidden from the plot.

correct_p : str or None, default="bonferroni"

Method for multiple testing correction (e.g., "bonferroni", "fdr_bh", or None).

title_ : str, optional

Title for the plot. Defaults to y_col.

y_label : str, optional

Custom label for the y-axis. Defaults to y_col.

x_label : str, optional

Custom label for the x-axis. Defaults to x_col.

fig_height : int, default=800

Height of the figure in pixels.

fig_width : int, default=600

Width of the figure in pixels.

plot_type : str, default="box"

Type of plot. Choices:
- "box"
- "violin"

show_axis_lines : bool, default=True

Whether to show border lines on axes.
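How correct_p, p_value_threshold and annotate_style might combine can be sketched in plain Python. This is an illustration under stated assumptions, not the class's internals: the helpers bonferroni and annotate are hypothetical names, and the symbol cut-offs (0.01 and 0.001 below the threshold) are a common convention that the library may not share.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


def annotate(p: float, style: str = "value", threshold: float = 0.05) -> str:
    """Format one p-value as exact text or as a significance symbol."""
    if style == "value":
        return f"p={p:.4f}"
    # Assumed cut-offs for the symbols; the library may use different ones.
    if p >= threshold:
        return "ns"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    return "*"


raw = [0.0002, 0.004, 0.03]  # made-up p-values for three pairwise comparisons
adjusted = bonferroni(raw)
labels = [annotate(p, style="symbol") for p in adjusted]
```

With show_non_significant=False, comparisons labelled 'ns' in a scheme like this would be the ones left off the plot.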

Attributes

df : pandas.DataFrame

The input data.

plot() : plotly.graph_objects.Figure

Generates the interactive annotated plot.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from metbit.utility import univar_stats

>>> # Create mock data
>>> df = pd.DataFrame({
...     "group": np.repeat(["A", "B", "C"], 30),
...     "value": np.concatenate([
...         np.random.normal(5, 1, 30),
...         np.random.normal(6, 1, 30),
...         np.random.normal(7, 1, 30)
...     ])
... })

>>> # Initialize and plot
>>> plotter = univar_stats(
...     df, x_col="group", y_col="value",
...     stats_options=["t-test", "effect-size"],
...     annotate_style="symbol", plot_type="box",
...     show_non_significant=False
... )
>>> fig = plotter.plot()
>>> fig.show()

Methods

__init__(self, df: pd.DataFrame, x_col: str, y_col: str, group_order: Optional[List[str]]=None, custom_colors: Optional[Dict[str, str]]=None, stats_options: Optional[List[str]]=None, p_value_threshold: float=0.05, annotate_style: str='value', y_offset_factor: float=0.35, show_non_significant: bool=True, correct_p: Optional[str]='bonferroni', title_: Optional[str]=None, y_label: Optional[str]=None, x_label: Optional[str]=None, fig_height: int=800, fig_width: int=600, plot_type: str='box', show_axis_lines: bool=True)

compute_effsize(a, b, eftype: str='cohen')

Compute effect size (Cohen's d).
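For orientation, the usual pooled-variance definition of Cohen's d is sketched below using only the standard library. The function name cohen_d is hypothetical, and whether compute_effsize uses exactly this pooled form is an assumption.

```python
import math
from statistics import mean, variance


def cohen_d(a, b) -> float:
    """Cohen's d with a pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Made-up groups: means differ by 1 and both sample variances are 4.
d = cohen_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

Here the mean difference of 1 is divided by a pooled standard deviation of 2.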

plot(self)

Functions

project_name_generator()

Source

metbit/utility.py at v8.1.0