You are viewing the documentation for metbit 6.6.7.

metbit.utility

Statistics and utilities module in metbit 6.6.7.

import metbit.utility

Classes

lazypair

Methods

__init__(self, dataset, column_name)

get_index(self)

get_name(self)

get_meta(self)

get_column_name(self)

get_dataset(self)

gen_page

Methods

__init__(self, data_path)

This function takes in the path to the data folder and returns the HTML files for the OPLS-DA plots.

Parameters

data_path : str

The path to the data folder.

Usage: gen_page(data_path).get_files()

get_files(self)

oplsda_path

Methods

__init__(self, data_path)

make_path(self)

get_path(self)

Normality_distribution

Methods

__init__(self, data: pd.DataFrame)

plot_distribution(self, feature)

pca_distributions(self)

Normalise

Methods

__init__(self, data: pd.DataFrame, compute_missing: bool=True)

This function takes in a dataframe and returns the normalised dataframe.

Parameters

data : pandas DataFrame

The dataframe to be used.

Usage: Normalise(data).normalise()

pqn_normalise(self, plot: bool=True)

decimal_place_normalisation(self, decimals: int=2)

This function returns the dataframe with values rounded to a specified number of decimal places.

Parameters

decimals : int

The number of decimal places to round to.
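As a rough illustration of rounding to a fixed number of decimal places, the sketch below uses plain pandas `DataFrame.round`. It is not taken from the metbit source, and the sample frame and its column names are made up for the example.

```python
import pandas as pd

# Hypothetical intensity table; column names are illustrative only.
df = pd.DataFrame({"f1": [1.23456, 2.34567], "f2": [0.98765, 10.0]})

# Round every numeric value to two decimal places.
rounded = df.round(2)
print(rounded)
```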

z_score_normalisation(self)

This function returns the dataframe normalized using Z-Score.
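For orientation, a minimal Z-score sketch in plain pandas follows. It assumes the scaling is applied column-wise with the sample standard deviation (pandas' default, `ddof=1`); whether metbit makes the same choices is not stated here, and the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical feature table; column names are illustrative only.
df = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "f2": [10.0, 20.0, 60.0]})

# Column-wise Z-score: subtract each column's mean and divide by its
# sample standard deviation (ddof=1 is the pandas default, an assumption here).
z_scored = (df - df.mean()) / df.std()
print(z_scored)
```

After this transformation each column has mean 0 and sample standard deviation 1.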

linear_normalisation(self)

This function returns the dataframe normalized using Min-Max (linear normalization).
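Min-Max scaling maps each value to (x - min) / (max - min). The sketch below applies that formula column-wise in plain pandas; the column-wise direction and the sample data are assumptions, not details confirmed by the metbit source.

```python
import pandas as pd

# Hypothetical feature table; column names are illustrative only.
df = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "f2": [10.0, 20.0, 60.0]})

# Column-wise Min-Max scaling to the range [0, 1].
col_min = df.min()
col_range = df.max() - col_min
scaled = (df - col_min) / col_range
print(scaled)
```

A constant column has a range of zero, so this simple version would produce NaN for it.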

normalize_to_100(self)

This function returns the dataframe with values normalized to 100.

clipping_normalisation(self, lower: float, upper: float)

This function returns the dataframe with values clipped to the specified range.

Parameters

lower : float

The lower bound for clipping.

upper : float

The upper bound for clipping.
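Clipping to a range can be pictured with pandas' own `DataFrame.clip`, as in the sketch below. This is an illustration of the operation, not the metbit implementation, and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical feature table; column names are illustrative only.
df = pd.DataFrame({"f1": [-5.0, 0.5, 12.0], "f2": [0.0, 3.0, 7.0]})

# Values below `lower` are raised to it, values above `upper` are lowered to it.
clipped = df.clip(lower=0.0, upper=10.0)
print(clipped)
```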

standard_deviation_normalisation(self)

This function returns the dataframe normalized using Standard Deviation.

Functions

project_name_generator()

boxplot_stats(df, x_col, y_col, group_order=None, custom_colors=None, stats_options=None, p_value_threshold=0.05, annotate_style='value', y_offset_factor=0.05, show_non_significant=True, correct_p='bonferroni', title_=None, y_label=None, x_label=None, fig_height=800, fig_width=600)

Enhanced box plot function with customizable statistical analysis and annotation.

Parameters:

df : pandas.DataFrame

The input DataFrame containing the data for the plot.

x_col : str

The name of the column representing the categorical variable (e.g., treatment groups).

y_col : str

The name of the column representing the numerical variable (e.g., scores).

group_order : list, optional

Custom order of groups for the x-axis. Defaults to the natural group order in the data.

custom_colors : dict, optional

A dictionary mapping group names to specific colors (e.g., {"A": "red", "B": "blue"}).

stats_options : list of str, optional

Statistical tests and calculations to perform. Options:

- "t-test": Perform pairwise Student's t-tests between groups.
- "nonparametric": Use Mann-Whitney U test for pairwise comparisons.
- "anova": Perform a one-way ANOVA (requires more than two groups).
- "effect-size": Calculate Cohen's d for pairwise comparisons (not supported for ANOVA).

Defaults to ["t-test"].
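To make the "effect-size" option concrete, here is a small standard-library sketch of Cohen's d using a pooled sample standard deviation. This is one common definition; the exact formula used inside boxplot_stats is not stated here, and the helper name `cohens_d` and the sample groups are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled sample variance (assumed definition)."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Illustrative data: both groups have sample variance 4, means differ by 1.
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 3.0, 5.0]
d = cohens_d(group_a, group_b)
print(d)
```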

p_value_threshold : float, optional

Threshold for considering p-values as significant. Default is 0.05.

annotate_style : str, optional

Style for annotations. Options:

- "value": Show exact p-values (e.g., "p=0.0123").
- "symbol": Use significance symbols (e.g., "***", "**", "*", or "ns" for not significant).

Default is "value".
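The two annotation styles can be sketched as follows. The cut-offs for "***" and "**" (0.001 and 0.01) are the conventional ones and are an assumption here, since this page does not list them; the helper names `format_p` and `p_to_symbol` are hypothetical.

```python
def format_p(p):
    """'value' style: the p-value with four decimal places."""
    return f"p={p:.4f}"


def p_to_symbol(p, threshold=0.05):
    """'symbol' style, using conventional star cut-offs (assumed, not from metbit)."""
    if p >= threshold:
        return "ns"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    return "*"


for p in (0.0004, 0.004, 0.03, 0.2):
    print(format_p(p), p_to_symbol(p))
```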

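figure_size : tuple, optional

Tuple specifying the width and height of the plot (in pixels). Default is (800, 600). Note that figure_size does not appear in the signature above, which exposes fig_height (default 800) and fig_width (default 600) instead.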

y_offset_factor : float, optional

Proportion of the y-axis range to use for spacing annotations. Default is 0.05.
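One plausible reading of this parameter is that each annotation is stacked above the data at multiples of `y_offset_factor` times the y-axis range. The sketch below shows that arithmetic only; the helper `annotation_heights` is hypothetical and the actual placement logic in boxplot_stats may differ.

```python
def annotation_heights(y_min, y_max, n_annotations, y_offset_factor=0.05):
    """Stack annotations above y_max, one offset step apart (assumed layout)."""
    step = (y_max - y_min) * y_offset_factor
    return [y_max + step * (i + 1) for i in range(n_annotations)]


# With a y-range of 0..10 and the default factor, each step is 0.5 units.
heights = annotation_heights(0.0, 10.0, 3)
print(heights)
```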

show_non_significant : bool, optional

Whether to display annotations for non-significant comparisons. Default is True.

correct_p : str, optional

Method for correcting p-values for multiple comparisons. Options include:

- "bonferroni"
- "holm"
- "fdr_bh" (Benjamini-Hochberg)
- None (no correction)

Default is "bonferroni".
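The three named corrections are standard procedures; the pure-Python sketch below shows how each adjusts a list of raw p-values. It is written for illustration and is not the code path used by boxplot_stats (which library, if any, metbit delegates to is not stated here). The function names and the sample p-values are hypothetical.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def holm(pvals):
    """Holm step-down: the k-th smallest p-value is scaled by (m - k), kept monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(running_max, 1.0)
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up: scale by m / rank, kept monotone from the largest down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03]  # illustrative raw p-values
print(bonferroni(raw))
print(holm(raw))
print(benjamini_hochberg(raw))
```

Bonferroni is the most conservative of the three, Holm is never less powerful than Bonferroni, and Benjamini-Hochberg controls the false discovery rate rather than the family-wise error rate.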

title_ : str, optional

Title of the plot. Defaults to the name of the y_col column.

y_label : str, optional

Label for the y-axis. Defaults to the name of the y_col column.

x_label : str, optional

Label for the x-axis. Defaults to the name of the x_col column.

fig_height : int, optional

Height of the figure in pixels. Default is 800.

fig_width : int, optional

Width of the figure in pixels. Default is 600.

Returns:

plotly.graph_objects.Figure

A Plotly Figure object containing the enhanced box plot with statistical annotations.

Examples:

Example 1. Basic box plot with t-tests and Bonferroni correction:

    import pandas as pd
    from metbit.utility import boxplot_stats

    # Two treatment groups with ten scores each
    data = {
        "treatment": ["A"] * 10 + ["B"] * 10,
        "score": [0.5, 0.6, 0.7, 0.8, 0.9, 0.7, 0.8, 0.9, 0.6, 0.5,
                  0.4, 0.5, 0.6, 0.7, 0.8, 0.6, 0.7, 0.8, 0.5, 0.4],
    }
    df = pd.DataFrame(data)

    fig = boxplot_stats(
        df,
        x_col="treatment",
        y_col="score",
        stats_options=["t-test"],
        correct_p="bonferroni",
        p_value_threshold=0.05,
    )
    fig.show()

Example 2. Advanced plot with custom colors, ANOVA, and effect sizes:

    import numpy as np
    import pandas as pd
    from metbit.utility import boxplot_stats

    # Three treatment groups with ten random scores each
    data = {
        "treatment": ["A"] * 10 + ["B"] * 10 + ["C"] * 10,
        "score": np.random.rand(30),
    }
    df = pd.DataFrame(data)
    custom_colors = {"A": "red", "B": "blue", "C": "green"}

    fig = boxplot_stats(
        df,
        x_col="treatment",
        y_col="score",
        stats_options=["anova", "effect-size"],
        custom_colors=custom_colors,
    )
    fig.show()

Source

metbit/utility.py at v6.6.7