metbit.utility
Statistics and utilities module in metbit 6.6.7.
import metbit.utility
Classes
lazypair
Methods
__init__(self, dataset, column_name)
get_index(self)
get_name(self)
get_meta(self)
get_column_name(self)
get_dataset(self)
gen_page
Methods
__init__(self, data_path)
This function takes in the path to the data folder and returns the HTML files for the OPLS-DA plots.
Parameters
data_path : str
    The path to the data folder.
Usage: gen_page(data_path).get_files()
get_files(self)
oplsda_path
Methods
__init__(self, data_path)
make_path(self)
get_path(self)
Normality_distribution
Methods
__init__(self, data: pd.DataFrame)
plot_distribution(self, feature)
pca_distributions(self)
Normalise
Methods
__init__(self, data: pd.DataFrame, compute_missing: bool=True)
This function takes in a dataframe and returns the normalised dataframe.
Parameters
data : pandas dataframe
    The dataframe to be used.
Usage: Normalise(data).normalise()
pqn_normalise(self, plot: bool=True)
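pqn_normalise has no description on this page. Assuming PQN stands for probabilistic quotient normalisation, the technique can be sketched in plain pandas as below. This is an illustration of the general method, not metbit's implementation: the helper name pqn_sketch, the sample data, and the choice of the per-feature median as reference are all assumptions, and the plot option is not reproduced.

```python
import pandas as pd


def pqn_sketch(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows are samples, columns are features."""
    # Reference profile: the median of each feature across all samples.
    reference = data.median(axis=0)
    # Quotient of every value against the reference value of its feature.
    quotients = data.div(reference, axis=1)
    # One dilution factor per sample: the median of that sample's quotients.
    factors = quotients.median(axis=1)
    # Divide each sample (row) by its own dilution factor.
    return data.div(factors, axis=0)


# Made-up data: s2 and s3 are s1 diluted by a constant factor.
data = pd.DataFrame(
    {"peak_1": [1.0, 2.0, 4.0], "peak_2": [2.0, 4.0, 8.0], "peak_3": [3.0, 6.0, 12.0]},
    index=["s1", "s2", "s3"],
)
normalised = pqn_sketch(data)
print(normalised)
```

Because the three made-up samples differ only by a constant dilution, all rows come out identical after normalisation.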
decimal_place_normalisation(self, decimals: int=2)
This function returns the dataframe with values rounded to a specified number of decimal places.
Parameters
decimals : int
    The number of decimal places to round to.
z_score_normalisation(self)
This function returns the dataframe normalized using Z-Score.
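As a rough illustration of what Z-score normalisation computes, the sketch below centres each column on its mean and divides by its standard deviation using plain pandas. It does not call metbit; the column names are made up, and using the sample standard deviation (ddof=1) is an assumption, since the page does not say which variant the method uses.

```python
import pandas as pd

# Made-up feature table; rows are samples, columns are features.
data = pd.DataFrame({"peak_1": [1.0, 2.0, 3.0], "peak_2": [10.0, 20.0, 60.0]})

# Column-wise Z-score: subtract the column mean, divide by the column
# standard deviation (sample standard deviation, ddof=1, assumed here).
z_scored = (data - data.mean()) / data.std(ddof=1)
print(z_scored)
```

After this transform every column has mean 0 and standard deviation 1.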
linear_normalisation(self)
This function returns the dataframe normalized using Min-Max (linear normalization).
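Min-Max normalisation rescales each column to the range 0 to 1. A minimal pandas sketch of the formula follows; it is independent of metbit, the data are made up, and scaling per column (rather than per row or over the whole table) is an assumption.

```python
import pandas as pd

# Made-up feature table; rows are samples, columns are features.
data = pd.DataFrame({"peak_1": [1.0, 2.0, 3.0], "peak_2": [10.0, 20.0, 60.0]})

# Column-wise Min-Max scaling: (x - min) / (max - min).
col_min = data.min()
col_max = data.max()
scaled = (data - col_min) / (col_max - col_min)
print(scaled)
```

A column with constant values would give a zero denominator, so real data may need a guard for that case.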
normalize_to_100(self)
This function returns the dataframe with values normalized to 100.
clipping_normalisation(self, lower: float, upper: float)
This function returns the dataframe with values clipped to the specified range.
Parameters
lower : float
    The lower bound for clipping.
upper : float
    The upper bound for clipping.
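Clipping to a range can be sketched with pandas' own DataFrame.clip, shown below on made-up data. Whether clipping_normalisation uses this call internally is an assumption; the sketch only illustrates the effect of the lower and upper bounds.

```python
import pandas as pd

# Made-up values, some of which fall outside the range [0, 10].
data = pd.DataFrame({"peak_1": [-5.0, 0.5, 12.0], "peak_2": [3.0, 10.0, 11.0]})

# Values below `lower` are raised to `lower`; values above `upper`
# are lowered to `upper`; everything in between is left unchanged.
clipped = data.clip(lower=0.0, upper=10.0)
print(clipped)
```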
standard_deviation_normalisation(self)
This function returns the dataframe normalized using Standard Deviation.
Functions
project_name_generator()
boxplot_stats(df, x_col, y_col, group_order=None, custom_colors=None, stats_options=None, p_value_threshold=0.05, annotate_style='value', y_offset_factor=0.05, show_non_significant=True, correct_p='bonferroni', title_=None, y_label=None, x_label=None, fig_height=800, fig_width=600)
Enhanced box plot function with customizable statistical analysis and annotation.
Parameters:
df : pandas.DataFrame
    The input DataFrame containing the data for the plot.
x_col : str
    The name of the column representing the categorical variable (e.g., treatment groups).
y_col : str
    The name of the column representing the numerical variable (e.g., scores).
group_order : list, optional
    Custom order of groups for the x-axis. Defaults to the natural group order in the data.
custom_colors : dict, optional
    A dictionary mapping group names to specific colors (e.g., {"A": "red", "B": "blue"}).
stats_options : list of str, optional
    Statistical tests and calculations to perform. Options:
    - "t-test": Perform pairwise Student's t-tests between groups.
    - "nonparametric": Use Mann-Whitney U test for pairwise comparisons.
    - "anova": Perform a one-way ANOVA (requires more than two groups).
    - "effect-size": Calculate Cohen's d for pairwise comparisons (not supported for ANOVA).
    Defaults to ["t-test"].
p_value_threshold : float, optional
    Threshold for considering p-values as significant. Default is 0.05.
annotate_style : str, optional
    Style for annotations. Options:
    - "value": Show exact p-values (e.g., "p=0.0123").
    - "symbol": Use significance symbols (e.g., "***", "**", "*", or "ns" for not significant).
    Default is "value".
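figure_size : tuple, optional
    Tuple specifying the width and height of the plot (in pixels). Default is (800, 600). This parameter does not appear in the signature above, where the figure size is set through fig_height and fig_width.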
y_offset_factor : float, optional
    Proportion of the y-axis range to use for spacing annotations. Default is 0.05.
show_non_significant : bool, optional
    Whether to display annotations for non-significant comparisons. Default is True.
correct_p : str, optional
    Method for correcting p-values for multiple comparisons. Options include:
    - "bonferroni"
    - "holm"
    - "fdr_bh" (Benjamini-Hochberg)
    - None (no correction)
    Default is "bonferroni".
title_ : str, optional
    Title of the plot. Defaults to the name of the y_col column.
y_label : str, optional
    Label for the y-axis. Defaults to the name of the y_col column.
x_label : str, optional
    Label for the x-axis. Defaults to the name of the x_col column.
fig_height : int, optional
    Height of the figure in pixels. Default is 800.
fig_width : int, optional
    Width of the figure in pixels. Default is 600.
Returns:
plotly.graph_objects.Figure
    A Plotly Figure object containing the enhanced box plot with statistical annotations.
Examples:
Example 1
Basic box plot with t-tests and Bonferroni correction:

    import pandas as pd
    from metbit.utility import boxplot_stats

    # Two treatment groups with ten scores each.
    data = {
        "treatment": ["A"] * 10 + ["B"] * 10,
        "score": [0.5, 0.6, 0.7, 0.8, 0.9, 0.7, 0.8, 0.9, 0.6, 0.5,
                  0.4, 0.5, 0.6, 0.7, 0.8, 0.6, 0.7, 0.8, 0.5, 0.4],
    }
    df = pd.DataFrame(data)
    fig = boxplot_stats(
        df,
        x_col="treatment",
        y_col="score",
        stats_options=["t-test"],
        correct_p="bonferroni",
        p_value_threshold=0.05,
    )
    fig.show()

Example 2
Advanced plot with custom colors, ANOVA, and effect sizes:

    import numpy as np
    import pandas as pd
    from metbit.utility import boxplot_stats

    # Three treatment groups with ten random scores each.
    data = {
        "treatment": ["A"] * 10 + ["B"] * 10 + ["C"] * 10,
        "score": np.random.rand(30),
    }
    df = pd.DataFrame(data)
    custom_colors = {"A": "red", "B": "blue", "C": "green"}
    fig = boxplot_stats(
        df,
        x_col="treatment",
        y_col="score",
        stats_options=["anova", "effect-size"],
        custom_colors=custom_colors,
    )
    fig.show()
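To make the "effect-size" option and the correct_p methods more concrete, the sketch below computes Cohen's d with a pooled standard deviation and applies Bonferroni and Holm adjustments to a list of raw p-values, using only the standard library. The function names, the raw p-values, and the pooled-variance form of Cohen's d are assumptions for illustration; boxplot_stats may compute these quantities differently.

```python
import math
import statistics


def cohens_d(a, b):
    """Hypothetical helper: Cohen's d using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min((m - rank) * p_values[idx], 1.0)
        # Adjusted p-values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Scores of groups A and B from Example 1.
group_a = [0.5, 0.6, 0.7, 0.8, 0.9, 0.7, 0.8, 0.9, 0.6, 0.5]
group_b = [0.4, 0.5, 0.6, 0.7, 0.8, 0.6, 0.7, 0.8, 0.5, 0.4]
d = cohens_d(group_a, group_b)

# Made-up raw p-values from three pairwise comparisons.
raw_p = [0.01, 0.04, 0.03]
p_bonferroni = bonferroni(raw_p)
p_holm = holm(raw_p)
print(d, p_bonferroni, p_holm)
```

Holm is never more conservative than Bonferroni: each Holm-adjusted p-value is less than or equal to its Bonferroni counterpart.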