Classes
lda
Linear Discriminant Analysis (LDA) for supervised dimensionality reduction.
Parameters:
    X: Feature matrix (DataFrame or ndarray), rows=samples, cols=features.
    y: Class labels (Series, ndarray, or list).
    features_name: Optional feature names. Inferred from X columns when X is a DataFrame and features_name is None.
    n_components: Number of LD components to retain. Defaults to n_classes - 1.
    scaling_method: One of "pareto", "mean", "uv", "minmax", or None.
    random_state: Unused directly but kept for API consistency.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import lda
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(['A'] * 20 + ['B'] * 20 + ['C'] * 20)
    >>> model = lda(X=X, y=y, n_components=2)
    >>> model.fit()
    >>> scores = model.get_scores()
    >>> fig = model.plot_lda_scores()
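The scaling_method options are named but not defined here. The sketch below shows the definitions these names conventionally carry in chemometrics; the helper name scale_columns, the use of the sample standard deviation (ddof=1), and the formulas themselves are assumptions and may differ from what metbit does internally.

```python
import numpy as np


def scale_columns(X, method="pareto"):
    """Column-wise scaling under the conventional definitions (assumed, not metbit's code)."""
    X = np.asarray(X, dtype=float)
    if method is None:
        return X.copy()                       # no scaling
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)               # sample standard deviation (assumption)
    if method == "mean":
        return X - mean                       # mean-centring only
    if method == "uv":
        return (X - mean) / std               # unit variance (autoscaling)
    if method == "pareto":
        return (X - mean) / np.sqrt(std)      # centre, divide by sqrt of the std
    if method == "minmax":
        lo, hi = X.min(axis=0), X.max(axis=0)
        return (X - lo) / (hi - lo)           # rescale each column to [0, 1]
    raise ValueError(f"unknown scaling method: {method!r}")


rng = np.random.default_rng(0)
X = rng.random((60, 5))
X_uv = scale_columns(X, "uv")
X_pareto = scale_columns(X, "pareto")
X_minmax = scale_columns(X, "minmax")
```

Under these definitions, Pareto scaling sits between mean-centring and unit-variance scaling: each column keeps a standard deviation equal to the square root of its original one, so high-variance features are damped without being fully equalised.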
Methods
__init__(self, X: Union[pd.DataFrame, np.ndarray], y: Union[pd.Series, np.ndarray, List[Any]], features_name: Optional[Union[pd.Series, np.ndarray, List[Any]]]=None, n_components: Optional[int]=None, scaling_method: str='pareto', random_state: int=42)
fit(self)
Fit the LDA model to the scaled data.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import lda
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(['A'] * 30 + ['B'] * 30)
    >>> model = lda(X=X, y=y)
    >>> model.fit()
get_scores(self)
Return the LD scores DataFrame.
Returns: DataFrame of shape (n_samples, n_components + 1) with LD columns and a 'Group' column.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import lda
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(['A'] * 30 + ['B'] * 30)
    >>> model = lda(X=X, y=y)
    >>> model.fit()
    >>> df = model.get_scores()
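Because n_components defaults to n_classes - 1, the width of the scores table depends on how many distinct labels y contains. The sketch below works out the expected column list; the helper name expected_score_columns is hypothetical, the LD1, LD2, ... naming is inferred from the default ld argument of plot_lda_scores, and the column order is an assumption.

```python
def expected_score_columns(y, n_components=None):
    """Column names the scores table would be expected to carry (illustrative only)."""
    n_classes = len(set(y))
    if n_components is None:
        n_components = n_classes - 1          # default stated in the class docs
    return [f"LD{i}" for i in range(1, n_components + 1)] + ["Group"]


two_class = expected_score_columns(["A"] * 30 + ["B"] * 30)
three_class = expected_score_columns(["A"] * 20 + ["B"] * 20 + ["C"] * 20)
print(two_class)
print(three_class)
```

A two-class problem therefore yields a single discriminant, so a two-axis scores plot needs at least three classes.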
get_loadings(self)
Return the LD loadings (scalings) DataFrame.
Returns: DataFrame of shape (n_features, n_components) indexed by feature names.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import lda
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(['A'] * 30 + ['B'] * 30)
    >>> model = lda(X=X, y=y)
    >>> model.fit()
    >>> df = model.get_loadings()
get_explained_variance(self)
Return the explained variance ratio per LD component.
Returns: DataFrame with columns 'LD', 'Explained variance', 'Cumulative variance'.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import lda
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(['A'] * 30 + ['B'] * 30)
    >>> model = lda(X=X, y=y)
    >>> model.fit()
    >>> df = model.get_explained_variance()
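The three columns of the explained-variance table are related by a running sum. A minimal sketch, using made-up ratios and a plain dict in place of the DataFrame; whether metbit reports fractions or percentages is not stated here, so fractions are assumed.

```python
import numpy as np

# Hypothetical explained-variance ratios for two discriminants (not real output).
ratios = np.array([0.7, 0.3])
cumulative = np.cumsum(ratios)                # running total across components

table = {
    "LD": [f"LD{i}" for i in range(1, len(ratios) + 1)],
    "Explained variance": ratios.tolist(),
    "Cumulative variance": cumulative.tolist(),
}

for row in zip(*table.values()):
    print(row)
```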
plot_lda_scores(self, ld: List[str]=['LD1', 'LD2'], color_: Optional[pd.Series]=None, color_dict: Optional[dict]=None, marker_size: int=35, fig_height: int=900, fig_width: int=1300, font_size: int=20)
Plot LDA scores scatter.
Parameters:
    ld: Two LD component names to plot on x and y axes.
    color_: Optional alternative grouping series for colouring points.
    color_dict: Optional mapping of group label to colour hex string.
    marker_size: Marker diameter in pixels.
    fig_height: Figure height in pixels.
    fig_width: Figure width in pixels.
    font_size: Global font size.
Returns: Plotly Figure object.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import lda
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(['A'] * 20 + ['B'] * 20 + ['C'] * 20)
    >>> model = lda(X=X, y=y, n_components=2)
    >>> model.fit()
    >>> fig = model.plot_lda_scores(ld=['LD1', 'LD2'])
    >>> fig.show()
plot_loading_(self, ld: List[str]=['LD1', 'LD2'], fig_height: int=600, fig_width: int=1800, font_size: int=20)
Plot LDA loadings as a scatter over features.
Parameters:
    ld: LD component names to overlay.
    fig_height: Figure height in pixels.
    fig_width: Figure width in pixels.
    font_size: Global font size.
Returns: Plotly Figure object.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import lda
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(['A'] * 20 + ['B'] * 20 + ['C'] * 20)
    >>> model = lda(X=X, y=y, n_components=2)
    >>> model.fit()
    >>> fig = model.plot_loading_(ld=['LD1', 'LD2'])
    >>> fig.show()
plsr
PLS Regression for continuous response prediction.
Parameters:
    X: Feature matrix (DataFrame or ndarray), rows=samples, cols=features.
    y: Continuous response variable (numeric Series, ndarray, or list).
    features_name: Optional feature names.
    n_components: Number of latent components. Default is 2.
    scaling_method: One of "pareto", "mean", "uv", "minmax", or None.
    random_state: Unused directly but kept for API consistency.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import plsr
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(np.random.rand(60))
    >>> model = plsr(X=X, y=y, n_components=2)
    >>> model.fit()
    >>> metrics = model.get_metrics()
    >>> fig = model.plot_predicted_vs_actual()
Methods
__init__(self, X: Union[pd.DataFrame, np.ndarray], y: Union[pd.Series, np.ndarray, List[Any]], features_name: Optional[Union[pd.Series, np.ndarray, List[Any]]]=None, n_components: int=2, scaling_method: str='pareto', random_state: int=42)
fit(self)
Fit the PLS Regression model and compute cross-validated Q2.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import plsr
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(np.random.rand(60))
    >>> model = plsr(X=X, y=y, n_components=2)
    >>> model.fit()
predict(self, X_new: Union[pd.DataFrame, np.ndarray])
Predict response for new samples.
Parameters:
    X_new: New feature matrix with the same number of features as training X.
Returns: 1-D ndarray of predicted values.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import plsr
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(np.random.rand(60))
    >>> model = plsr(X=X, y=y, n_components=2)
    >>> model.fit()
    >>> y_hat = model.predict(X)
get_scores(self)
Return the T (X) scores DataFrame.
Returns: DataFrame of shape (n_samples, n_components).
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import plsr
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(np.random.rand(60))
    >>> model = plsr(X=X, y=y, n_components=2)
    >>> model.fit()
    >>> df = model.get_scores()
get_loadings(self)
Return the P (X) loadings DataFrame.
Returns: DataFrame of shape (n_features, n_components).
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import plsr
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(np.random.rand(60))
    >>> model = plsr(X=X, y=y, n_components=2)
    >>> model.fit()
    >>> df = model.get_loadings()
get_weights(self)
Return the W (X) weights DataFrame.
Returns: DataFrame of shape (n_features, n_components).
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import plsr
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(np.random.rand(60))
    >>> model = plsr(X=X, y=y, n_components=2)
    >>> model.fit()
    >>> df = model.get_weights()
get_metrics(self)
Return model performance metrics.
Returns: dict with keys 'R2' (training), 'Q2' (LOO cross-validated), 'RMSE'.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import plsr
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(np.random.rand(60))
    >>> model = plsr(X=X, y=y, n_components=2)
    >>> model.fit()
    >>> m = model.get_metrics()
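To make the difference between the 'R2', 'Q2', and 'RMSE' entries concrete, the sketch below computes them by hand with the usual formulas. It is an illustration under assumptions: ordinary least squares stands in for the PLS model (the helper fit_predict is hypothetical), RMSE is taken on the training fit, and Q2 uses the full-data mean in its denominator. metbit's exact choices are not documented here.

```python
import numpy as np


def fit_predict(X_train, y_train, X_test):
    """Stand-in regressor: ordinary least squares with an intercept (NOT PLS)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


rng = np.random.default_rng(0)
X = rng.random((30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.standard_normal(30)

ss_tot = np.sum((y - y.mean()) ** 2)          # total sum of squares

# R2 and RMSE: the model is fitted and evaluated on the same samples.
y_fit = fit_predict(X, y, X)
r2 = 1.0 - np.sum((y - y_fit) ** 2) / ss_tot
rmse = np.sqrt(np.mean((y - y_fit) ** 2))

# Q2: each sample is predicted by a model fitted without it (leave-one-out).
y_loo = np.empty_like(y)
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    y_loo[i] = fit_predict(X[keep], y[keep], X[i:i + 1])[0]
q2 = 1.0 - np.sum((y - y_loo) ** 2) / ss_tot

metrics = {"R2": r2, "Q2": q2, "RMSE": rmse}
print(metrics)
```

Q2 can never exceed R2 for this stand-in, since held-out residuals are at least as large as training residuals; a large gap between the two is the usual warning sign of overfitting.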
plot_predicted_vs_actual(self, fig_height: int=600, fig_width: int=700, font_size: int=14)
Scatter plot of actual vs predicted response with R2 annotation.
Parameters:
    fig_height: Figure height in pixels.
    fig_width: Figure width in pixels.
    font_size: Global font size.
Returns: Plotly Figure object.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import plsr
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(np.random.rand(60))
    >>> model = plsr(X=X, y=y, n_components=2)
    >>> model.fit()
    >>> fig = model.plot_predicted_vs_actual()
    >>> fig.show()
plot_scores(self, fig_height: int=700, fig_width: int=900, font_size: int=14)
Scatter plot of T1 vs T2 latent variable scores.
Parameters:
    fig_height: Figure height in pixels.
    fig_width: Figure width in pixels.
    font_size: Global font size.
Returns: Plotly Figure object.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import plsr
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> y = pd.Series(np.random.rand(60))
    >>> model = plsr(X=X, y=y, n_components=2)
    >>> model.fit()
    >>> fig = model.plot_scores()
    >>> fig.show()
ica
Independent Component Analysis (ICA) for blind source separation.
Parameters:
    X: Feature matrix (DataFrame or ndarray), rows=samples, cols=features.
    n_components: Number of independent components. Default is 2.
    max_iter: Maximum iterations for FastICA. Default is 1000.
    random_state: Random seed for reproducibility. Default is 42.
    scaling_method: One of "pareto", "mean", "uv", "minmax", or None.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import ica
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> model = ica(X=X, n_components=2)
    >>> model.fit()
    >>> components = model.get_components()
    >>> fig = model.plot_components()
Methods
__init__(self, X: Union[pd.DataFrame, np.ndarray], n_components: int=2, max_iter: int=1000, random_state: int=42, scaling_method: str='pareto')
fit(self)
Fit the FastICA model to the scaled data.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import ica
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> model = ica(X=X, n_components=2)
    >>> model.fit()
get_components(self)
Return the IC component matrix (rows=samples).
Returns: DataFrame of shape (n_samples, n_components).
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import ica
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> model = ica(X=X, n_components=2)
    >>> model.fit()
    >>> df = model.get_components()
get_mixing(self)
Return the mixing matrix (rows=features).
Returns: DataFrame of shape (n_features, n_components).
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import ica
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> model = ica(X=X, n_components=2)
    >>> model.fit()
    >>> df = model.get_mixing()
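The shapes returned by get_components and get_mixing fit the standard linear ICA model, in which the data matrix is the component matrix times the transposed mixing matrix. The sketch below illustrates only that relationship with synthetic matrices; it does not run FastICA, ignores any mean offset or scaling, and the variable names S and A are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_components = 60, 50, 2

# Synthetic non-Gaussian sources and a synthetic mixing matrix.
S = rng.laplace(size=(n_samples, n_components))        # like get_components(): (n_samples, n_components)
A = rng.standard_normal((n_features, n_components))    # like get_mixing(): (n_features, n_components)

# Linear mixing model: every feature is a weighted sum of the sources.
X = S @ A.T                                            # (n_samples, n_features)

# With the mixing matrix known, the sources follow from its pseudo-inverse.
S_recovered = X @ np.linalg.pinv(A).T

print(X.shape, S_recovered.shape)
```

In practice the mixing matrix is unknown and FastICA estimates it from X alone, so recovered components are only defined up to sign, scale, and ordering.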
plot_components(self, ic: List[str]=['IC1', 'IC2'], color_: Optional[Union[pd.Series, List[Any]]]=None, color_dict: Optional[dict]=None, fig_height: int=900, fig_width: int=1300, font_size: int=20)
Scatter plot of two IC score components.
Parameters:
    ic: Two IC component names for x and y axes.
    color_: Optional group labels for colouring points.
    color_dict: Optional mapping of group label to colour hex string.
    fig_height: Figure height in pixels.
    fig_width: Figure width in pixels.
    font_size: Global font size.
Returns: Plotly Figure object.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import ica
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> label = pd.Series(['A'] * 30 + ['B'] * 30)
    >>> model = ica(X=X, n_components=2)
    >>> model.fit()
    >>> fig = model.plot_components(ic=['IC1', 'IC2'], color_=label)
    >>> fig.show()
plot_mixing_(self, ic: List[str]=['IC1', 'IC2'], fig_height: int=600, fig_width: int=1800, font_size: int=20)
Plot the mixing matrix columns over features.
Parameters:
    ic: IC column names to overlay.
    fig_height: Figure height in pixels.
    fig_width: Figure width in pixels.
    font_size: Global font size.
Returns: Plotly Figure object.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import ica
    >>> X = pd.DataFrame(np.random.rand(60, 50))
    >>> model = ica(X=X, n_components=2)
    >>> model.fit()
    >>> fig = model.plot_mixing_(ic=['IC1', 'IC2'])
    >>> fig.show()
hca
Hierarchical Cluster Analysis (HCA) with Plotly visualisation.
Parameters:
    X: Feature matrix (DataFrame or ndarray), rows=samples, cols=features.
    label: Optional sample labels used for dendrogram leaf annotations.
    features_name: Optional feature names.
    method: Linkage method passed to scipy.cluster.hierarchy.linkage. Common values: "ward", "complete", "average", "single".
    metric: Distance metric passed to scipy.cluster.hierarchy.linkage.
    scaling_method: One of "pareto", "mean", "uv", "minmax", or None.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import hca
    >>> X = pd.DataFrame(np.random.rand(30, 50))
    >>> label = pd.Series(['A'] * 15 + ['B'] * 15)
    >>> model = hca(X=X, label=label)
    >>> model.fit()
    >>> fig_dend = model.plot_dendrogram()
    >>> fig_heat = model.plot_heatmap(n_clusters=2)
Methods
__init__(self, X: Union[pd.DataFrame, np.ndarray], label: Optional[Union[pd.Series, np.ndarray, List[Any]]]=None, features_name: Optional[Union[pd.Series, np.ndarray, List[Any]]]=None, method: str='ward', metric: str='euclidean', scaling_method: str='pareto')
fit(self)
Compute the linkage matrix via hierarchical clustering.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import hca
    >>> X = pd.DataFrame(np.random.rand(30, 50))
    >>> model = hca(X=X)
    >>> model.fit()
get_cluster_labels(self, n_clusters: int=3)
Return flat cluster assignments via scipy fcluster.
Parameters:
    n_clusters: Number of flat clusters to form.
Returns: pd.Series of integer cluster IDs aligned to the original sample order.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import hca
    >>> X = pd.DataFrame(np.random.rand(30, 50))
    >>> model = hca(X=X)
    >>> model.fit()
    >>> clusters = model.get_cluster_labels(n_clusters=3)
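For readers who want to see the underlying SciPy calls, the sketch below builds a linkage matrix and cuts it into a fixed number of flat clusters. The criterion="maxclust" argument is an assumption about how n_clusters is forwarded to fcluster; the synthetic data are three well-separated groups so that the result is easy to check.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Three tight, well-separated groups of 10 samples each (synthetic data).
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(10, 5)) for c in (0.0, 3.0, 6.0)])

# Linkage matrix with the documented defaults of the class (ward, euclidean).
Z = linkage(X, method="ward", metric="euclidean")

# Cut the tree so that at most 3 flat clusters remain (criterion is assumed).
labels = fcluster(Z, t=3, criterion="maxclust")

print(labels)
```

fcluster returns one integer per sample in the original row order, with IDs starting at 1, which matches the "aligned to the original sample order" wording above.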
plot_dendrogram(self, fig_height: int=700, fig_width: int=1200, font_size: int=12, color_threshold: Optional[float]=None)
Draw the hierarchical dendrogram as a Plotly figure.
The dendrogram is constructed by calling scipy.cluster.hierarchy.dendrogram and manually translating the coordinate output into Plotly line traces.
Parameters:
    fig_height: Figure height in pixels.
    fig_width: Figure width in pixels.
    font_size: Tick label font size.
    color_threshold: Height threshold used to colour the dendrogram branches. Defaults to 70% of the maximum linkage height.
Returns: Plotly Figure object.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import hca
    >>> X = pd.DataFrame(np.random.rand(30, 50))
    >>> label = [f'S{i}' for i in range(30)]
    >>> model = hca(X=X, label=label)
    >>> model.fit()
    >>> fig = model.plot_dendrogram()
    >>> fig.show()
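The coordinate translation described for plot_dendrogram can be sketched without any plotting library. The code below asks SciPy for the dendrogram geometry only (no_plot=True) and turns each link into a small dict of x/y points that a line trace could consume. The segment dict layout is hypothetical, and the threshold is computed explicitly as 70% of the largest linkage height to mirror the documented default.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
X = rng.random((30, 50))
sample_labels = [f"S{i}" for i in range(30)]

Z = linkage(X, method="ward")

# Column 2 of the linkage matrix holds the merge heights.
threshold = 0.7 * Z[:, 2].max()

# no_plot=True returns the geometry without drawing anything.
geometry = dendrogram(Z, no_plot=True, color_threshold=threshold, labels=sample_labels)

# Each link is a bracket described by four x and four y coordinates.
segments = [
    {"x": xs, "y": ys, "color": color}
    for xs, ys, color in zip(geometry["icoord"], geometry["dcoord"], geometry["color_list"])
]

leaf_order = geometry["ivl"]                  # leaf labels, left to right
print(len(segments), leaf_order[:5])
```

A tree over n samples always has n - 1 links, so 30 samples yield 29 segments; the entries of geometry["ivl"] supply the tick labels for the leaf axis.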
plot_heatmap(self, n_clusters: int=3, fig_height: int=900, fig_width: int=900, colorscale: str='RdBu_r', font_size: int=12)
Clustered heatmap with rows ordered by hierarchical clustering.
Parameters:
    n_clusters: Number of clusters for colour-bar annotation.
    fig_height: Figure height in pixels.
    fig_width: Figure width in pixels.
    colorscale: Plotly colorscale name for the heatmap.
    font_size: Global font size.
Returns: Plotly Figure object.
Examples:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from metbit.analysis.multivariate import hca
    >>> X = pd.DataFrame(np.random.rand(30, 50))
    >>> label = [f'S{i}' for i in range(30)]
    >>> model = hca(X=X, label=label)
    >>> model.fit()
    >>> fig = model.plot_heatmap(n_clusters=3)
    >>> fig.show()
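Ordering heatmap rows by hierarchical clustering usually means permuting the samples into dendrogram leaf order. A minimal sketch of that step, assuming SciPy's leaves_list is an acceptable way to obtain the order; how metbit derives it internally is not shown here.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

rng = np.random.default_rng(0)
X = rng.random((30, 50))

Z = linkage(X, method="ward")

# Leaf order of the dendrogram, as indices into the original rows.
order = leaves_list(Z)

# Rows permuted so that samples merged early sit next to each other.
X_ordered = X[order]

print(order[:10])
```

The same index array can be applied to sample labels and cluster IDs so that any annotation bar stays aligned with the reordered rows.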