Classes
lazy_opls_da
Parameters:
• data (pd.DataFrame): DataFrame containing the dataset.
• groups (list): List of class labels for each data sample.
• working_dir (str): Directory path for storing output files.
• feature_names (list, optional): Names of features, defaults to None.
• n_components (int, optional): Number of components for OPLS-DA, defaults to 2.
• scaling (str, optional): Scaling method ('pareto'), defaults to 'pareto'.
• estimator (str, optional): Model estimator, defaults to 'opls'.
• kfold (int, optional): Number of folds in cross-validation, defaults to 3.
• random_state (int, optional): Random seed, defaults to 94.
• auto_ncomp (bool, optional): Automatically choose the optimal number of components, defaults to True.
• permutation (bool, optional): Conduct permutation tests, defaults to True.
• VIP (bool, optional): Calculate VIP scores, defaults to True.
• linear_regression (bool, optional): Conduct linear regression analysis, defaults to True.
Returns:
• A printout of the model summary, including the project name, dataset information, configuration, and directory paths.
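The default scaling='pareto' refers to Pareto scaling, which is commonly defined as mean-centering each feature and dividing it by the square root of its standard deviation. The following is a minimal sketch of that definition in plain Python, not metbit's own implementation; the function name pareto_scale and the use of the sample standard deviation (n - 1) are assumptions made for illustration.

```python
import math
import statistics


def pareto_scale(column):
    """Mean-center a list of values and divide by sqrt(standard deviation).

    Hypothetical helper for illustration; the library may differ in details
    such as the degrees of freedom used for the standard deviation.
    """
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)  # sample standard deviation (n - 1), an assumption
    if sd == 0:
        # A constant feature carries no variance; return zeros instead of dividing by 0.
        return [0.0 for _ in column]
    return [(value - mean) / math.sqrt(sd) for value in column]


# One feature measured in four samples (made-up numbers).
intensities = [2.0, 4.0, 6.0, 8.0]
scaled = pareto_scale(intensities)
```

Compared with unit-variance scaling, dividing by the square root of the standard deviation shrinks large features less aggressively, so high-intensity signals keep some of their weight.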
fit Method
Fits the OPLS-DA model to the dataset, generates plots, and saves them to the output directory.
Parameters:
• marker_color (dict, optional): Dictionary mapping groups to colors.
• custom_color (list, optional): Custom color grouping.
• custom_shape (list, optional): Custom shape grouping.
• symbol_dict (dict, optional): Dictionary mapping groups to marker symbols.
• custom_legend_name (list, optional): Custom names for the legend, defaults to ['Group', 'Sub-group'].
• marker_label (str or None, optional): Specifies marker labels ('class', 'group', 'sub-group', or 'index').
• marker_size (int or None, optional): Size of markers in plots.
• marker_opacity (float or None, optional): Opacity level of markers in plots.
• individual_ellipse (bool, optional): Option to display individual ellipses for each group.
Returns:
• None. A message is printed indicating that the model fitting was successful.
Directory and Project Setup
===========================
Creates the necessary folders in the working directory based on project needs (e.g., for VIP score plots, permutation scores, etc.). Paths are stored in a dictionary (self.path).
Directories Created:
• working_dir/project_name/element/plots/... for different plots.
• working_dir/project_name/element/data/... for data outputs.
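The layout above can be sketched with pathlib. This is an illustrative sketch only, not the library's code: the helper make_output_dirs, the dictionary keys, and the subfolder name vip_scores are hypothetical, and the project and element names are placeholders.

```python
import tempfile
from pathlib import Path


def make_output_dirs(working_dir, project_name, element, plot_kinds, data_kinds):
    """Create plots/ and data/ subfolders and return their paths in a dict.

    Hypothetical helper; the real subfolder names are not given in this page.
    """
    base = Path(working_dir) / project_name / element
    paths = {}
    for kind in plot_kinds:
        paths[f"plots/{kind}"] = base / "plots" / kind
    for kind in data_kinds:
        paths[f"data/{kind}"] = base / "data" / kind
    for folder in paths.values():
        # parents=True builds the whole chain; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
    return paths


# Use a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    paths = make_output_dirs(
        tmp, "demo_project", "demo_element",
        plot_kinds=["vip_scores"], data_kinds=["vip_scores"],
    )
    all_exist = all(folder.is_dir() for folder in paths.values())
    created = sorted(folder.relative_to(tmp).as_posix() for folder in paths.values())
```

Keeping every path in one dictionary means later plotting and export steps can look up their destination by key instead of rebuilding path strings.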
Plotting and Saving Data
1. Score Plot: Generates OPLS-DA score plots for each group.
2. Loading Plot: Generates and saves loading plots.
3. S Plot: Generates and saves S-score plots.
4. VIP Score Plot: Generates VIP score plots and saves VIP scores as CSV if VIP=True.
5. Permutation Test Plot: Conducts permutation tests and saves permutation scores as CSV if permutation=True.
6. Volcano Plot (Linear Regression): Generates volcano plot and saves data if linear_regression=True.
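To illustrate the idea behind the permutation test in step 5, the sketch below shuffles the class labels many times and checks how often a shuffled labelling separates the groups at least as well as the real one. It is a simplified stand-in, not metbit's procedure: the statistic here is the absolute difference in group means of a single made-up feature rather than an OPLS-DA model score, and the function names are hypothetical. The defaults of 500 permutations and seed 94 mirror the constructor defaults shown on this page.

```python
import random
import statistics


def mean_difference(values, labels, group_a, group_b):
    """Absolute difference between the two group means."""
    a = [v for v, g in zip(values, labels) if g == group_a]
    b = [v for v, g in zip(values, labels) if g == group_b]
    return abs(statistics.fmean(a) - statistics.fmean(b))


def permutation_p_value(values, labels, n_permutations=500, seed=94):
    """Empirical p-value from label shuffling (hypothetical helper)."""
    rng = random.Random(seed)
    group_a, group_b = sorted(set(labels))
    observed = mean_difference(values, labels, group_a, group_b)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the link between values and labels
        if mean_difference(values, shuffled, group_a, group_b) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Made-up data: two clearly separated groups of four samples each.
values = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
labels = ["Control"] * 4 + ["Disease"] * 4
p_value = permutation_p_value(values, labels)
```

A small p-value means that few random labellings reproduce the observed separation, which is the evidence a permutation plot is meant to convey.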
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> import metbit
>>> X = pd.DataFrame(np.random.randn(40, 100), columns=[str(i) for i in range(100)])
>>> y = ['Control'] * 20 + ['Disease'] * 20
>>> model = metbit.lazy_opls_da(data=X, groups=y, working_dir='/tmp/opls_output')
>>> model.fit()
Methods
__init__(self, data: pd.DataFrame, groups: list, working_dir: str, feature_: list=None, n_components: int=2, scaling: str='pareto', estimator: str='opls', kfold: int=3, random_state: int=94, auto_ncomp: bool=True, permutation: bool=True, n_permutation: int=500, n_jobs: int=4, VIP: bool=True, VIP_threshold: float=1.5, linear_regression: bool=True, FC_threshold: float=1.5, p_val_threshold: float=2)
Initializes the OPLS-DA workflow from a dataframe and a list of class labels (the y values), and prints the model summary.
Parameters
• data (pd.DataFrame): The dataframe to be used.
• groups (list): The list of y values (class labels), one per sample.
• n_components (int): The number of components to use.
Usage: lazy_opls_da(data, groups, working_dir, n_components=n_components).fit()
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> import metbit
>>> X = pd.DataFrame(np.random.randn(40, 100), columns=[str(i) for i in range(100)])
>>> y = ['Control'] * 20 + ['Disease'] * 20
>>> model = metbit.lazy_opls_da(data=X, groups=y, working_dir='/tmp/opls_output')
fit(self, marker_color: dict=None, custom_color: list=None, custom_shape: list=None, symbol_dict: dict=None, custom_legend_name=['Group', 'Sub-group'], marker_label=None, marker_size=None, marker_opacity=None, individual_ellipse=False)
Fit the OPLS-DA model to all pairwise group comparisons and save plots and data.
Parameters:
• marker_color (dict, optional): Dictionary mapping group labels to hex color strings.
• custom_color (list, optional): List assigning a color group to each sample.
• custom_shape (list, optional): List assigning a shape group to each sample.
• symbol_dict (dict, optional): Dictionary mapping group labels to plotly marker symbols.
• custom_legend_name (list, optional): Legend header names, defaults to ['Group', 'Sub-group'].
• marker_label (str or None, optional): Marker label source: 'class', 'group', 'sub-group', or 'index'.
• marker_size (int or None, optional): Marker size in pixels.
• marker_opacity (float or None, optional): Marker opacity between 0 and 1.
• individual_ellipse (bool, optional): Draw a confidence ellipse per group, defaults to False.
Returns: None
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> import metbit
>>> X = pd.DataFrame(np.random.randn(40, 100), columns=[str(i) for i in range(100)])
>>> y = ['Control'] * 20 + ['Disease'] * 20
>>> model = metbit.lazy_opls_da(data=X, groups=y, working_dir='/tmp/opls_output')
>>> model.fit(marker_size=20, marker_opacity=0.8)
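Since marker_color and symbol_dict are both dictionaries keyed by group label, one way to build them is to pair the unique labels with a palette and a list of symbols. The sketch below only constructs the dictionaries; the hex colors are arbitrary choices, the symbol names are standard plotly marker symbols, and the final call is left as a comment because it assumes a model constructed as in the examples on this page.

```python
# Group labels, one per sample, as in the examples on this page.
groups = ["Control"] * 20 + ["Disease"] * 20

# Arbitrary palette and plotly symbol names chosen for illustration.
palette = ["#1f77b4", "#d62728", "#2ca02c"]
symbols = ["circle", "square", "diamond"]

# Sort the unique labels so the mapping is stable between runs.
unique_groups = sorted(set(groups))

# zip stops at the shorter input, so extra palette entries are simply unused.
marker_color = dict(zip(unique_groups, palette))
symbol_dict = dict(zip(unique_groups, symbols))

# Assumed usage, with `model` built as in the examples:
# model.fit(marker_color=marker_color, symbol_dict=symbol_dict)
```

Deriving both dictionaries from the same sorted label list keeps colors and symbols aligned when groups are added or renamed.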