Classes
SpectralAutoencoder
Symmetric encoder-decoder for unsupervised NMR spectral embedding.
Trains a fully-connected autoencoder on spectral data and exposes methods to retrieve latent embeddings, reconstructions, and diagnostic plots.
Args:
X: Input spectra as a DataFrame (samples x variables) or ndarray of shape (n_samples, n_features).
latent_dim: Dimensionality of the bottleneck layer.
hidden_dims: List of hidden layer sizes for the encoder; the decoder mirrors these in reverse.
epochs: Number of training epochs.
lr: Learning rate for Adam optimiser.
batch_size: Mini-batch size.
random_state: Seed for reproducibility.
device: Compute device - 'auto', 'cpu', 'cuda', or 'mps'. 'auto' selects CUDA > MPS > CPU automatically.
Attributes:
training_loss_: List of per-epoch MSE loss values populated after calling fit().
Examples:
>>> import numpy as np
>>> from metbit.dl import SpectralAutoencoder
>>> X = np.random.rand(80, 1000).astype("float32")
>>> ae = SpectralAutoencoder(X, latent_dim=4, epochs=5)
>>> ae.fit(verbose=False)
SpectralAutoencoder(latent_dim=4, ...)
>>> emb = ae.encode()
>>> emb.shape
(80, 4)
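The 'auto' device rule described in Args (CUDA > MPS > CPU) amounts to a simple priority check. The following is a minimal sketch, not metbit's actual code: `resolve_device` and its two boolean flags are hypothetical names. In a real PyTorch setup the flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def resolve_device(device: str = "auto", cuda_ok: bool = False, mps_ok: bool = False) -> str:
    """Hypothetical helper: map a device request to a concrete device name."""
    allowed = ("auto", "cpu", "cuda", "mps")
    if device not in allowed:
        raise ValueError(f"device must be one of {allowed}, got {device!r}")
    if device != "auto":
        return device  # explicit requests are passed through unchanged
    if cuda_ok:
        return "cuda"  # highest priority
    if mps_ok:
        return "mps"   # Apple-silicon backend, used only when CUDA is absent
    return "cpu"       # always-available fallback


print(resolve_device("auto", cuda_ok=False, mps_ok=True))  # mps
```

Passing an explicit device skips the priority check entirely, which is useful when a GPU is present but the run should stay on the CPU.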
Methods
__init__(self, X, latent_dim: int=8, hidden_dims: list=None, epochs: int=100, lr: float=0.001, batch_size: int=32, random_state: int=42, device: str='auto')
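To make the "decoder mirrors these in reverse" rule for hidden_dims concrete, the sketch below lists the layer widths of a symmetric autoencoder. `layer_sizes` is a hypothetical helper and the hidden_dims value is illustrative only; this reference does not state which sizes are used when hidden_dims is None.

```python
def layer_sizes(n_features, hidden_dims, latent_dim):
    """Hypothetical helper: widths of the encoder and the mirrored decoder."""
    encoder = [n_features, *hidden_dims, latent_dim]
    decoder = encoder[::-1]  # same widths, traversed in reverse
    return encoder, decoder


enc, dec = layer_sizes(n_features=1000, hidden_dims=[256, 64], latent_dim=4)
print(enc)  # [1000, 256, 64, 4]
print(dec)  # [4, 64, 256, 1000]
```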
fit(self, verbose: bool=True)
Train the autoencoder.
Args:
verbose: If True, print loss every 10 epochs.
Returns: self, to allow method chaining.
encode(self, X=None)
Return latent embeddings for X.
Args:
X: Data to encode. If None, encodes the training data.
Returns: np.ndarray of shape (n_samples, latent_dim).
reconstruct(self, X=None)
Return reconstructed spectra.
Args:
X: Data to reconstruct. If None, reconstructs the training data.
Returns: np.ndarray of shape (n_samples, n_features) in original (un-normalised) scale.
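Returning spectra "in original (un-normalised) scale" implies that inputs are scaled before training and that the scaling is inverted on output. The sketch below illustrates that round trip under a loud assumption: it uses per-feature standardisation (subtract the mean, divide by the standard deviation), which this reference does not confirm as the scheme metbit applies. All function names are hypothetical.

```python
from statistics import fmean, pstdev


def fit_scaler(rows):
    """Per-feature mean and standard deviation (assumed scheme, see lead-in)."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
    return means, stds


def normalise(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def denormalise(rows, means, stds):
    return [[v * s + m for v, m, s in zip(row, means, stds)] for row in rows]


spectra = [[1.0, 10.0, 5.0], [3.0, 30.0, 5.0]]
means, stds = fit_scaler(spectra)
scaled = normalise(spectra, means, stds)      # what a network would see
restored = denormalise(scaled, means, stds)   # back on the original scale
print(restored)
```

In a real pipeline the decoder output, rather than `scaled` itself, would be passed to the inverse step.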
plot_embedding(self, color_: 'pd.Series | list | None'=None, color_dict: 'dict | None'=None, components: list=None, fig_height: int=700, fig_width: int=900)
Scatter plot of two selected latent dimensions.
Args:
color_: Group labels aligned with training samples. Used to colour points.
color_dict: Mapping from label to hex/CSS colour string. If None, colours are assigned automatically.
components: Two-element list of zero-based latent dimension indices to plot. Defaults to [0, 1].
fig_height: Figure height in pixels.
fig_width: Figure width in pixels.
Returns: plotly.graph_objects.Figure.
Examples:
>>> fig = ae.plot_embedding(color_=labels)
>>> fig.show()
plot_loss(self, fig_height: int=400, fig_width: int=700)
Plot the training MSE loss curve.
Args:
fig_height: Figure height in pixels.
fig_width: Figure width in pixels.
Returns: plotly.graph_objects.Figure.
SpectralMLP
Multi-layer perceptron classifier for NMR spectral data.
Trains a fully-connected MLP with dropout regularisation using cross-entropy loss. Labels are encoded internally via sklearn.preprocessing.LabelEncoder.
Args:
X: Input spectra as a DataFrame (samples x variables) or ndarray of shape (n_samples, n_features).
y: Class labels as a Series, ndarray, or list. Can be strings or integers.
hidden_dims: List of hidden layer sizes.
epochs: Number of training epochs.
lr: Learning rate for Adam optimiser.
batch_size: Mini-batch size.
dropout: Dropout probability applied after each hidden ReLU.
random_state: Seed for reproducibility.
device: Compute device - 'auto', 'cpu', 'cuda', or 'mps'.
Attributes:
training_loss_: List of per-epoch cross-entropy loss values populated after calling fit().
label_encoder_: Fitted LabelEncoder instance.
Examples:
>>> import numpy as np, pandas as pd
>>> from metbit.dl import SpectralMLP
>>> X = np.random.rand(60, 500).astype("float32")
>>> y = ["ctrl"] * 30 + ["case"] * 30
>>> clf = SpectralMLP(X, y, epochs=5)
>>> clf.fit(verbose=False)
SpectralMLP(hidden_dims=[256, 128, 64], ...)
>>> preds = clf.predict()
>>> preds.shape
(60,)
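Because labels are encoded internally, string classes are turned into integer indices for training and turned back into labels on prediction. The sketch below imitates that round trip in plain Python rather than calling scikit-learn; `fit_labels` is a hypothetical helper. It follows LabelEncoder's convention of ordering classes by sorted unique value.

```python
def fit_labels(y):
    """Hypothetical stand-in for LabelEncoder.fit: sorted unique classes."""
    classes = sorted(set(y))
    to_index = {label: i for i, label in enumerate(classes)}
    return classes, to_index


y = ["ctrl"] * 3 + ["case"] * 2
classes, to_index = fit_labels(y)
encoded = [to_index[label] for label in y]   # integers fed to the loss
decoded = [classes[i] for i in encoded]      # labels returned to the caller

print(classes)  # ['case', 'ctrl']
print(encoded)  # [1, 1, 1, 0, 0]
```

Note that index 0 goes to "case" because of the sort order, not because of the order in which labels first appear in y.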
Methods
__init__(self, X, y, hidden_dims: list=None, epochs: int=100, lr: float=0.001, batch_size: int=32, dropout: float=0.3, random_state: int=42, device: str='auto')
fit(self, verbose: bool=True)
Train the MLP classifier.
Args:
verbose: If True, print loss every 10 epochs.
Returns: self, to allow method chaining.
predict(self, X=None)
Return predicted class labels.
Args:
X: Data to classify. If None, classifies the training data.
Returns: np.ndarray of class label strings (or original dtype).
predict_proba(self, X=None)
Return class probabilities.
Args:
X: Data to score. If None, scores the training data.
Returns: np.ndarray of shape (n_samples, n_classes).
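The relationship between predict_proba and predict can be illustrated with a softmax over raw network outputs. This is a sketch under an assumption: that probabilities are obtained by applying softmax to the logits, which is the usual companion to cross-entropy loss but is not stated in this reference. The function names and logit values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax for one sample."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_from_logits(logit_rows, classes):
    """Return (probabilities, labels); the label is the most probable class."""
    proba = [softmax(row) for row in logit_rows]
    labels = [classes[max(range(len(p)), key=p.__getitem__)] for p in proba]
    return proba, labels


classes = ["case", "ctrl"]
proba, labels = predict_from_logits([[2.0, 0.0], [-1.0, 1.0]], classes)
print(labels)  # ['case', 'ctrl']
```

Each row of `proba` sums to 1, matching the (n_samples, n_classes) shape described above.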
get_accuracy(self)
Compute training accuracy.
Returns: Fraction of correctly classified training samples.
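The returned fraction is the number of matching predictions divided by the number of samples. A minimal sketch with a hypothetical `accuracy` function and made-up labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


y_true = ["case", "case", "ctrl", "ctrl"]
y_pred = ["case", "ctrl", "ctrl", "ctrl"]
print(accuracy(y_true, y_pred))  # 0.75
```

Since get_accuracy scores the same samples the model was trained on, a held-out set scored through predict(X) gives a more realistic estimate.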
plot_loss(self, fig_height: int=400, fig_width: int=700)
Plot the training cross-entropy loss curve.
Args:
fig_height: Figure height in pixels.
fig_width: Figure width in pixels.
Returns: plotly.graph_objects.Figure.
plot_confusion_matrix(self, normalize: bool=True, fig_height: int=600, fig_width: int=700)
Plot a confusion matrix heatmap for training predictions.
Args:
normalize: If True, show row-normalised proportions; otherwise show raw counts.
fig_height: Figure height in pixels.
fig_width: Figure width in pixels.
Returns: plotly.graph_objects.Figure.
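The difference between the two normalize settings is easiest to see on a small example. The sketch below builds the matrix in plain Python, with rows as true classes and columns as predicted classes; that orientation, the function name, and the labels are assumptions made for illustration.

```python
def confusion_matrix(y_true, y_pred, classes, normalize=True):
    """Rows are true classes, columns are predicted classes (assumed layout)."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    if not normalize:
        return counts
    normalised = []
    for row in counts:
        total = sum(row)
        # Each row is divided by its own total, so it sums to 1 when non-empty.
        normalised.append([c / total if total else 0.0 for c in row])
    return normalised


y_true = ["case", "case", "ctrl", "ctrl"]
y_pred = ["case", "ctrl", "ctrl", "ctrl"]
classes = ["case", "ctrl"]
print(confusion_matrix(y_true, y_pred, classes, normalize=False))  # [[1, 1], [0, 2]]
print(confusion_matrix(y_true, y_pred, classes, normalize=True))   # [[0.5, 0.5], [0.0, 1.0]]
```

Row normalisation makes the diagonal read as per-class recall, which keeps classes of different sizes comparable.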
SpectralCNN
1-D Convolutional Neural Network for NMR spectral classification.
Each block applies Conv1d -> BatchNorm1d -> ReLU -> MaxPool1d. An AdaptiveAvgPool1d(1) then averages each feature map along the spectral axis before the final linear classifier.
Args:
X: Input spectra as a DataFrame (samples x variables) or ndarray of shape (n_samples, n_features).
y: Class labels as a Series, ndarray, or list.
filters: Number of filters in each convolutional block.
kernel_size: Kernel size for all Conv1d layers.
epochs: Number of training epochs.
lr: Learning rate for Adam optimiser.
batch_size: Mini-batch size.
dropout: Dropout probability before the final linear layer.
random_state: Seed for reproducibility.
device: Compute device - 'auto', 'cpu', 'cuda', or 'mps'.
Attributes:
training_loss_: List of per-epoch cross-entropy loss values populated after calling fit().
label_encoder_: Fitted LabelEncoder instance.
Examples:
>>> import numpy as np
>>> from metbit.dl import SpectralCNN
>>> X = np.random.rand(60, 1024).astype("float32")
>>> y = ["ctrl"] * 30 + ["case"] * 30
>>> cnn = SpectralCNN(X, y, epochs=5)
>>> cnn.fit(verbose=False)
SpectralCNN(filters=[32, 64, 128], ...)
>>> preds = cnn.predict()
>>> preds.shape
(60,)
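To see how the Conv1d -> BatchNorm1d -> ReLU -> MaxPool1d blocks shrink a spectrum, the sketch below traces (channels, length) through the network using the standard output-length formulas for Conv1d and MaxPool1d. Several details are assumptions not stated in this reference: a single input channel, stride 1, padding of kernel_size // 2, and a pooling window of 2. BatchNorm1d and ReLU leave the shape unchanged, so they are omitted. The helper names are hypothetical.

```python
def conv1d_out_len(length, kernel_size, padding=0, stride=1, dilation=1):
    """Output length of a 1-D convolution (standard formula)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def maxpool1d_out_len(length, pool=2):
    """Output length of max pooling with stride equal to the window size."""
    return (length - pool) // pool + 1


def trace_shapes(n_features, filters, kernel_size=7, pool=2):
    """List (channels, length) after the input, each block, and global pooling."""
    shapes = [(1, n_features)]  # assumed: spectra enter as a single channel
    length = n_features
    for channels in filters:
        # assumed padding of kernel_size // 2 keeps the length for odd kernels
        length = conv1d_out_len(length, kernel_size, padding=kernel_size // 2)
        length = maxpool1d_out_len(length, pool)
        shapes.append((channels, length))
    shapes.append((filters[-1], 1))  # AdaptiveAvgPool1d(1) collapses the length
    return shapes


print(trace_shapes(1024, [32, 64, 128]))
# [(1, 1024), (32, 512), (64, 256), (128, 128), (128, 1)]
```

Under these assumptions the final linear layer receives one value per filter of the last block, so its input width does not depend on n_features.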
Methods
__init__(self, X, y, filters: list=None, kernel_size: int=7, epochs: int=100, lr: float=0.001, batch_size: int=32, dropout: float=0.3, random_state: int=42, device: str='auto')
fit(self, verbose: bool=True)
Train the CNN classifier.
Args:
verbose: If True, print loss every 10 epochs.
Returns: self, to allow method chaining.
predict(self, X=None)
Return predicted class labels.
Args:
X: Data to classify. If None, classifies the training data.
Returns: np.ndarray of class label strings (or original dtype).
predict_proba(self, X=None)
Return class probabilities.
Args:
X: Data to score. If None, scores the training data.
Returns: np.ndarray of shape (n_samples, n_classes).
get_accuracy(self)
Compute training accuracy.
Returns: Fraction of correctly classified training samples.
plot_loss(self, fig_height: int=400, fig_width: int=700)
Plot the training cross-entropy loss curve.
Args:
fig_height: Figure height in pixels.
fig_width: Figure width in pixels.
Returns: plotly.graph_objects.Figure.
plot_confusion_matrix(self, normalize: bool=True, fig_height: int=600, fig_width: int=700)
Plot a confusion matrix heatmap for training predictions.
Args:
normalize: If True, show row-normalised proportions; otherwise show raw counts.
fig_height: Figure height in pixels.
fig_width: Figure width in pixels.
Returns: plotly.graph_objects.Figure.