
MultiModelClassification


A multimodal classification project that combines text and image inputs to predict a target class. This repository includes ready-to-use pre-trained models, a simple application interface, and Jupyter/py notebooks for experimentation and evaluation.

Tip: This README includes Jekyll front matter and is optimized for the GitHub Pages Slate theme. When you enable GitHub Pages with the Slate theme, this page will render as your site’s homepage.


Table of contents

- Features
- Repository structure
- Quickstart
- Data expectations
- Models and architecture
- Configuration
- Evaluation
- Reproducibility
- Troubleshooting
- Roadmap
- Contributing
- Citations and acknowledgments


Features

- Multimodal classification combining text and image inputs to predict a target class
- Pre-trained ResNet50 image classification model (image_classification_model.pt)
- Pre-trained BiLSTM+Attention text classification model (text_classification_model.h5)
- Pre-fitted tokenizer for consistent text preprocessing (tokenizer.joblib)
- Simple application interface (app.py) for local inference
- Jupyter/Python notebooks for experimentation and evaluation

Repository structure

MultiModelClassification/
├── NotebooksPY/                   # Jupyter/Python notebooks for experimentation
├── app.py                         # Main application interface (run locally)
├── image_classification_model.pt  # Pretrained ResNet50 image classification model
├── text_classification_model.h5   # Pretrained BiLSTM+Attention text classification model
├── tokenizer.joblib               # Pre-fitted tokenizer for text preprocessing
├── requirements.txt               # Python dependencies
└── README.md                      # Project documentation (renders on GitHub Pages)

Notes:

- The pretrained model files and tokenizer.joblib are expected to sit next to app.py so the app and notebooks can load them; run commands from the repository root.
- Text must be preprocessed with tokenizer.joblib at inference time; tokenizing with a different vocabulary will not match what the text model saw during training.


Quickstart

1) Clone and environment

git clone https://github.com/JaswanthRemiel/MultiModelClassification.git
cd MultiModelClassification

# Create and activate a virtual environment (choose one)
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
# or
conda create -n mmc python=3.10 -y && conda activate mmc

2) Install dependencies

pip install --upgrade pip
pip install -r requirements.txt

If you need GPU acceleration for PyTorch, ensure the wheel matches your CUDA version (see PyTorch Get Started for the correct index-url). CPU-only is fine for smaller demos.
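
For example (the exact package set is an assumption — follow the selector on the PyTorch Get Started page for your platform):

```shell
# CPU-only wheels (sufficient for small demos)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

# CUDA 12.1 wheels (only if your NVIDIA driver supports CUDA 12.1)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```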

3) Run the app

The repository includes a main application interface in app.py. How you launch it depends on the framework the app is built with; common entry points are:

python app.py
# or, if the app is a Streamlit interface:
streamlit run app.py

Tip: If the app prints usage help, run python app.py --help.

4) Run the notebooks

Launch Jupyter and open the notebooks inside NotebooksPY/:

jupyter lab
# or
jupyter notebook

Inside each notebook, adjust:

- paths to your data (CSV files and image directory)
- paths to the pretrained model files and tokenizer.joblib
- hyperparameters such as batch size, learning rate, and number of epochs
- the compute device (CPU vs. GPU)


Data expectations

The typical multimodal row contains paired text and an image path. A common format is a CSV:

id,text,image_path,label
0001,"Short description for the image","data/images/0001.jpg","class_a"
0002,"Another description","data/images/0002.jpg","class_b"

Recommended layout:

data/
  images/
    0001.jpg
    0002.jpg
  train.csv
  val.csv
  test.csv
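
A split file in this layout can be read and sanity-checked with the standard library alone. The function name below is hypothetical (not part of this repo); the column names follow the CSV example above:

```python
import csv
from pathlib import Path

def load_split(csv_path, root="."):
    """Read a split CSV and verify that each referenced image exists.

    image_path entries are resolved relative to `root` (the repository or
    dataset root), matching the layout above. Returns a list of row dicts
    with keys: id, text, image_path, label.
    """
    root = Path(root)
    rows = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            image = root / row["image_path"]
            if not image.is_file():
                raise FileNotFoundError(f"Missing image for id={row['id']}: {image}")
            rows.append(row)
    return rows
```

Failing fast on missing images here is cheaper than discovering them mid-training.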

Models and architecture

- Image branch: a pretrained ResNet50 classifier, saved as image_classification_model.pt (PyTorch).
- Text branch: a pretrained BiLSTM with attention, saved as text_classification_model.h5 (Keras/TensorFlow), paired with the pre-fitted tokenizer in tokenizer.joblib.

Tips:

- Load the .pt file with PyTorch and the .h5 file with Keras; the two formats are not interchangeable.
- Always preprocess text with tokenizer.joblib so token indices match the text model's training vocabulary.

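The README does not spell out how the two branches are combined; a common late-fusion approach (an assumption here, not necessarily what app.py does) averages the per-class probabilities from each branch:

```python
def fuse_predictions(text_probs, image_probs, text_weight=0.5):
    """Late fusion: weighted average of per-class probabilities.

    text_probs and image_probs are equal-length lists of class
    probabilities; text_weight in [0, 1] trades off the two branches.
    """
    if len(text_probs) != len(image_probs):
        raise ValueError("branches must predict over the same classes")
    w = text_weight
    return [w * t + (1 - w) * i for t, i in zip(text_probs, image_probs)]

# Pick the class with the highest fused probability.
fused = fuse_predictions([0.7, 0.2, 0.1], [0.3, 0.6, 0.1])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Weighted averaging is only one option; concatenating branch features and training a small joint head is a common alternative.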

Configuration

Common configuration points (set in notebooks or at the top of app.py):

- paths to the model files (image_classification_model.pt, text_classification_model.h5) and tokenizer.joblib
- image input size (typically 224×224 for ResNet50) and normalization
- maximum text sequence length (must match the fitted tokenizer and text model)
- batch size and compute device (CPU/GPU)
- the list of class labels and their order

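These points can be collected into a single config object; every name and value below is a hypothetical sketch, not taken from app.py:

```python
# Hypothetical configuration sketch; adjust names and values to the actual code.
CONFIG = {
    "image_model_path": "image_classification_model.pt",
    "text_model_path": "text_classification_model.h5",
    "tokenizer_path": "tokenizer.joblib",
    "image_size": 224,        # ResNet50's usual input resolution
    "max_seq_len": 128,       # must match the tokenizer/model training setup
    "batch_size": 32,
    "device": "cpu",          # or "cuda" if a compatible GPU is available
    "class_labels": ["class_a", "class_b"],
}
```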

Evaluation

Typical metrics for classification:

- accuracy
- precision, recall, and F1 (per class, plus macro/weighted averages)
- confusion matrix
- ROC-AUC (binary, or one-vs-rest for multiclass)

Most notebooks include cells to compute and visualize these. Save your runs’ metrics to CSV/JSON for comparison across configurations.
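
As a dependency-free sketch, accuracy and macro-F1 can be computed directly (scikit-learn's classification_report gives the same numbers with less code):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

Macro averaging weights every class equally, which is useful when classes are imbalanced.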


Reproducibility

Set seeds and control non-determinism when benchmarking:

import os, random, numpy as np, torch

def set_seed(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)
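
A quick sanity check that seeding makes runs repeatable (shown with the standard library's random module; the same pattern applies to NumPy and PyTorch draws):

```python
import random

random.seed(42)
a = [random.random() for _ in range(3)]
random.seed(42)
b = [random.random() for _ in range(3)]
assert a == b  # identical draws under the same seed
```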

Keep notes of:

- the seed(s) used for each run
- library versions (torch, tensorflow, numpy) and the Python version
- hyperparameters and dataset splits
- hardware (CPU/GPU model), since results can still differ across devices


Troubleshooting

- "CUDA not available" or wheel mismatch: reinstall PyTorch with the index-url matching your CUDA version, or fall back to the CPU wheel.
- Errors loading text_classification_model.h5: .h5 loading is sensitive to the TensorFlow/Keras version; use the versions pinned in requirements.txt.
- Shape or vocabulary errors at text inference: tokenize with tokenizer.joblib and pad/truncate to the training sequence length.
- FileNotFoundError for model files: run commands from the repository root so relative paths resolve.


Roadmap


Contributing

Contributions are welcome! You can:

- open an issue to report bugs or request features
- submit a pull request with fixes or improvements
- add or improve notebooks in NotebooksPY/
- improve this documentation

Please include a clear description, reproduction steps (if applicable), and relevant logs/screenshots.


Citations and acknowledgments

If you use this project, please consider citing the libraries and models you rely on: