Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Agent S: an open agentic framework that uses computers like a human
Mobile-Agent: The Powerful GUI Agent Family
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
OpenEMMA: a permissively licensed, open-source "reproduction" of Waymo's EMMA model
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
[NeurIPS 2025] 4KAgent: Agentic Any Image to 4K Super-Resolution. An intelligent computer vision agent that restores any image to high-quality 4K.
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, and Editing
Fully Open Framework for Democratized Multimodal Training