[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o's performance.
Effortless data labeling with AI support from Segment Anything and other awesome models.
🚀 Train a 67M-parameter vision-language model (VLM) from scratch in just 1 hour! 🌏
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
MineContext is your proactive context-aware AI partner (Context-Engineering + ChatGPT Pulse)
Align Anything: Training All-modality Model with Feedback
DeepSeek-VL: Towards Real-World Vision-Language Understanding
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on linear attention.
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Collection of AWESOME vision-language models for vision tasks
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX (a minimal inference sketch follows this list).
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol (a retrieval scoring sketch follows this list).
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to master any computer task through strong reasoning, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
A curated list of awesome LLM/VLM/VLA/World Model resources for autonomous driving (LLM4AD), continually updated.
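
For the MLX-VLM entry above, here is a minimal inference sketch. It assumes the package's `load` and `generate` helpers as described in its README; the model ID and image path are placeholders, and argument names have shifted between mlx-vlm releases, so treat this as illustrative rather than a fixed API.

```python
# Minimal MLX-VLM inference sketch (assumes a recent `pip install mlx-vlm`
# on Apple silicon; model ID and image path are placeholders).
from mlx_vlm import load, generate

# Load a quantized vision-language model converted for MLX.
model, processor = load("mlx-community/llava-1.5-7b-4bit")

# Caption a local image. Keyword names/order have varied across
# mlx-vlm versions, so check the README for your installed release.
output = generate(
    model,
    processor,
    prompt="Describe this image.",
    image="example.jpg",
)
print(output)
```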
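
For the ColVision entry above, the sketch below illustrates late-interaction retrieval scoring with colpali-engine, loosely following its README; the checkpoint name and version-specific details (e.g. `score_multi_vector`) are assumptions about a recent release.

```python
# ColPali retrieval sketch (assumes `pip install colpali-engine`;
# checkpoint name assumed from the vidore org on Hugging Face).
import torch
from PIL import Image
from colpali_engine.models import ColPali, ColPaliProcessor

model_name = "vidore/colpali-v1.2"  # assumed checkpoint
model = ColPali.from_pretrained(model_name, torch_dtype=torch.bfloat16).eval()
processor = ColPaliProcessor.from_pretrained(model_name)

# Toy inputs: one blank page image and one text query.
images = [Image.new("RGB", (448, 448), color="white")]
queries = ["What is shown in the document?"]

batch_images = processor.process_images(images)
batch_queries = processor.process_queries(queries)

with torch.no_grad():
    image_embeddings = model(**batch_images)   # multi-vector page embeddings
    query_embeddings = model(**batch_queries)  # multi-vector query embeddings

# MaxSim late-interaction score between each query and each page.
scores = processor.score_multi_vector(query_embeddings, image_embeddings)
print(scores)
```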