Data-efficient Image Transformer (DeiT).
Keras
community
AI & ML interests
Multi-framework Deep Learning, Computer Vision, Natural Language Processing.
Recent Activity
HGNetV2: GPU-Efficient, Lightweight CNN for Real-Time, Edge-Focused Image Classification
Multilingual, multimodal LLMs (30B and 235B) with advanced reasoning, agent capabilities, and seamless mode switching.
Evolutionary Scale Modeling (ESM) version 2 architecture.
The Qwen3 Embedding model series.
Open-weight safety reasoning model designed for content classification and foundational safety tasks. Built upon the GPT OSS architecture.
Vision Transformer (ViT) and ConvNeXt models trained using the DINOv3 method.
Segment Anything 3 (SAM3) promptable concept image segmentation model.
Next-Gen Agentic Coding with Repository-Scale Context, Agentic Tasks, Browser Use, and 1M-Token Long Context.
Collection for all presets of AlbertBackbone
Multilingual, multimodal LLMs (0.6B–32B) with advanced reasoning, agent capabilities, and seamless mode switching.
Vision Transformer (ViT) model trained using the DINOv2 method.
Redefine Regression Task in DETRs as Fine-grained Distribution Refinement.
Permuted autoregressive sequence (PARSeq) model for Scene Text Recognition (STR).
MedSigLIP is a lightweight, multimodal AI encoder designed by Google for healthcare applications that require interpreting and matching medical images.
State-of-the-art open-weight language models that deliver strong real-world performance at low cost.
State-of-the-art, 3-billion-parameter open multimodal language model.
Code-specific Qwen large language models.
Receptance Weighted Key Value (RWKV) architecture.
Advanced Bilingual Mathematical Reasoning via Chain-of-Thought and Tool-Integrated Logic. Qwen2.5-Math is the latest series of Qwen LLMs.
BASNet uses a two-stage predict-and-refine architecture with a hybrid loss, enabling it to predict highly accurate boundaries and fine structures for image segmentation.