
Base Models

Within the domain of LLMs, the term "base model" denotes the initial model produced by training on massive datasets with deep learning methods. Following this foundational training, it can be fine-tuned to perform specific tasks such as text classification, question answering, summarization, and text generation. Base models constitute the fundamental framework underpinning practical AI applications.
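To make this concrete, the snippet below is a minimal sketch of prompting an open-source base model with the Hugging Face transformers library. The model id, prompt, and generation settings are illustrative assumptions for this page, not the documented usage of any particular model.

```python
# Minimal sketch: prompting an open-source base model with Hugging Face transformers.
# Assumes `pip install transformers torch` and enough memory for the chosen checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # illustrative base checkpoint from the table below

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A base model simply continues the text it is given; it is not tuned to follow instructions.
prompt = "The three main approaches to text summarization are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because a base model has not been instruction-tuned, it only continues the prompt; adapting it to tasks like classification or question answering typically involves fine-tuning or careful prompt construction.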


[Image: Open-source LLMs]

Models

| Model | Developed by | Variants | Description |
| --- | --- | --- | --- |
| Mistral | Mistral AI | Mistral-7B, Mistral-7B-Instruct | Mistral-7B-v0.1 is a small yet powerful model adaptable to many use cases. It outperforms Llama 2 13B on all benchmarks, has strong native coding ability, and supports an 8k sequence length. Released under the Apache 2.0 license, it is designed for straightforward deployment on a wide range of cloud platforms. |
| Falcon | Technology Innovation Institute (TII) | Falcon-7B, Falcon-40B, Falcon-180B | Falcon is an advanced LLM capable of both understanding and generating human language, with applications across diverse industries. It surpasses LLaMA, another prominent large language model, on several benchmarks and is widely regarded as one of the leading open-source LLMs. |
| LLaMA 2 | Meta AI | LLaMA2-7B, LLaMA2-13B, LLaMA2-70B | Designed to help researchers and developers in the field of AI by providing a versatile and efficient LLM, trained on a wide range of text sources including Common Crawl, GitHub, Wikipedia, Project Gutenberg, ArXiv, and Stack Exchange. |
| MPT | MosaicML | MPT-7B | A decoder-style transformer pretrained from scratch on 1T tokens of English text and code. It is available for commercial use and offers fast training and inference as well as support for long inputs. |
| Vicuna | LMSYS Org | Vicuna-7B, Vicuna-13B, Vicuna-33B | An open-source chatbot fine-tuned on user-shared conversations collected from ShareGPT. In preliminary evaluations with GPT-4 as the judge, Vicuna-13B achieved more than 90% of the quality of OpenAI ChatGPT and Google Bard. |
| Bloom | BigScience | Bloom-176B | An open-source multilingual LLM that can generate text in 46 natural languages and 13 programming languages. It is the first multilingual LLM trained by more than 1,000 AI researchers and is available for research and commercial purposes. |
| Fuyu | Adept | Fuyu-8B | Fuyu-8B, Adept's multimodal model, excels at text and image comprehension. It simplifies the traditional transformer architecture, making it easier to understand, scale, and deploy. Fuyu-8B handles complex visual relationships, including charts and documents, and performs tasks such as OCR and text localization in images. |
| PaLM 2 | Google | PaLM 2 | A language model from Google that excels in advanced reasoning, translation, and code generation. Compared with its predecessor, it is smaller yet more efficient, with better performance, faster inference, and lower serving costs. Its multilingual pre-training spans human and programming languages, equations, scientific papers, and web content. With an improved architecture and varied task training, PaLM 2 supports text generation, language translation, creative content creation, and informative question answering. (blog) |
| Qwen | Alibaba Cloud | Qwen-1.8B, Qwen-7B, Qwen-14B, Qwen-72B | Alibaba Cloud's family of pretrained and chat large language models, pre-trained on 3 trillion tokens of multilingual data. It excels at diverse tasks such as chatting, content creation, and information extraction, and is aligned with human preferences through SFT and RLHF, delivering competitive performance with a focus on Chinese and English. |
| DeepSeek LLM | DeepSeek | DeepSeek LLM 7B Base, DeepSeek LLM 7B Chat, DeepSeek LLM 67B Base, DeepSeek LLM 67B Chat | An advanced language model with 67B parameters, trained from scratch on a 2 trillion-token dataset in English and Chinese. It outperforms Llama 2 70B Base in reasoning, coding, math, and Chinese comprehension. The 7B/67B Base and Chat models are open source for research purposes. |
| Yi | 01.AI | Yi-6B, Yi-34B, Yi-6B-Chat, Yi-34B-Chat | A series of large language models developed by 01.AI, including bilingual (English/Chinese) base models in Yi-6B and Yi-34B sizes. Trained with a 4K sequence length, they can extend to 32K during inference. The models excel at tasks such as common-sense reasoning, reading comprehension, math, and code, and benchmark competitively against other open-source models. |
| Phi | Microsoft | Phi-1, Phi-1.5, Phi-2 | Microsoft's "Phi" series of high-performing small language models. Phi-1, with 1.3 billion parameters, excelled on Python coding benchmarks. Phi-1.5, also with 1.3 billion parameters, extended this to common-sense reasoning and matched models five times its size. Phi-2, a 2.7 billion-parameter model, demonstrates strong reasoning and language understanding, showing state-of-the-art performance among base language models. (paper) |
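Several entries above ship both a base checkpoint and an instruct/chat variant (e.g. Mistral-7B-Instruct, Vicuna, Qwen, DeepSeek LLM Chat, Yi-Chat). As a rough sketch, assuming a recent transformers release with chat-template support and using mistralai/Mistral-7B-Instruct-v0.1 as an illustrative checkpoint, the main usage difference is that chat variants expect the conversation to be wrapped in the model's prompt template rather than passed as raw text:

```python
# Minimal sketch: prompting a chat/instruct variant via its conversation template.
# Assumes a recent `transformers` release whose tokenizers expose apply_chat_template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # illustrative instruct checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Summarize why base models are usually fine-tuned before deployment."},
]

# apply_chat_template wraps the messages in the model's expected prompt format
# (e.g. [INST] ... [/INST] for Mistral-Instruct) and returns token ids.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(input_ids, max_new_tokens=150)
# Decode only the newly generated continuation, not the prompt itself.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The base checkpoints in the table are better suited to further fine-tuning, while the chat/instruct variants are the ones intended for direct conversational use.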