MosaicML's new MPT-30B matches and exceeds GPT-3's quality with far fewer parameters, making it much more efficient to deploy.


MosaicML, an open-source provider of large language models (LLMs), has unveiled its latest models: MPT-30B Base, Instruct, and Chat. These models were trained on the MosaicML Platform using NVIDIA's latest H100 accelerators and offer superior quality to the original GPT-3. The MPT-30B models let businesses leverage the power of generative AI while retaining control of data privacy and security.

Since their launch in May 2023, the MPT-7B models have gained significant traction, with over 3.3 million downloads. The newly released MPT-30B models raise the bar even higher, providing enhanced quality and unlocking new possibilities across a range of applications. MosaicML's MPT models are designed for efficient training and inference, letting developers build and deploy enterprise-grade models with ease. Notably, MPT-30B surpasses the quality of GPT-3 while using only 30 billion parameters to GPT-3's 175 billion. This makes MPT-30B far easier to run on local hardware and significantly cheaper to deploy for inference.
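The parameter counts above give a rough sense of why the smaller model is easier to run locally. As a back-of-the-envelope sketch (assuming 16-bit weights at two bytes per parameter; real serving footprints vary with quantization, activations, and runtime overhead):

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold the model weights, in GB,
    assuming 16-bit (2-byte) weights."""
    return n_params * bytes_per_param / 1e9

# MPT-30B vs. GPT-3, weights only:
mpt_30b = weight_memory_gb(30e9)   # ~60 GB
gpt_3 = weight_memory_gb(175e9)    # ~350 GB
print(f"MPT-30B: ~{mpt_30b:.0f} GB, GPT-3: ~{gpt_3:.0f} GB")
```

At roughly 60 GB of weights, MPT-30B fits on a single high-memory accelerator, whereas a 175-billion-parameter model requires a multi-GPU cluster just to hold its weights.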


Training custom models on top of MPT-30B is also far more cost-effective than training the original GPT-3, making it an appealing choice for enterprises. In addition, MPT-30B was trained on sequences of up to 8,000 tokens, so it can handle data-intensive enterprise applications, and NVIDIA's H100 GPUs further boost throughput and shorten training times.

Several companies have already adopted MosaicML's MPT models for their AI applications. Replit, a web-based IDE, used its proprietary data and MosaicML's training platform to build a code generation model, improving code quality, speed, and cost-effectiveness. Scatter Lab, an AI startup specializing in chatbot development, trained its own MPT model to create a multilingual generative AI model that understands both English and Korean, enhancing chat experiences for its users. Navan, a global travel and expense management software company, is building custom LLMs on the MPT foundation for applications such as virtual travel agents and conversational business intelligence agents. Ilan Twig, Co-Founder and CTO at Navan, praised MosaicML's foundation models for their state-of-the-art language capabilities and their efficiency when fine-tuning and serving inference at scale.

Developers can access MPT-30B as an open-source model through the HuggingFace Hub, fine-tune it on their own data, and deploy it for inference on their own infrastructure. Alternatively, they can use MosaicML's managed endpoint, MPT-30B-Instruct, which offers hassle-free model inference at a fraction of the cost of comparable endpoints: $0.005 per 1,000 tokens.
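To put the quoted $0.005 per 1,000 tokens in concrete terms, a minimal cost estimator (the rate is from the announcement; the workload figures below are illustrative assumptions, not MosaicML numbers):

```python
def inference_cost_usd(total_tokens: int, price_per_1k: float = 0.005) -> float:
    """Estimated endpoint cost in USD at the quoted per-1,000-token rate."""
    return total_tokens / 1000 * price_per_1k

# Hypothetical workload: one million tokens through the managed endpoint.
print(f"${inference_cost_usd(1_000_000):.2f}")  # $5.00
```

At that rate, even a workload of a billion tokens per month would cost on the order of $5,000, which is what makes per-token pricing attractive for prototyping before committing to self-hosted infrastructure.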
MosaicML's release of the MPT-30B models represents a significant milestone in the realm of large language models, empowering businesses to harness the capabilities of generative AI while optimizing costs and retaining control over their data.

(Photo by Joshua Golde on Unsplash)
