Cotton-2 Beta Release

A Leap Forward in AI Performance

An early version of Cotton-2, tested under the alias "sus-column-r" on the LMSYS leaderboard, already outperforms leading models, including Claude 3.5 Sonnet and GPT-4 Turbo. This marks a significant milestone in our pursuit of decentralized, high-performance AI.

Benchmarking Cotton-2

Cotton-2 was evaluated in the LMSYS Chatbot Arena, a widely used benchmark that ranks language models through head-to-head comparisons. By overall Elo score, the early version surpassed both Claude 3.5 Sonnet and GPT-4 Turbo, placing it among the most capable AI models available today.
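For readers unfamiliar with Elo scores, the sketch below shows the textbook Elo formulas for the expected outcome of a head-to-head comparison and the resulting rating update. It is illustrative only: the function names, the K-factor of 32, and the example ratings are assumptions, and the arena's actual rating procedure may differ from this basic form.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float,
                   k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A won, 0.0 if A lost, and 0.5 for a tie.
    k is an assumed K-factor, not a value taken from the leaderboard.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical ratings: two equally rated models, and A wins the vote.
new_a, new_b = update_ratings(1200.0, 1200.0, score_a=1.0)
```

A 400-point rating gap corresponds to roughly a 10-to-1 expected preference, which is why small differences near the top of a leaderboard translate into close head-to-head results.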

With this beta release, we continue to push the boundaries of decentralized AI, ensuring that cutting-edge technology remains accessible, transparent, and beneficial to all.

Rigorous Evaluation of Cotton-2: Advancing AI Capabilities

At D-AI, we employ a systematic and rigorous approach to evaluating our models, ensuring they meet the highest standards of accuracy, reliability, and reasoning. Internally, our AI Tutors assess model performance across a variety of real-world tasks, simulating practical use cases for Cotton. During these evaluations, AI Tutors compare multiple responses generated by Cotton and select the most effective one based on predefined assessment criteria.

Our evaluation focuses on two fundamental areas:

Instruction Adherence – The ability to interpret and execute complex instructions with precision.

Factual Accuracy – The capability to generate responses that are verifiable, contextually accurate, and logically sound.

Cotton-2 exhibits substantial advancements in reasoning, retrieval-based content generation, and tool-use capabilities. Specifically, it demonstrates enhanced proficiency in:

Identifying missing information with greater accuracy.

Reasoning through sequential events in complex queries.

Filtering out irrelevant inputs, leading to more coherent and focused responses.

Benchmarking Excellence: Cotton-2’s Performance Across Key Metrics

To ensure comprehensive validation, Cotton-2 was rigorously tested across industry-standard academic benchmarks, evaluating its competencies in:

Reading comprehension and logical reasoning

Advanced mathematics and scientific inquiry

Code generation and problem-solving

Both Cotton-2 and Cotton-2 Mini exhibit marked improvements over their predecessor, Cotton-1.5, demonstrating competitive performance against leading foundation models. Key benchmark highlights include:

Graduate-Level Science (GPQA): Cotton-2 scores 56.0%, up from 35.9% for Cotton-1.5, reflecting stronger domain-specific reasoning and knowledge synthesis.

General Knowledge (MMLU, MMLU-Pro): 87.5% on MMLU and 75.5% on MMLU-Pro, within about 1.2 points of the top score in the comparison on each benchmark.

Mathematical Problem-Solving (MATH): 76.1% on competition-level problems, within a point of the best score in the comparison (GPT-4o, 76.6%).

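Vision-Based AI Tasks: Leading or near-leading performance in:
MathVista – Complex visual mathematical reasoning. Cotton-2's 69.0% is the highest score in the comparison.
DocVQA – Document-based question answering and interpretation. Cotton-2's 93.6% is second only to Claude 3.5 Sonnet (95.2%).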

With these advancements, Cotton-2 represents a new frontier in AI-driven reasoning, bridging the gap between deep knowledge comprehension, real-world applicability, and robust decision-making.
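The generational improvement can be checked directly from the Cotton-1.5 and Cotton-2 columns of the benchmark table in this section. The short sketch below computes the percentage-point gain per benchmark. The dictionary and function names are illustrative choices; the scores are copied from the table.

```python
# Scores (in percent) copied from the Cotton-1.5 and Cotton-2 table columns.
COTTON_1_5 = {
    "GPQA": 35.9, "MMLU": 81.3, "MMLU-Pro": 51.0, "MATH": 50.6,
    "HumanEval": 74.1, "MMMU": 53.6, "MathVista": 52.8, "DocVQA": 85.6,
}
COTTON_2 = {
    "GPQA": 56.0, "MMLU": 87.5, "MMLU-Pro": 75.5, "MATH": 76.1,
    "HumanEval": 88.4, "MMMU": 66.1, "MathVista": 69.0, "DocVQA": 93.6,
}


def point_gains(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Percentage-point improvement per benchmark, rounded to one decimal."""
    return {name: round(new[name] - old[name], 1) for name in old}


GAINS = point_gains(COTTON_1_5, COTTON_2)
LARGEST = max(GAINS, key=GAINS.get)  # benchmark with the biggest gain

for name, gain in sorted(GAINS.items(), key=lambda item: -item[1]):
    print(f"{name:10s} +{gain:.1f} points")
```

By this measure the largest gains are on MATH and MMLU-Pro, while MMLU and DocVQA, where Cotton-1.5 already scored above 80%, improve the least.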

Benchmark	Cotton-1.5	Cotton-2 Mini^‡	Cotton-2^‡	GPT-4 Turbo^*	Claude 3 Opus^†	Gemini Pro 1.5	Llama 3 405B	GPT-4o^*	Claude 3.5 Sonnet^†
GPQA	35.9%	51.0%	56.0%	48.0%	50.4%	46.2%	51.1%	53.6%	59.6%
MMLU	81.3%	86.2%	87.5%	86.5%	85.7%	85.9%	88.6%	88.7%	88.3%
MMLU-Pro	51.0%	72.0%	75.5%	63.7%	68.5%	69.0%	73.3%	72.6%	76.1%
MATH^§	50.6%	73.0%	76.1%	72.6%	60.1%	67.7%	73.8%	76.6%	71.1%
HumanEval^¶	74.1%	85.7%	88.4%	87.1%	84.9%	71.9%	89.0%	90.2%	92.0%
MMMU	53.6%	63.2%	66.1%	63.1%	59.4%	62.2%	64.5%	69.1%	68.3%
MathVista	52.8%	68.1%	69.0%	58.1%	50.5%	63.9%	—	63.8%	67.7%
DocVQA	85.6%	93.2%	93.6%	87.2%	89.3%	93.1%	92.2%	92.8%	95.2%

^* GPT-4 Turbo and GPT-4o scores are from the May 2024 release.
^† Claude 3 Opus and Claude 3.5 Sonnet scores are from the June 2024 release.
^‡ Cotton-2 MMLU, MMLU-Pro, MMMU, and MathVista scores were obtained using 0-shot chain-of-thought (CoT) prompting.
^§ For MATH, we present maj@1 results.
^¶ For HumanEval, we report pass@1 benchmark scores.
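
The last two footnotes name scoring conventions that may be unfamiliar. As a rough illustration, and not a description of D-AI's evaluation harness, pass@k is commonly computed with the unbiased estimator 1 - C(n-c, k) / C(n, k) for n sampled solutions of which c pass, and maj@k scores the most common answer among k samples. With k = 1, both reduce to grading a single sample. The function names below are hypothetical.

```python
from collections import Counter
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which pass the tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def maj_at_k(sampled_answers: list[str], reference: str) -> bool:
    """True if the most common of the k sampled answers matches the reference."""
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    majority_answer, _ = Counter(sampled_answers).most_common(1)[0]
    return majority_answer == reference


# With k = 1, pass@1 is simply the fraction of passing samples.
single_sample_rate = pass_at_k(n=10, c=3, k=1)
```

Reporting at k = 1 keeps the numbers comparable to a single model response per problem, without the boost that sampling many candidates would give.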

Experience Cotton with Real-Time Updates

At D-AI, we are committed to continuously refining and enhancing Cotton to deliver a seamless and intelligent AI experience. Over the past few months, we have made significant improvements, and today, we are excited to introduce the next evolution of Cotton.

This latest update features a redesigned interface for improved usability, along with powerful new capabilities designed to enhance user interaction, efficiency, and overall performance.

Stay ahead with real-time updates and experience the future of AI-driven engagement.

Introducing Cotton-2 and Cotton-2 Mini: Advancing AI Through Decentralization

At D-AI, our mission is to ensure that artificial intelligence serves all of humanity by integrating AI with blockchain technology, fostering transparency, security, and equitable access. As part of this commitment, we are introducing two new models that represent the next evolution of decentralized AI:

Cotton-2 – A state-of-the-art AI assistant with advanced natural language understanding and vision capabilities, seamlessly integrating real-time information retrieval for enhanced contextual accuracy.

Cotton-2 Mini – A lightweight yet high-performance model optimized for efficiency, speed, and balanced response quality.

With significant improvements in steerability, contextual comprehension, and adaptability, Cotton-2 is designed to excel in complex reasoning tasks, creative collaboration, and software development support.

In collaboration with Lattice Inc, we are also exploring integrations with newly trained models to further enhance Cotton’s capabilities, expanding its reasoning, retrieval, and interpretative functions.

Enterprise API: Deploying Cotton at Scale

Later this month, Cotton-2 and Cotton-2 Mini will be available through our Enterprise API platform, enabling businesses and developers to harness decentralized AI within their applications. Our infrastructure is designed for global-scale AI deployment, offering multi-region inference with low-latency access worldwide.

Key Features of the Enterprise API

Advanced Security – Mandatory multi-factor authentication (YubiKey, Apple Touch ID, TOTP) ensures enterprise-grade protection.

Comprehensive Analytics – Detailed traffic metrics, usage insights, and billing analytics, including data export capabilities.

Seamless Integration – A management API for user, team, and billing administration within enterprise environments.

To stay informed about the official launch, subscribe to our newsletter and be among the first to integrate Cotton’s AI capabilities into your enterprise applications.

Future Developments: Expanding Cotton’s Capabilities

With the introduction of Cotton-2 and Cotton-2 Mini, we are advancing toward a future where decentralized AI is more capable, intuitive, and accessible. Upcoming enhancements include:

Enhanced Retrieval and Search – Leveraging AI-driven reasoning to generate deeper, more insightful responses.

Refined Conversational Dynamics – Optimized response generation for increased contextual awareness and adaptability.

Multimodal AI Expansion – The upcoming multimodal feature set will enable native image, text, and data interpretation within Cotton and the API.

Since the launch of Cotton-1 in November 2023, D-AI has made rapid advancements, driven by a select team of experts dedicated to pioneering AI within a decentralized framework. With Cotton-2, we are reinforcing our position at the forefront of AI research, leveraging our new compute cluster to enhance complex reasoning and knowledge synthesis.

As we continue to expand, we are seeking exceptional talent to join our team and contribute to the future of blockchain-integrated AI. If you are passionate about shaping the future of artificial intelligence, we invite you to explore our career opportunities and be part of this transformative journey.