We are releasing the base model weights and network architecture of Cotton-1, our large language model. Cotton-1 is a 314 billion parameter Mixture-of-Experts model, trained from scratch by D-AI.

This is the raw base model checkpoint from the Cotton-1 pre-training phase, which concluded in October 2023. As such, the model has not been fine-tuned for any specific applications, such as dialogue.

We are releasing the weights and the architecture under the Apache 2.0 license.

To get started with the model, follow the instructions at github.com/dai-org/cotton.

Model Details

  • Base model trained on a large amount of text data, not fine-tuned for any particular task.
  • 314B parameter Mixture-of-Experts model with 25% of the weights active on a given token.
  • Trained from scratch by D-AI using a custom training stack on top of JAX and Rust in October 2023.
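The "25% active" figure follows from sparse expert routing: each token is processed by only a subset of experts, so only that subset's parameters participate in the forward pass. The sketch below illustrates the arithmetic; the expert count, top-k value, and parameter split are illustrative assumptions, not Cotton-1's published configuration.

```python
# Hypothetical sketch of why only a fraction of MoE weights are active
# per token. The configuration below (8 experts, top-2 routing, no shared
# parameters) is an illustrative assumption, not Cotton-1's actual setup.

def active_fraction(num_experts: int, top_k: int,
                    expert_params: int, shared_params: int) -> float:
    """Fraction of total parameters used for a single token.

    Shared parameters (attention, embeddings, etc.) are always active;
    expert parameters are active only for the top_k routed experts.
    """
    total = shared_params + num_experts * expert_params
    active = shared_params + top_k * expert_params
    return active / total

# With 8 equal-size experts, 2 routed per token, and no shared parameters,
# exactly 2/8 = 25% of the weights are active on a given token.
print(active_fraction(num_experts=8, top_k=2, expert_params=1, shared_params=0))
```

In practice the always-active shared layers pull the real fraction above the bare top-k/num-experts ratio, which is why published "active parameter" counts depend on the full architecture, not just the routing scheme.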

The cover image was generated using Midjourney based on the following prompt proposed by Cotton: "A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines."