Microsoft released three new in-house AI models for text, speech, and image generation

Microsoft AI launches three new multimodal models

In an effort to strengthen its position in artificial intelligence (AI), Microsoft AI’s research division announced the release of three proprietary models capable of generating text, audio, and images. This move was a response to competition from leading AI labs.

| Model | Purpose | Key metrics |
| --- | --- | --- |
| MAI‑Transcribe‑1 | Converts speech to text | 25 languages; 2.5× faster than Azure Fast |
| MAI‑Voice‑1 | Generates speech audio | One minute of audio generated in one second; voice tuning |
| MAI‑Image‑2 | Generates images from text | |

The project was developed by the MAI Superintelligence team, a division focused on fundamental research into advanced AI systems. The team was formed in November 2025 and is led by Microsoft AI CEO Mustafa Suleyman.

Cost efficiency

The developers placed special emphasis on reducing compute costs relative to comparable Google and OpenAI offerings:

| Service | Price |
| --- | --- |
| Text transcription | $0.36 per hour |
| Speech synthesis | $22 per 1 million characters |
| Image processing | $5 per 1 million input tokens; $33 per 1 million output tokens |
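To make the listed prices concrete, here is a minimal cost-estimate sketch. It assumes billing is strictly linear at the quoted rates, with no minimums, tiers, or free quotas, which the article does not confirm. The function names, constant names, and the sample workload are hypothetical and are not part of any Microsoft API.

```python
# Prices as quoted in the article, in USD. All names here are illustrative only.
TRANSCRIPTION_USD_PER_HOUR = 0.36
SPEECH_USD_PER_MILLION_CHARS = 22.0
IMAGE_USD_PER_MILLION_INPUT_TOKENS = 5.0
IMAGE_USD_PER_MILLION_OUTPUT_TOKENS = 33.0


def transcription_cost(audio_hours: float) -> float:
    """Cost of transcribing the given number of audio hours."""
    return audio_hours * TRANSCRIPTION_USD_PER_HOUR


def speech_cost(characters: int) -> float:
    """Cost of synthesizing speech for the given number of characters."""
    return characters / 1_000_000 * SPEECH_USD_PER_MILLION_CHARS


def image_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of image generation: input and output tokens are priced separately."""
    input_part = input_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_INPUT_TOKENS
    output_part = output_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_OUTPUT_TOKENS
    return input_part + output_part


if __name__ == "__main__":
    # Hypothetical monthly workload, chosen only to illustrate the arithmetic.
    total = (
        transcription_cost(audio_hours=100)
        + speech_cost(characters=2_000_000)
        + image_cost(input_tokens=1_000_000, output_tokens=500_000)
    )
    print(f"Estimated monthly cost: ${total:.2f}")
```

Under these assumptions, output tokens dominate the image bill: they cost more than six times as much per token as input tokens.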

The models are already deployed on the Microsoft Foundry platform. Transcription and speech synthesis are available in MAI Playground.

Partnership with OpenAI

Although Microsoft is actively developing its own models, Mustafa Suleyman confirmed the company's commitment to collaborating with OpenAI, in which Microsoft has already invested over $13 billion. The company will continue using OpenAI models in its products under a long‑term contract, a diversification strategy similar to its approach to chips.

Thus, Microsoft AI is strengthening its market position by offering fast and cost‑effective multimodal solutions while maintaining close ties with key partners.
