News from the ITA network

14 May 2025

South Korea

KAKAO OFFERS GLIMPSE INTO ITS AI MODEL'S PERFORMANCE

Multimodal language model Kanana-o simultaneously processes audio, text and video

Kakao on Thursday showcased the performance of its highly anticipated artificial intelligence (AI) model, Kanana, introducing Korea's first multimodal large language model (LLM) that can simultaneously understand and process text, voice and images.

On its official tech blog, the online platform giant unveiled a performance report for the multimodal model, known as Kanana-o. Kakao said Kanana-o is the country's first AI model capable of understanding and processing various types of information simultaneously, including text, voice and images. Users can input questions in any combination of the three, and the model generates a response in the appropriate text or voice based on the situation.

Kanana-o was trained on image, audio and text data simultaneously, and uses speech emotion recognition technology to correctly interpret user intentions and analyze non-verbal signals such as intonation, speech patterns and voice trembling. This enables the model to generate contextually appropriate and coherent responses in a natural, human-like voice, the company noted.

The multimodal model achieved a level of performance similar to that of other top global LLMs on Korean and English benchmarks, while significantly outperforming its peers on Korean benchmarks. In particular, it demonstrated exceptional emotion recognition capabilities in both Korean and English, a step forward in building an AI model that can understand sentiment in communication, Kakao said. Trained on large-scale Korean datasets, it also precisely interprets speech structures and intonation specific to Korean, allowing it to convert regional dialects, such as those of Jeju Island and the Gyeongsang Provinces, into standard Korean.
"Kanana models are evolving from text-centric AI into an AI that sees, hears, speaks and empathizes like humans by processing complex types of information integrally," said Kim Byung-hak, head of the Kanana Alpha team at Kakao. "We plan to strengthen our competitiveness in the AI industry based on our unique multimodal technology while continuously sharing our research results to contribute to the evolution of Korea's AI ecosystem."

After unveiling the full pipeline for the Kanana series' development, Kakao has been sharing details of the models' performance and development logs on its official tech blog. In February, the company released the open-source version of the Kanana Nano 2.1B model on GitHub and published a technical report on Kanana's research outcomes on arXiv.

"The goal is to innovate user experiences in multi-voice conversation environments and to obtain technology that allows natural interactions close to human conversations," the company said. (ICE SEOUL)


News source: The Korea Times