OpenAlex · Updated hourly · Last updated: 30.03.2026, 19:28

This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Degradation of Multi-Task Prompting Across Six NLP Tasks and LLM Families

2025 · 2 citations · Electronics · Open Access
Open full text at publisher

Citations: 2 · Authors: 2 · Year: 2025

Abstract

This study investigates how increasing prompt complexity affects the performance of Large Language Models (LLMs) across multiple Natural Language Processing (NLP) tasks. We introduce an incremental evaluation framework where six tasks—JSON formatting, English-Italian translation, sentiment analysis, emotion classification, topic extraction, and named entity recognition—are progressively combined within a single prompt. Six representative open-source LLMs from different families (Llama 3.1 8B, Gemma 3 4B, Mistral 7B, Qwen3 4B, Granite 3.1 3B, and DeepSeek R1 7B) were systematically evaluated using local inference environments to ensure reproducibility. Results show that performance degradation is highly architecture-dependent: while Qwen3 4B maintained stable performance across all tasks, Gemma 3 4B and Granite 3.1 3B exhibited severe collapses in fine-grained semantic tasks. Interestingly, some models (e.g., Llama 3.1 8B and DeepSeek R1 7B) demonstrated positive transfer effects, improving in certain tasks under multitask conditions. Statistical analyses confirmed significant differences across models for structured and semantic tasks, highlighting the absence of a universal degradation rule. These findings suggest that multitask prompting resilience is shaped more by architectural design than by model size alone, and they motivate adaptive, model-specific strategies for prompt composition in complex NLP applications.
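The abstract describes an incremental evaluation framework in which six tasks are progressively combined within a single prompt. A minimal sketch of that setup is shown below; the task names follow the paper, but the instruction wording and the `build_prompt` helper are illustrative assumptions, not the authors' actual prompts.

```python
# Hedged sketch of incremental multi-task prompting: instructions for the six
# tasks named in the abstract are combined one at a time into a single prompt.
# The instruction texts here are placeholders, not the study's exact prompts.

TASKS = {
    "json_formatting": "Return your answer as a valid JSON object.",
    "translation": "Translate the input text from English to Italian.",
    "sentiment": "Classify the overall sentiment (positive/negative/neutral).",
    "emotion": "Label the dominant emotion expressed in the text.",
    "topics": "Extract the main topics discussed in the text.",
    "ner": "List all named entities in the text with their types.",
}

def build_prompt(text: str, n_tasks: int) -> str:
    """Combine the first n_tasks instructions into one prompt (incremental setup)."""
    selected = list(TASKS.values())[:n_tasks]
    numbered = "\n".join(f"{i + 1}. {inst}" for i, inst in enumerate(selected))
    return (
        "Perform the following tasks on the input text:\n"
        f"{numbered}\n\nInput: {text}"
    )

# Evaluation loop over increasing prompt complexity; the model call is omitted
# (the study used local inference for reproducibility).
for n in range(1, len(TASKS) + 1):
    prompt = build_prompt("Sample input sentence.", n)
    # response = local_llm(prompt)  # hypothetical local inference call
```

Each model would then be scored per task at each complexity level, which is what allows degradation (or positive transfer) to be attributed to the growing prompt rather than to the task itself.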


Topics

Topic Modeling · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)