This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Fine-Tuned Generative AI for Automated Structured Data Extraction and Insight Generation from Legacy Petroleum Well Reports: An Egyptian Oilfields Case Study
Citations: 1
Authors: 2
Year: 2025
Abstract
This paper introduces a workflow based on fine-tuned generative AI to automate the extraction of structured data and insights from legacy petroleum well reports. It replaces tedious manual analysis of documents (e.g., Daily Drilling Reports and Workover Reports) with an automated process that iterates over the reports and outputs a structured, queryable dataset. The scope includes diverse operational reports from Egyptian oilfields, demonstrating practical applications for improved resource management. The approach consists of three main stages. First, operational reports were collected from various Egyptian oilfields in formats such as PDF, Excel, and Word. Second, a foundational large language model was used to parse and interpret the content, producing a clean, domain-specific, labeled dataset. Third, this dataset was used to fine-tune compact generative AI models (1.5B–7B parameters) for local deployment. These models were trained to perform tasks such as jargon translation, report summarization, title generation, and extraction of key information on well history, operational problems, and their solutions, yielding structured, consistent datasets and actionable insights. An initial assessment by domain experts indicates that the fine-tuned local model (7B parameters) achieves reliable results. Validation relied mainly on real operational reports, with the extracted and processed information verified by domain experts, a form of validation we consider more reliable than evaluation metrics. The AI-driven approach significantly reduces processing time compared to manual analysis while maintaining consistency in the extracted structured data, which makes it suitable for our main objective, automation. Its capacity to understand technical jargon and generalize across various report formats depends heavily on the quality of the fine-tuning dataset.
Achieving these results with the fine-tuned 7B-parameter model suggests strong potential for even better performance and generalization by fine-tuning larger models and collecting more high-quality, informative datasets. This research presents a novel two-stage AI approach for the energy sector. First, large language models extract and structure a domain-specific, curated dataset from complex well reports. Then, this dataset is used to fine-tune a smaller model optimized for fast, local deployment. The workflow offers a scalable solution for building domain-specific AI tools that deliver accurate results. It marks a step forward in applying generative AI and LLMs to petroleum data analytics.
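To make the handoff between the labeling stage and the fine-tuning stage more concrete, the following is a minimal sketch of how one labeled report might be turned into instruction-style training examples, one per task named in the abstract. It is not the authors' implementation: the `WellReportRecord` schema, its field names, the prompt wording, and the JSONL layout are all assumptions made for illustration, since the abstract does not specify them.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WellReportRecord:
    """Hypothetical schema for one labeled report (field names are assumptions)."""
    report_type: str            # e.g. "Daily Drilling Report"
    raw_text: str               # text parsed from the PDF, Excel, or Word source
    title: str                  # label produced in the labeling stage
    summary: str                # label produced in the labeling stage
    problems: List[str] = field(default_factory=list)
    solutions: List[str] = field(default_factory=list)


def to_training_examples(record: WellReportRecord) -> List[Dict[str, str]]:
    """Build one instruction/input/output example per task for a single report."""
    extraction = {"problems": record.problems, "solutions": record.solutions}
    # Prompt wording is illustrative only; the paper does not publish its prompts.
    tasks = [
        ("Generate a title for this report.", record.title),
        ("Summarize this report.", record.summary),
        ("Extract operational problems and their solutions as JSON.",
         json.dumps(extraction)),
    ]
    return [
        {"instruction": instruction, "input": record.raw_text, "output": output}
        for instruction, output in tasks
    ]


def to_jsonl(records: List[WellReportRecord]) -> str:
    """Serialize all examples as JSON Lines, a common fine-tuning input format."""
    lines = []
    for record in records:
        for example in to_training_examples(record):
            lines.append(json.dumps(example, ensure_ascii=False))
    return "\n".join(lines)
```

Under these assumptions, each report contributes three training examples, and the extraction target is itself JSON, so the fine-tuned model's output can be parsed directly into a queryable dataset.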