This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Improving Document Layout Analysis Using Synthetic Data Generation and Convolutional Models
Citations: 0
Authors: 6
Year: 2026
Abstract
Document Layout Analysis (DLA) is a critical step in intelligent document processing and is essential for accurately reconstructing the hierarchical structure of pages. While modern convolutional neural networks exhibit high performance, their effectiveness heavily depends on the quality and representativeness of training data, limiting their application in scenarios where labeled datasets are scarce. This paper proposes a method for enhancing DLA through synthetic generation of training data. A formalized mathematical model for generating document layouts has been developed, allowing control over element placement density, sizes, and spatial distribution. An experimental study investigated the impact of various data generation strategies on the training of the YOLO11m model, including median and threshold-based element splitting as well as different block sampling schemes. The experiments showed that employing median element splitting combined with random sampling from a large shuffled pool of synthetic data yields consistent improvements of 2–4% across all key metrics: precision, recall, mAP@50, and mAP@50:95, as compared with simple data generation strategies. These results demonstrate that targeted optimization of the data preparation process can enhance the performance of convolutional models in DLA tasks without increasing architectural complexity. The practical applicability of the method is validated through integration into the MinerU system. Future research will focus on extending the proposed model to complex layouts in scientific journals, technical reports, and handwritten documents.
Similar works
Gradient-based learning applied to document recognition
1998 · 56,806 citations
Backpropagation Applied to Handwritten Zip Code Recognition
1989 · 11,683 citations
Visual pattern recognition by moment invariants
1962 · 7,487 citations
Statistical pattern recognition: a review
2000 · 6,712 citations
LSTM: A Search Space Odyssey
2016 · 6,612 citations