AI Learns to Sort Legal Documents by Case Type

PLOS ONE | 5 Jun 2026 | 2 min read

AI Insight

This study developed and tested machine learning and deep learning methods to automatically classify legal documents from a real-world dataset containing thousands of complex legal texts. An ensemble learning model using Extremely Randomized Trees achieved 89% accuracy, while the best performance of 96% accuracy was obtained using sentence embeddings combined with Long Short-Term Memory neural networks. The research demonstrates that advanced natural language processing techniques can effectively handle the challenging task of understanding and categorizing legal language.

Why it matters

Automated legal document classification can significantly reduce the time and effort required for organizing and analyzing large volumes of legal texts, improving efficiency in legal systems. The high accuracy achieved suggests practical applications in streamlining document review, case management, and legal research while reducing human error and enabling faster access to justice.

Confidence

6/10 | Peer-reviewed | Interdisciplinary

by Fawaz Khaled Alarfaj

The justice system is indispensable to any society: it enforces the rule of law, safeguards fundamental rights, and ensures the equitable resolution of disputes through structured legal frameworks. Artificial Intelligence (AI) has advanced the legal and justice system by automating time-intensive tasks such as document review and contract analysis, which improves efficiency and reduces human error. AI-powered predictive analytics and decision support systems have also improved access to justice by providing data-driven insights, enabling faster case resolution, and supporting more consistent application of the law. Classifying legal documents with AI techniques is important because it enables efficient organization, retrieval, and analysis of large volumes of legal text, which improves accuracy, reduces manual effort, and speeds up decision-making in legal processes. The main aim of this study is to classify legal text documents using Machine Learning (ML) and state-of-the-art Deep Learning (DL) algorithms. The study uses a real-world dataset of thousands of legal documents whose complex, case-related language makes classification a challenging natural language understanding task, and it applies various textual features, deep features, and advanced sentence embeddings. The results show that the ensemble learning model of Extremely Randomized Trees stands out with 89% accuracy, as it aggregates the results of multiple decorrelated decision trees to improve predictive accuracy and control overfitting. The best result, 96% accuracy, is achieved with sentence embeddings combined with Long Short-Term Memory (LSTM) networks, an approach that is highly effective in Natural Language Processing (NLP) because it captures complex semantic and syntactic information within text.
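To make the idea of aggregating many decorrelated decision trees concrete, here is a minimal sketch of an Extremely Randomized Trees classifier over term-count features, written with the Python standard library only. It is an illustration under stated assumptions, not the study's implementation: the four toy documents, the labels "criminal" and "civil", and all function names are hypothetical, and the term-count features are only a stand-in for the textual features the abstract mentions. A real experiment would more likely rely on an established library such as scikit-learn's ExtraTreesClassifier.

```python
import random
from collections import Counter

def featurize(text, vocab):
    """Term-count vector over a fixed vocabulary (stand-in for textual features)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, rng, k):
    """Grow one extremely randomized tree; leaves are labels, nodes are tuples."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1:
        return majority
    # Only features that still vary in this node can produce a split.
    varying = [f for f in range(len(X[0])) if len({row[f] for row in X}) > 1]
    if not varying:
        return majority
    best = None
    for f in rng.sample(varying, min(k, len(varying))):
        values = [row[f] for row in X]
        # Key idea: the threshold is drawn at random, not optimized.
        t = rng.uniform(min(values), max(values))
        left = [i for i, v in enumerate(values) if v <= t]
        right = [i for i, v in enumerate(values) if v > t]
        if not right:  # happens only if t lands exactly on the maximum
            continue
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, t, left, right)
    if best is None:
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], rng, k),
            build_tree([X[i] for i in right], [y[i] for i in right], rng, k))

def predict_tree(node, x):
    """Walk from the root to a leaf (labels must not be tuples in this sketch)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_extra_trees(X, y, n_trees=25, k=3, seed=0):
    """Every tree sees the full training set; randomness comes from the splits."""
    rng = random.Random(seed)
    return [build_tree(X, y, rng, k) for _ in range(n_trees)]

def predict(forest, x):
    """Aggregate the individual trees by majority vote."""
    votes = Counter(predict_tree(tree, x) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus, for illustration only.
docs = [
    "the defendant was charged with theft and assault",
    "the accused pleaded guilty to the robbery charge",
    "the tenant sued the landlord for unpaid repairs",
    "the plaintiff seeks damages for breach of contract",
]
labels = ["criminal", "criminal", "civil", "civil"]

vocab = sorted({word for doc in docs for word in doc.lower().split()})
X = [featurize(doc, vocab) for doc in docs]
forest = fit_extra_trees(X, labels)
train_preds = [predict(forest, x) for x in X]
new_pred = predict(forest, featurize("the accused was charged with robbery", vocab))
```

Each tree on its own is a weak, noisy classifier because its split thresholds are random; the vote across many such trees is what stabilizes the prediction and limits overfitting. On a corpus this small the sketch only demonstrates the mechanics and says nothing about the accuracy figures reported in the study.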

Source: Focusing on legal cases: Automatic classification of legal documents with sentence embeddings and deep learning models