TEXT STRUCTURING AND CATEGORIZATION WHEN SUMMARIZING LEGAL CASES
2. TEXT CORPUS AND OUTPUT OF THE SYSTEM
An expert in criminal law studied a sample of Belgian criminal cases (Uyttendaele, Moens, & Dumortier, 1996, 1998). This analysis resulted in a detailed description of the categories, structure, and the parts of the case that are relevant to include in its summary.
Belgian criminal cases can be classified into 7 main categories, distinguishing general decisions from particular ones. The latter concern appeal procedures, civil interests, refusals to witness, false translations by interpreters, infringements by foreigners, or the internment of people.
The criminal cases have a typical form of discourse (superstructure).
They are made up of 9 ordered elements, some of which are optional:
1. superscription, containing the name of the court and the date;
2. identification of the victim;
3. identification of the accused;
4. alleged offences, describing the crimes and factual evidence;
5. transition formulation, marking the transition to the grounds of the case;
6. opinion of the court, containing the arguments of the court to support its decision;
7. legal foundations, containing statutory provisions applied by the court;
8. verdict;
9. conclusion, possibly containing the name of the court and the date.
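The ordered superstructure above can be illustrated with a minimal Python sketch. It is not the SALOMON implementation (which is written in C). The names `Element` and `follows_superstructure` are hypothetical. Because the text does not say which elements are optional, the sketch assumes any element may be absent and checks only the relative order.

```python
from enum import IntEnum


class Element(IntEnum):
    """The nine ordered elements of a case, numbered as in the list above."""
    SUPERSCRIPTION = 1
    VICTIM = 2
    ACCUSED = 3
    ALLEGED_OFFENCES = 4
    TRANSITION_FORMULATION = 5
    OPINION_OF_THE_COURT = 6
    LEGAL_FOUNDATIONS = 7
    VERDICT = 8
    CONCLUSION = 9


def follows_superstructure(elements):
    """Return True if the elements occur in strictly increasing canonical order.

    Assumption: any element may be missing, since the source does not
    state which of the nine elements are optional.
    """
    return all(a < b for a, b in zip(elements, elements[1:]))
```

A sequence such as superscription, accused, verdict passes the check, whereas a verdict that precedes the opinion of the court does not.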
Some of these components have an interesting substructure (e.g., date and name of the court in the superscription, irrelevant paragraphs in the alleged offences, irrelevant paragraphs in the opinion of the court, irrelevant foundations in the legal foundations). In total we defined 14 different case components or segments relevant for abstracting purposes, some of them being subsegments of larger text segments. The segments present themselves in the text as: text blocks delimited or categorized by typical word patterns (e.g., the transition formulation), text blocks preceding and/or following another text segment (e.g., identification of the victim), text paragraphs delimited or characterized by typical word patterns (e.g., irrelevant paragraphs in the alleged offences), text sentences delimited or characterized by typical word patterns (e.g., irrelevant foundations), or plain word patterns (e.g., name of the court). A word pattern is a combination of one or more text strings.
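As a rough illustration of how a word pattern might delimit a text block, consider the following Python sketch. It rests on assumptions the text does not confirm: a pattern is modelled as a tuple of strings that must all occur in the same paragraph, matching ignores case, and the pattern and sample paragraphs used in any test are invented placeholders, not patterns from the actual corpus. The function names `matches` and `split_at_pattern` are hypothetical.

```python
def matches(pattern, text):
    """Return True if every string of the pattern occurs in the text.

    Assumption: "a combination of one or more text strings" is read here
    as co-occurrence of all strings, compared case-insensitively.
    """
    lowered = text.lower()
    return all(s.lower() in lowered for s in pattern)


def split_at_pattern(paragraphs, pattern):
    """Split a list of paragraphs around the first one matching the pattern.

    Returns (before, delimiter, after), or None if no paragraph matches.
    The blocks before and after the delimiter correspond to segments that
    are located relative to another segment.
    """
    for i, paragraph in enumerate(paragraphs):
        if matches(pattern, paragraph):
            return paragraphs[:i], paragraph, paragraphs[i + 1:]
    return None
```

In this reading, a segment such as the transition formulation is found directly by its pattern, while neighbouring segments are recovered as the blocks that precede or follow it.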
The most relevant parts of a case are the alleged offences, the opinion of the court, and the legal foundations. The alleged offences give a description of the crimes a person is accused of. The opinion of the court allows distinguishing three types of cases within the studied corpus: routine cases (containing only routine, unimportant grounds in their opinion), non-routine cases (containing other than routine grounds), and leading cases (containing more than 5 “principle grounds”). Principle grounds are the paragraphs of the opinion in which the court gives general, abstract information about the application and the interpretation of some statutes. The routine and the leading cases represent 35% to 40% and 3% to 5% of the total corpus respectively. In the non-routine cases, the judge elaborates the crime themes, taking into account the factual evidence and, in the case of leading cases, the application of specific statutes. The legal foundations consist of a complete enumeration of legal texts and articles applied by the court. Several of these foundations (routine foundations) are cited in each case, while others concern the essence of the case.
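The three case types can be expressed as a small decision rule. The Python sketch below assumes that each ground of the opinion has already been labelled; the labels `"routine"`, `"principle"`, and `"other"` and the function name `case_type` are hypothetical. Only the threshold of more than 5 principle grounds comes from the text.

```python
def case_type(ground_labels):
    """Classify a case from the labels of the grounds in its opinion.

    ground_labels: list of strings, each "routine", "principle", or "other"
    (hypothetical labels, assumed to be assigned in an earlier step).
    """
    principle_count = sum(1 for label in ground_labels if label == "principle")
    if principle_count > 5:
        return "leading"
    if all(label == "routine" for label in ground_labels):
        # Note: an opinion with no grounds at all also ends up here.
        return "routine"
    return "non-routine"
```

The text suggests that leading cases are a special kind of non-routine case, so the sketch returns the most specific label that applies.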
After examining intellectually constructed headnotes of printed law reports, it was decided that it would be interesting to extract the following information from the case:
1. the name of the court that pronounced the decision;
2. the date of the decision;
3. the key paragraphs that describe the crimes committed;
4. the key paragraphs and terms that appear to express the essence of the opinion of the court;
5. references to the applied non-routine foundations.
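The five kinds of information listed above could be held in a simple record. The Python sketch below is only an illustration of such a record; the class name `IndexCard` and all field names are hypothetical and do not come from the SALOMON system.

```python
from dataclasses import dataclass, field


@dataclass
class IndexCard:
    """Summary record for one criminal case (hypothetical field names)."""
    court_name: str                                    # item 1
    decision_date: str                                 # item 2, kept as text
    offence_paragraphs: list = field(default_factory=list)       # item 3
    opinion_paragraphs: list = field(default_factory=list)       # item 4, paragraphs
    opinion_terms: list = field(default_factory=list)            # item 4, terms
    non_routine_foundations: list = field(default_factory=list)  # item 5
```

The date is stored as plain text because the source does not describe a date format.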
Figure 1. Architecture of the SALOMON demonstrator.
To realize these goals, a demonstrator (Figure 1) is built in the programming language C on a Sun™ SPARCstation 5 under Solaris® 2.5.1.
It produces a summary (“index card”) of a criminal case.
The expert in criminal law interviewed other experts in the field and people responsible for the publication and manual summarization of cases in professional journals. When intellectually abstracting, an initial step regards the identification of the case category, of semantically relevant components, and of insignificant text segments. Similarly, our automatic abstracting procedure consists of two steps. The first step identifies the general category and the structure of the case. Also, irrelevant parts of the text of the alleged offences and opinion of the court, and routine foundations are identified. Here, the linguistic context of the information is predictable and the cases are processed based upon a representation of the texts that captures the syntax and semantics of the discourse. The result of the initial categorization and text structuring is a case tagged in SGML (Standard Generalized Markup Language) syntax. A head tag marks the general category of the case. The identified text segments are marked with the appropriate category tags. Some records on the index card (such as date, name of the court, and non-routine legal foundations) can be readily extracted from this structured text. In the second abstracting step, the system further summarizes the relevant parts of the alleged offences and the opinion of the court. The remainder of this chapter discusses the first step of the abstracting process.
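The idea of a head tag plus category tags, and of reading records directly from the tagged case, can be sketched as follows in Python. The tag names, the helper names `tag_case` and `extract`, and the placeholder values are all hypothetical; the source does not give the actual tag set. The sketch also assumes that segments are not nested and that tag names contain only characters valid in SGML element names.

```python
import re


def tag_case(category, segments):
    """Wrap (tag_name, text) segments in category tags under one head tag.

    category: name of the head tag marking the general case category.
    segments: list of (tag_name, text) pairs in document order.
    """
    body = "\n".join(f"<{name}>{text}</{name}>" for name, text in segments)
    return f"<{category}>\n{body}\n</{category}>"


def extract(tagged_case, name):
    """Return the text of every segment marked with the given tag name."""
    tag = re.escape(name)
    return re.findall(rf"<{tag}>(.*?)</{tag}>", tagged_case, flags=re.DOTALL)


# Usage with placeholder values (not taken from a real case).
tagged = tag_case(
    "general-decision",
    [("court", "COURT NAME"), ("date", "DATE"), ("foundation", "ARTICLE REFERENCE")],
)
court_names = extract(tagged, "court")
```

Once the case is structured this way, records such as the date or the name of the court require no further analysis, which is why the text describes them as readily extractable.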