Introduction
Background
Bidirectional Encoder Representations from Transformers (BERT)
Traditional word embeddings such as word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014) are produced by context-free models, so a word with two different meanings is still assigned a single representation. BERT, in contrast, is an unsupervised language representation model pre-trained on a large corpus, including the entire English Wikipedia and the BooksCorpus (Y. Zhu et al., 2015). As a result, pre-trained BERT achieves state-of-the-art performance on many Natural Language Processing (NLP) tasks.
Consequently, for a specific NLP task, we only need to fine-tune the pre-trained BERT model by adding a few additional output layers.
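As an illustration, a classification head can be attached to pre-trained BERT in a few lines. The following is a minimal sketch using the Hugging Face transformers library; the two-class setup mirrors the causal/non-causal sentence task, but this is not the exact training code used in this work.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load pre-trained BERT and attach a randomly initialized 2-way
# classification head (causal vs. non-causal sentence).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# One forward/backward pass on a toy batch.
batch = tokenizer(
    ["Gross margin increased due to higher sales volume."],
    padding="max_length", truncation=True, max_length=100,
    return_tensors="pt",
)
labels = torch.tensor([1])  # 1 = causal sentence
outputs = model(**batch, labels=labels)
outputs.loss.backward()  # gradients flow into BERT and the new head
```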
Semi-supervised Learning
Local Interpretable Model-Agnostic Explanations (LIME)
The MD&A provides the company's perspective on its operations and financial results over the previous year. Representative causal sentences from different sectors include:
Finance: Mortgage investment income decreased in 1995 compared to 1994, primarily due to the transfer to HUD of the mortgage on El Lago Apartments in June 1995.
Manufacturing: Because components are sold directly to the Company's manufacturing sources, the Company is not in a position to know the precise quantities coming from certain suppliers.
Energy: Due to the apparent age of the material, no fine or enforcement action is expected.
Finance: In the past three years, inflation has not had a significant impact on the Company due to the relatively low inflation rate.
We then combine the labeled data and the pseudo-labeled data from a sector to further fine-tune the 1st fine-tuned models for that specific sector, as sketched below.
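A minimal sketch of this self-training step follows, assuming a hypothetical predict_proba helper on the 1st fine-tuned model and a confidence threshold for accepting pseudo-labels; the threshold value is illustrative, not taken from our experiments.

```python
import numpy as np

def self_training_set(model, labeled_texts, labeled_ys,
                      unlabeled_texts, threshold=0.9):
    """Combine gold labels with confident pseudo-labels.

    model.predict_proba is a hypothetical helper returning an
    (n_sentences, 2) array of class probabilities from the
    1st fine-tuned model.
    """
    probs = model.predict_proba(unlabeled_texts)  # (n, 2)
    conf = probs.max(axis=1)                      # top-class confidence
    keep = conf >= threshold                      # keep confident sentences only
    pseudo_ys = probs.argmax(axis=1)[keep]

    texts = list(labeled_texts) + [t for t, k in zip(unlabeled_texts, keep) if k]
    ys = np.concatenate([labeled_ys, pseudo_ys])
    return texts, ys  # training set for the sector-specific refinement
```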
This is due to the unbalanced ratio of causal and non-causal sentences in our CR-SPC dataset. It is mainly explained by the amount of data available for each sector and by the characteristics of each sector. The LIME results on the 1st and 2nd fine-tuning models show that the two models resemble each other, which is due to the knowledge distilled in the first stage of the two-stage fine-tuning.
However, our tests depend heavily on our dataset, and there is a possibility of bias in judging the main reasons for stock price changes. Future work includes extracting causal information from the PDF files of companies' summary financial statements.
Related Work
Causal Rationale of Stock Price Changes Dataset (CR-SPC)
Management’s Discussion and Analysis (MD&A)
Therefore, this section is the main source of information about the causes of a company's financial results during the past year.
Sentence Extraction
Data Annotation of Industrial Categories
Consumer Durables: The increase in sales in fiscal 1996 was mainly due to improved sales of buses and ambulances.
Chemicals: The margin loss was primarily due to decreases in sales prices and increases in raw material prices in the pyridine and related businesses, and to higher production costs due to weather-related problems in the first quarter of 1994. Research and development costs for equipment increased from $10.1 million (2% of net revenues) in fiscal 1995 to $34.6 million (21% of net revenues) in fiscal 1996 due to the increase in software development resulting from the acquisition of the three software studios in calendar 1995.
Utilities: Gas operating income increased $36.7 million, or 21.0%, due to increased volumes resulting from increased customers and higher gas costs.
Business Equipment: Due to a variety of factors, including changes in the performance of the relational database product on wide area networks, changes in the speed of various communication links, changes in the performance of the hardware platform, and other factors, there is a limited ability to accurately predict product performance under some of these environments.
Utilities: The remainder of the increase is attributable to increased ad valorem taxes, repair and maintenance expenses primarily related to WCLSF, and the employee incentive plan, which rewards certain Tejas employees with bonuses when the company achieves certain annual financial growth targets.
Health: 1995 results were also adversely affected by a reduction in the Company's income tax benefit resulting from reserves created in connection with the expiration of certain state operating loss carryforwards.
Annotator Sensitivity
The 1st fine-tuning models are BERT base models fine-tuned on the data of all sectors except a specific target sector.
Figure 4: Comparison between the 'Sector-only', '1st Fine-tune', and '2nd Fine-tune' models (left: area under the ROC curve (AUC); right: average precision (AP)).
As shown in Table 5 and Figure 4, all 2nd fine-tuning models achieve improved AUC and AP scores compared to the Sector-only models.
In Sector 12, the 1st fine-tuning models achieve the lowest performance in both AUC and AP compared to the other models. The probability of a causal sentence from Sector 5 is predicted highest by the 2nd fine-tuning model compared to the Sector-only and 1st fine-tuning models.
Figure: Interpretation of predictions from the (A) 'Sector-only', (B) '1st fine-tuning', and (C) '2nd fine-tuning' models with LIME.
However, some words appear or disappear in the 2nd fine-tuning models, showing the advantage of using the two-stage fine-tuning framework for extracting causal rationales.
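For reference, the sketch below shows how LIME can be applied to a fine-tuned sentence classifier. The predict_proba wrapper is a hypothetical adapter around the model; LIME only requires a function that maps a list of texts to class probabilities.

```python
import numpy as np
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    """Hypothetical adapter: wrap the fine-tuned model so that it maps
    a list of sentences to an (n, 2) array of class probabilities."""
    return np.array([model_probabilities(t) for t in texts])  # assumed helper

explainer = LimeTextExplainer(class_names=["non-causal", "causal"])
sentence = ("Gas operating income increased $36.7 million due to "
            "increased volumes resulting from increased customers.")
exp = explainer.explain_instance(sentence, predict_proba, num_features=10)
print(exp.as_list())  # (word, weight) pairs; positive weights push toward 'causal'
```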
Methods to Train Causal Rationale
Two-stage Fine-tuning
This section tests the hypothesis that two-stage fine-tuning predicts the sector-specific causal rationales of stock price changes better than using sector-only data. However, the number of sentences per industry sector ranges from 5K to 60K, and some sectors do not contain enough data to learn causal reasoning precisely. In the two-stage fine-tuning framework, we first fine-tune a pre-trained neural language model (NLM) on the data of all sectors except the target sector that we ultimately want to train on.
We then fine-tune the resulting model on the data of the target sector itself. In this way, we can overcome the lack of data and ensure that the NLM effectively learns global features and domain-specific features together.
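A minimal sketch of the two-stage procedure follows, assuming a hypothetical fine_tune(model, texts, labels) training helper and per-sector data splits; the helper name and data layout are illustrative.

```python
from transformers import AutoModelForSequenceClassification

def two_stage_fine_tune(sector_data, target_sector, fine_tune):
    """sector_data: dict mapping sector id -> (texts, labels).
    fine_tune: hypothetical helper that trains the model in place."""
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    # Stage 1: fine-tune on all sectors except the target sector,
    # so the model learns global causal-language features.
    stage1_texts, stage1_labels = [], []
    for sector, (texts, labels) in sector_data.items():
        if sector != target_sector:
            stage1_texts += list(texts)
            stage1_labels += list(labels)
    fine_tune(model, stage1_texts, stage1_labels)

    # Stage 2: fine-tune on the target sector only, so the model
    # adapts the global features to that sector's domain.
    texts, labels = sector_data[target_sector]
    fine_tune(model, texts, labels)
    return model
```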
Semi-supervision using Self-training
At each step, we randomly select a subset of the training data, scaling it in increments of 10,000 sentences, and report the mean and standard deviation of AUC and AP based on tenfold cross-validation on the rest of the CR-SPC dataset. The differences between the 2nd fine-tune models and the Sector-only models are significantly larger in Sectors 2 and 5, at 19.96% and 24.14% in AP, respectively. Moreover, the micro-average performance of the 2nd fine-tuning models is higher than that of the 1st fine-tuning models, with increases of 0.19% in AUC and 1.4% in AP.
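As an illustration, the per-fold AUC and AP statistics can be computed with scikit-learn. This is a minimal sketch with a hypothetical train_and_predict helper, not our experiment code.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score, average_precision_score

def cross_validated_scores(texts, labels, train_and_predict, n_splits=10):
    """train_and_predict: hypothetical helper that fine-tunes a model on
    the train split and returns P(causal) scores for the test split."""
    labels = np.asarray(labels)
    aucs, aps = [], []
    for train_idx, test_idx in StratifiedKFold(n_splits).split(texts, labels):
        scores = train_and_predict(
            [texts[i] for i in train_idx], labels[train_idx],
            [texts[i] for i in test_idx],
        )
        aucs.append(roc_auc_score(labels[test_idx], scores))
        aps.append(average_precision_score(labels[test_idx], scores))
    # Mean and standard deviation over the ten folds.
    return (np.mean(aucs), np.std(aucs)), (np.mean(aps), np.std(aps))
```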
To confirm the quality of the pseudo-labels, we train the 1st fine-tuned model on the low-quality labeled data combined with the high-quality labeled data (CR-SPC) of the sector, and we call this model the low-supervision model. The last part of the summaries consists of sentences classified as rationales of stock price changes by an attentive Bi-LSTM trained on the CR-SPC dataset. The predictions from the 1st fine-tuned and 2nd fine-tuned models are correct, and they are improved by the words 'sales' and 'increase'.
Experimental Results
Experimental Settings
We use the Area Under the ROC Curve (AUC) and the Average Precision (AP) of causal sentences as evaluation metrics. As input, a single sentence of length 100 is tokenized with the Keras tokenizer (Chollet et al., 2015) for the Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997) based models and with the BERT tokenizer (Devlin et al., 2018) for BERT base.
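The two tokenization pipelines can be reproduced roughly as below; the pre-trained model name is a standard default, and the snippet is a sketch of the preprocessing rather than our exact configuration.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from transformers import BertTokenizer

sentences = ["Gross margin increased due to higher sales volume."]

# Keras tokenizer for the LSTM-based models: word-index sequences
# padded/truncated to a fixed length of 100.
keras_tok = Tokenizer()
keras_tok.fit_on_texts(sentences)
lstm_input = pad_sequences(keras_tok.texts_to_sequences(sentences), maxlen=100)

# WordPiece BERT tokenizer for BERT base, same maximum length.
bert_tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert_input = bert_tok(sentences, padding="max_length",
                      truncation=True, max_length=100, return_tensors="np")
```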
Few-shot Training
We conduct experiments on our CR-SPC dataset, consisting of sentences and their corresponding labels, in supervised learning, and on the combination of 283K labeled and 382K unlabeled sentences in semi-supervised learning.
Figure: AUC (gray) and average precision (black) of BERT base as a function of the number of training sentences from our CR-SPC dataset.
Baseline Models
Two-stage Fine-tuning
We perform one-tailed t-tests to determine the statistical significance of the differences in performance in AP (p < .05) and AUC (p < .01).
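A one-tailed paired t-test over cross-validation folds can be run with SciPy (>= 1.6 for the alternative argument). This is a minimal sketch with placeholder scores, and the pairing over folds is our assumption about the test setup.

```python
from scipy import stats

# Per-fold AP scores of two models (placeholder values, not our results).
ap_second_stage = [0.81, 0.79, 0.83, 0.80, 0.82]
ap_sector_only = [0.74, 0.75, 0.78, 0.73, 0.76]

# One-tailed paired t-test: is the 2nd fine-tuning model better?
t_stat, p_value = stats.ttest_rel(ap_second_stage, ap_sector_only,
                                  alternative="greater")
print(f"t={t_stat:.3f}, one-tailed p={p_value:.4f}")  # significant if p < .05
```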
Semi-supervision
Turing Test
The increase in gross margin was due to several factors, including (1) an increase in gross sales, (2) reduced transportation and manufacturing labor costs, and (3) improved raw material and packaging supply costs, including taking advantage of increased volume.
The increase in gross margin was driven by several factors, including (1) significant increases in gross and net sales, (2) significantly increased volume and efficiency resulting in lower freight and transportation costs, and (3) improvements in certain of their key manufacturing processes that led to lower overall production costs.
Significant changes in financial performance are explained as follows: The Company's technology, services and licensing revenue increased 5% in 2016 compared to 2015, driven by stronger sales of the Company's digital authentication solution, partially offset by a reduction in the company's IT hardware resale activities, resulting from the company's reduced focus on this component of its digital business.
Cost of revenue increased 4% in 2016 compared to 2015, less than the company's 10% increase in revenue over the same period, which generally reflected increased sales of higher-margin products such as security card sales and technology sales, so that material costs, external service costs and delivery costs decreased as a percentage of revenue in 2016. Stock-based compensation expense decreased by 66% in 2016 compared to 2015 due to a general decrease in the number and value of stock-based compensation awards granted by the Company since 2014.
Their financial condition was affected by a number of factors, for example: The decrease in sales through the wholesale channel was driven by lower sales of Liberator products to retailers, partially offset by higher sales of Liberator, Jaxx and Avana products through and to Amazon.
Other income (expenses) increased by 16% compared to the previous year due to higher average loan balances and higher interest costs on those larger balances.
Interpretation of Causal Rationale Detection
In this work, we create a large-scale dataset of the causal rationales of stock price changes (CR-SPC) to automatically extract causal rationales from the financial reports of different sectors. We find that our two-stage fine-tuning improves the performance of causal-rationale models trained on sectors with small amounts of data. Finally, we apply LIME to our two-stage fine-tuned model and the other models to compare the improvements qualitatively.
Discussion
Conclusion and Future Work