Readability of MD&A extracted from iXBRL: Computational linguistic approach∗


Yoshitaka Hirose†, Hirohisa Hirai‡, Kohei Arai§

May 27, 2017, Ver.1.0

Abstract

This paper clarifies the determinants of the readability of the Management Discussion and Analysis (MD&A) section of Japanese companies' annual reports, extracted from Inline Extensible Business Reporting Language (iXBRL) filings. Previous studies have focused on English-language information, and no study has discussed the characteristics of Japanese MD&A using large-sample data. We therefore extracted the character information (text) of Japanese companies from iXBRL and analysed its readability using text mining. We found that 1) companies with a large year-end market value and older companies have low readability, 2) companies with a large year-end market value and companies with many foreign segments have more characters in their MD&A, and 3) older companies have fewer characters. Further, MD&A in Japan had greater readability than comparable United States documents. Our results suggest that firms with asymmetric information use simpler words for shareholders and, further, are conscious of shareholders who are not proficient in Japanese. The academic contribution of this paper is to show the usefulness of iXBRL and to assess the readability of Japanese MD&A using large-sample data through a computational linguistic approach. In addition, this research compares the results of Li (2008), which targeted the United States, with the results for Japan.

Keywords: iXBRL, MD&A, readability, textual analysis

JEL Codes: M41, M48, D82, G14, M15

∗ Corresponding author: Yoshitaka Hirose ([email protected])
† Takasaki University of Commerce Junior College


1 Introduction

This paper aims to clarify the character information (text) of the Management Discussion and Analysis (MD&A) section in Japanese annual reports. The MD&A is disclosed in the “Management Discussion and Analysis on Financial Condition and Results of Operations” section of annual reports. The choice and coverage of content are at the discretion of management, and some reports include future projections or plans. Although this disclosure was institutionalised in 2003, there is limited research on its contents. Li (2008) conducted a representative empirical study of MD&A, which revealed that the readability of the MD&A section of the annual report is related to future performance. According to Li (2008), the components of readability are difficulty and length. Specifically, the article used the Fog Index to measure the difficulty level, together with the length of the text. The paper found that 1) companies with low profit margins prepare less readable annual reports, and 2) companies with more readable annual reports have more persistent earnings. Since Li (2008), readability has continued to be used to assess the qualitative disclosure quality of financial statements (Lang and Stice-Lawrence, 2015; Lee, 2012; Lawrence, 2013, etc.). Previous studies have targeted English-language information, and no study has yet discussed the characteristics of Japanese MD&A using large-sample data. Recently, Shibasaki and Tamaoka (2010) developed a judgment model of Japanese difficulty (grade), which supports a similar investigation of large samples of MD&A from Japanese companies. Thus, with reference to Li (2008), this paper conducts replication tests on Japanese MD&A. The academic contribution of this paper is to show the usefulness of Inline Extensible Business Reporting Language (iXBRL) and to assess the readability of Japanese MD&A using large-sample data via a computational linguistic approach. In addition, we compare the results of Li (2008), which targeted the United States (US), with the results for Japan.

The remainder of this paper is organised as follows. Section 2 reviews the related literature. Section 3 describes the research design: it sets up the hypotheses on the readability of Japanese MD&A to be verified in this paper and presents the empirical model used to verify them. Section 4 explains the data and sample selection used in this paper. Section 5 presents the descriptive statistics and verifies the readability features of Japanese MD&A following Li (2008). Finally, Section 6 summarises and discusses future directions.

2 Literature Review


and the length of sentences. The study found that 1) companies with low profit margins prepare annual reports with low readability, and 2) companies that disclose easy-to-read annual reports have more persistent earnings. The result supports the hypothesis that managers create complicated annual reports to hide current low performance. In other words, it suggests that managers obscure information that is disadvantageous to investors and create opportunistic annual reports. Li (2008) made the following three contributions. First, the study expanded strategic disclosure research by analysing large-sample data on readability and showed that such data can be valuable in verifying whether readability is related to profit and profit sustainability. Second, it showed that more complicated annual reports have low-quality disclosure that increases investors' information processing costs. Third, it showed that the quality of disclosure as character information is related to the sustainability of profits. Since Li (2008), readability has continued to be used to assess qualitative disclosure quality in financial statements (Lang and Stice-Lawrence, 2015; Lee, 2012; Lawrence, 2013, etc.). However, these previous studies obtained text data from databases such as Compustat and Osiris. In contrast, in our research, text data were obtained from iXBRL.

3 Research design

In this section, we present hypotheses on the readability of MD&A following Li (2008). For that purpose, we first review previous studies on the two readability components, difficulty and length, and then describe the resulting hypotheses.

3.1 Readability: Difficulty and Length


we looked to Shibasaki and Tamaoka (2010), who developed a grade determination formula based on Japanese language textbooks. Specifically, the authors conducted a multiple regression analysis with various candidate predictors of Japanese difficulty as the independent variables and the grade of the textbook as the dependent variable. Then, after variable selection using a stepwise method, the authors argued that two variables, the proportion of hiragana in the whole text and the average number of predicates per sentence, which we discuss in more detail below, are effective as independent variables. Equation (1) below is the grade determination formula obtained as a regression equation. In the formula, Grade represents the grade or difficulty, which corresponds to a school reading level. For example, a grade of 1 corresponds to the difficulty of the first grade of elementary school, and a grade of 9 corresponds to the third grade of junior high school. X₁ represents the proportion of hiragana characters in the whole text (in %) and X₂ represents the average number of predicates per sentence:

$$\mathrm{Grade} = -0.145\,X_1 + 0.587\,X_2 + 14.016 \tag{1}$$

Regarding X₁, Japanese text uses four character types: kanji, hiragana, katakana, and romaji. According to Shibasaki and Tamaoka (2010), the presence of four kinds of characters in one language is a feature not found in other languages. They therefore assumed that the proportion of character types is one of the variables that determines the difficulty level. Their analysis of 205 major texts compared with a standard Japanese language textbook indicated that the proportion of hiragana decreases and that of kanji increases as the grade increases. The former is included in the grade determination formula. That is, the proportion of hiragana is an effective variable for measuring the difficulty level.
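As an illustration of how equation (1) could be evaluated, the following is a minimal Python sketch, not the authors' implementation. It assumes that the hiragana proportion is taken over all non-whitespace characters and that hiragana can be identified by the Unicode range U+3041 to U+3096; both are assumptions. Counting predicates requires a Japanese morphological analyser, so X₂ is passed in as a number here. The function names `hiragana_rate` and `grade` are hypothetical.

```python
def hiragana_rate(text: str) -> float:
    """Percentage of hiragana among all non-whitespace characters (X1)."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        raise ValueError("text contains no characters")
    # Assumption: U+3041..U+3096 covers the basic hiragana letters.
    hiragana = sum(1 for ch in chars if "\u3041" <= ch <= "\u3096")
    return 100.0 * hiragana / len(chars)


def grade(hiragana_pct: float, predicates_per_sentence: float) -> float:
    """Equation (1): Grade = -0.145 * X1 + 0.587 * X2 + 14.016."""
    return -0.145 * hiragana_pct + 0.587 * predicates_per_sentence + 14.016


# Five hiragana and three katakana characters give a hiragana rate of 62.5 %.
print(hiragana_rate("これはテストです"))

# Plugging in the sample means reported in Table 2 (X1 = 31.82, X2 = 2.40).
print(round(grade(31.82, 2.40), 2))
```

With the sample means from Table 2, the formula returns approximately 10.81, which agrees with the mean Grade reported there.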


the complexity of the grammatical structure. Next, we consider the readability component Length, which we measure by the total number of characters in the MD&A. A document with a large total number of characters is considered difficult to read because the reader's information processing cost is high. In this paper, similar to Li (2008), the length of the MD&A in the annual report is defined as:

$$\mathrm{Length} = \log(\text{N Characters}) \tag{2}$$

Similar to Li (2008), because the distribution of the number of characters is skewed, we use logarithmically transformed values for analysis. In contrast to Li (2008), we count characters rather than words, as this better accounts for the characteristics of the Japanese language.
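The Length measure in equation (2) can be sketched in a few lines of Python. The source does not state the base of the logarithm or whether whitespace is counted, so the natural logarithm and the exclusion of whitespace are assumptions here, and `length_measure` is a hypothetical name.

```python
import math


def length_measure(text: str) -> float:
    """Equation (2): log of the number of characters in the MD&A text."""
    # Assumption: whitespace is not counted as a character.
    n_chars = sum(1 for ch in text if not ch.isspace())
    if n_chars == 0:
        raise ValueError("text contains no characters")
    # Assumption: natural logarithm.
    return math.log(n_chars)


# Illustrative strings only: a 450-character and a 6,300-character text.
short_doc = "売上高は増加した。" * 50
long_doc = "売上高は増加した。" * 700
print(length_measure(short_doc), length_measure(long_doc))
```

The log transformation compresses the right tail: a fourteen-fold difference in raw character counts becomes a difference of about 2.6 in Length.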

3.2 Set hypothesis: Determining factors of readability


of readability of the annual report; SIZE is the logarithm of the year-end market value:

Hypothesis 1 The size of a company is inversely correlated with readability. That is, larger companies have lower readability of MD&A than smaller companies.

Companies with high MTB differ from those with low MTB in many ways (e.g. investment opportunities and growth potential). Growing companies with high MTB have more complex and uncertain business models, leading to hypothesis 2, where MTB is (market value of net assets + book value of liabilities) / (total book value of assets):

Hypothesis 2 MTB is inversely correlated with readability. That is, companies with a high market-to-book value ratio have lower readability of MD&A than companies with a low market-to-book value ratio.

Established companies with high AGE have made disclosures to the stock market over a long period. Such companies therefore have low information asymmetry and low uncertainty of information, and the readability of their annual reports is expected to be good. If investors already have correct information on the business model of a company that has been listed for a long time, the company may disclose a simple and readable annual report. Therefore, hypothesis 3 is set:

Hypothesis 3 AGE is positively correlated with readability. That is, older companies have more readable MD&A than younger companies.

Companies with large extraordinary profit or loss items (SI_P and SI_N) experience more abnormal events. Such companies therefore have complicated information to disclose in their annual reports, leading to hypothesis 4:

Hypothesis 4 Extraordinary profits and losses are inversely correlated with readability. That is, companies with many extraordinary profits and losses have lower readability of MD&A


Companies with high NBSEG, NGSEG, and NFSEG are engaged in regional development, business diversification, overseas expansion, and so forth. Thus, such companies are assumed to conduct more complicated business, leading to more complex disclosure in annual reports. Therefore, hypothesis 5 is set:

Hypothesis 5 NBSEG, NGSEG, and NFSEG are inversely correlated with readability. That is, companies performing complex business have lower readability of MD&A than companies that do not conduct complicated business.

EARN_VOL is considered to be a proxy variable for a company in an unstable business environment. Companies are expected to make more complicated disclosures to investors as the uncertainty of the business environment increases, leading to hypothesis 6. EARN_VOL is calculated as the standard deviation of operating profit over the past five years:

Hypothesis 6 Earnings volatility is inversely correlated with readability. That is, companies performing unstable businesses have lower readability of MD&A than companies doing stable business.
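The earnings volatility proxy can be sketched as follows. The source says only that it is the standard deviation of operating profit over the past five years, so the use of the sample standard deviation (n − 1 in the denominator) is an assumption, and `earn_vol` and the input figures are hypothetical.

```python
import statistics


def earn_vol(operating_profits):
    """Standard deviation of operating profit over the past five years."""
    if len(operating_profits) != 5:
        raise ValueError("expected exactly five annual observations")
    # Assumption: sample standard deviation; the source does not say
    # whether the sample or the population formula was used.
    return statistics.stdev(operating_profits)


# Made-up operating profits for two firms, oldest year first.
stable_firm = [1000, 1010, 990, 1005, 995]
volatile_firm = [1000, 400, 1800, 200, 1600]
print(earn_vol(stable_firm), earn_vol(volatile_firm))
```

A firm with steadier operating profit receives a smaller value, which is the sense in which the variable proxies for an unstable business environment.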

NITEMS is a variable representing financial complexity. Firms with complex finances are expected to make more complex disclosures, leading to less readable annual reports. Therefore, hypothesis 7 is set. The complexity of finance was measured as follows. Among the additional disclosure items included in Nikkei NEEDS, the number of items not voluntarily disclosed is counted and log-transformed. This is because firms reporting many items in their financial statements are considered to be financially complicated. In other words, companies with a large value for NITEMS are financially complicated:

Hypothesis 7 Financial complexity is inversely correlated with readability. That is, companies with complex finance have lower readability of MD&A than companies with a simpler


In this paper, we analyse the usefulness of MD&A following Li (2008). First, we analyse the difficulty and length of the text to gather information about readability. However, Li (2008) targeted English, whereas we focus on Japanese documents, and language differences are a major issue in text mining. We therefore use the formula of Shibasaki and Tamaoka (2010) to make a grade judgment that measures the difficulty of sentences. For hypotheses 1 to 7 on the readability of MD&A, the dependent variable is set to difficulty (Grade) or length (Length), where a high value indicates low readability. Verification is carried out using the following equation:

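$$
\begin{aligned}
\text{Grade or Length} = {} & \beta_0 + \beta_1 \mathit{SIZE} + \beta_2 \mathit{MTB} + \beta_3 \mathit{AGE} + \beta_4 \mathit{SI\_P} + \beta_5 \mathit{SI\_N} \\
& + \beta_6 \mathit{NBSEG} + \beta_7 \mathit{NGSEG} + \beta_8 \mathit{NFSEG} + \beta_9 \mathit{EARN\_VOL} + \beta_{10} \mathit{NITEMS}
\end{aligned}
\tag{3}
$$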

For verification of the hypothesis, the expected sign of each coefficient is as shown in Table 1.
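To make the estimation of equation (3) concrete, the following is a minimal ordinary least squares sketch in standard-library Python. It is not the authors' estimation code: the data are synthetic, the "true" coefficients are arbitrary illustrative values rather than results from this paper, and the helper names `solve` and `ols` are hypothetical. The sketch solves the normal equations (X'X)β = X'y directly, which is adequate for a small, well-conditioned example.

```python
import random

REGRESSORS = ["SIZE", "MTB", "AGE", "SI_P", "SI_N",
              "NBSEG", "NGSEG", "NFSEG", "EARN_VOL", "NITEMS"]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Return [beta_0, ..., beta_k] from the normal equations."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic, noise-free data: arbitrary coefficients, not the paper's estimates.
rng = random.Random(0)
true_beta = [10.0, 0.3, 0.1, -0.2, 0.5, 0.5, 0.2, 0.2, 0.2, 0.1, 0.4]
rows = [[rng.gauss(0, 1) for _ in REGRESSORS] for _ in range(300)]
y = [true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], r)) for r in rows]

beta_hat = ols(rows, y)
for name, b in zip(["const"] + REGRESSORS, beta_hat):
    print(f"{name:>8}: {b: .3f}")
```

Because the synthetic dependent variable contains no noise, the estimated coefficients reproduce the chosen values up to rounding error. With real data, the sign of each estimate would be compared with the expected sign for the corresponding hypothesis.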

4 Sampling


First Section of the Tokyo Stock Exchange (TSE), unlisted companies, and companies listed on the second section of the TSE and other stock exchanges.

Financial data for the past five years were obtained from Nikkei NEEDS-Financial Quest 2.0, compiled by Nikkei Inc., and used to calculate EARN_VOL. Furthermore, to ensure comparability, we restricted the sample to companies with a fiscal year ending on March 31 that adopt Japanese Generally Accepted Accounting Principles (GAAP). The number of companies listed on the First Section of the TSE that meet these conditions was 1,106. The other 1,159 companies were excluded from the analysis.

5 Results

Table 2 shows the descriptive statistics. The mean Grade, an estimated grade level for Japanese difficulty, is 10.81. Li (2008) reported that the mean Fog Index in his US sample was 18.23, a level regarded as unreadable. Although comparing the two results is problematic because of linguistic differences, the figures suggest that Japanese annual reports are, on average, easier to read than US annual reports. In our paper, the length of the MD&A is measured in characters, not in words as in Li (2008). Li (2008) reported that the mean number of words in the MD&A section was 4,665, whereas we find that the mean number of characters in the MD&A is 2,023, which indicates that Japanese MD&As are comparatively brief. In summary, Japanese MD&As have greater readability than those from the US.

Table 3 shows the results of a regression analysis of equation (3). Column [1] reports the determinants of the MD&A Grade; columns [2] and [3] report the determinants of the two factors, PredicateRate and HiraganaRate, respectively, that make up the MD&A Grade.

SIZE and AGE are significantly related to the MD&A Grade through their effect on HiraganaRate. As hypothesised, SIZE raises the difficulty of sentences. However, contrary to our expectation,


is plausible that the MD&As of older firms are easy for shareholders to understand even when they use difficult words written in kanji. Column [4] reports the determinants of the MD&A Length: SIZE, AGE and NFSEG. As hypothesised, SIZE increases the length of the MD&A and AGE decreases it. The result for AGE is consistent with the preceding finding on Grade because, if the information content is the same, using kanji shortens the sentence. As for NFSEG, the result differed from the hypothesis; it seems that companies are conscious of shareholders who are not proficient in Japanese.

6 Conclusion


external stakeholders such as investors. On the other hand, since it can also be interpreted that the disclosure of MD&A is becoming less important, further analysis is necessary.

References

[1] Bryan, S. H. (1997). Incremental information content of required disclosures contained in management discussion and analysis, The Accounting Review, Vol. 72, No. 2, pp. 285-301.

[2] Cole, C. J. and Jones, C. L. (2004). The Usefulness of MD&A Disclosures in the Retail Industry, Journal of Accounting, Auditing & Finance, Vol. 19, No. 4, pp. 361-388.

[3] Lang, M. and Stice-Lawrence, L. (2015). Textual analysis and international financial reporting: Large sample evidence, Journal of Accounting and Economics, Vol. 60, No. 2-3, pp. 110-135.

[4] Lawrence, A. (2013). Individual investors and financial disclosure, Journal of Accounting and Economics, Vol. 56, No. 1, pp. 130-147.

[5] Lee, Y. J. (2012). The Effect of Quarterly Report Readability on Information Efficiency of Stock Prices, Contemporary Accounting Research, Vol. 29, No. 4, pp. 1137-1170.

[6] Li, F. (2008). Annual report readability, current earnings, and earnings persistence, Journal of Accounting and Economics, Vol. 45, No. 2, pp. 221-247.

[7] Shibasaki, H. and Tamaoka, K. (2010). Constructing a Formula to Predict School Grades 1-9 based on Japanese Language School Textbooks, Japan Journal of Educational Technology, Vol. 33, No. 4, pp. 449-458.

[8] Sun, Y. (2010). Do MD&A disclosures help users interpret disproportionate inventory increases?, The Accounting Review, Vol. 85, No. 4, pp. 1411-1440.


Tables

Table 1: Expected coefficient signs for each hypothesis

Hypothesis     1     2     3     4        5           6     7
Coefficient    β₁    β₂    β₃    β₄/β₅    β₆/β₇/β₈    β₉    β₁₀


Table 2: Descriptive Statistics

Variable                  Mean   Std. Dev.         Min        25th      Median        75th         Max
Grade                    10.81        0.56        9.54       10.44       10.75       11.16       12.32
PredicateRate (%)         2.40        0.60        0.90        2.00        2.39        2.77        4.00
HiraganaRate (%)         31.82        4.11       20.79       29.11       32.02       34.75       41.05
Length                    2023      1131.9       411.8      1221.5      1761.5      2525.8      6183.8
SIZE                     24.88        1.48       21.48       23.76       24.68       25.82       29.47
MTB                     696198      876928       51447      329385      497780      784591    13963570
AGE                      39.71       20.46        1.00       19.00       44.00       58.80       65.00
SI_P                      0.01        0.02        0.00        0.00        0.00        0.01        0.30
SI_L                      0.01        0.02        0.00        0.00        0.00        0.01        0.21
NBSEG                     1.55        0.81        0.00        1.66        1.95        2.08        2.71
NGSEG                     0.28        0.71        0.00        0.00        0.00        0.00        2.40
NFSEG                     1.02        0.98        0.00        0.00        1.61        1.95        2.64
EARN_VOL                2216.9      7113.9        11.4         227       565.1      1476.9    124248.3
NITEMS                    3.10        0.13        2.57        3.00        3.09        3.18        3.53
Earnings                  0.06        0.04       -0.12        0.03        0.05        0.07        0.50
PL                        0.97        0.18        0.00        1.00        1.00        1.00        1.00


Table 3: Summary Statistics of the Determinants of Grade and Length

                                      [1] Grade   [2] PredicateRate   [3] HiraganaRate   [4] Length
Adjusted R2                               0.052               0.023              0.037        0.058
Residual Std. Error (d.f. = 1,065)        0.546               0.593              4.034        0.534
F Statistic (d.f. = 40; 1,065)         2.510***            1.652***           2.054***     2.706***