Relative Valuation with Machine Learning

Attracted by these promising features, we use tree-based machine learning models to perform relative valuation. Our machine learning models are not designed to detect market-level mispricing in the time series. We find that machine learning models significantly outperform traditional models in valuation accuracy.

Overvalued IPOs based on machine learning model valuations often underperform the stock market (assessed with an event-based portfolio approach) and generate negative abnormal benchmark alphas over the long term (assessed with a calendar-time portfolio approach). We compare our machine learning models to a range of traditional models used in the relative valuation literature. At the end of each month, machine learning models use the accounting information available at that time to predict out-of-sample valuation multiples based on the valuation multiples of peers in the test dataset.

3 Model performance

Model performance benchmarking

If the target variable in the machine learning model is the natural logarithm of the market-to-book ratio (lnm2b), the equity value is estimated as. If the target variable in the machine learning model is the natural logarithm of the enterprise-value-to-assets ratio (lnv2a), the equity value is estimated as. If the target variable in the machine learning model is the natural logarithm of the enterprise-value-to-sales ratio (lnv2s), the value of equity is estimated as.
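The step from a predicted log multiple back to an equity value can be illustrated with a short sketch. The function names below are hypothetical, and the use of net debt to move from enterprise value to equity value is an assumption of this sketch, not necessarily the definition used in the paper.

```python
import math


def equity_from_lnm2b(pred_lnm2b, book_equity):
    """Equity value implied by a predicted log market-to-book multiple."""
    # Undo the logarithm, then scale by book equity.
    return math.exp(pred_lnm2b) * book_equity


def equity_from_ln_ev_multiple(pred_ln_multiple, scale, net_debt):
    """Equity value implied by a predicted log enterprise-value multiple.

    `scale` is total assets for lnv2a or sales for lnv2s.
    Assumption: equity value = enterprise value - net debt.
    """
    enterprise_value = math.exp(pred_ln_multiple) * scale
    return enterprise_value - net_debt


# Example: a predicted market-to-book multiple of 2 on book equity of 100.
print(equity_from_lnm2b(math.log(2.0), 100.0))
```

Because the models predict the logarithm of a multiple, the exponential is needed before scaling by the accounting item; any bias correction for the log transformation is outside the scope of this sketch.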

Machine learning models generate a med(|ε|) ranging between 28.0 and 31.0 percentage points, all of which are lower than those produced by any of the traditional models. Depending on which type of multiple we use, machine learning models reduce the median absolute valuation error by between 5.6 and 31.4 percentage points relative to the best performance of the traditional models. The machine learning models used in this study (GBMs) can accommodate missing values in the input data.
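As an illustration of the accuracy metric, the following minimal sketch computes a median absolute valuation error in percentage points. It assumes the error is the percentage deviation of predicted from actual equity value; the exact error definition in the paper may differ, and the function names are hypothetical.

```python
from statistics import median


def valuation_error_pct(predicted, actual):
    """Signed valuation error in percent of actual value (assumed definition)."""
    return 100.0 * (predicted - actual) / actual


def median_abs_error(predicted_values, actual_values):
    """Median of absolute valuation errors across firms, med(|e|)."""
    abs_errors = [
        abs(valuation_error_pct(p, a))
        for p, a in zip(predicted_values, actual_values)
    ]
    return median(abs_errors)


# Three firms with actual equity value 100 and predictions 120, 90 and 150.
print(median_abs_error([120.0, 90.0, 150.0], [100.0, 100.0, 100.0]))
```

The median is less sensitive than the mean to a few extreme mispredictions, which is one reason it is a common summary statistic in valuation accuracy comparisons.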

Model performance over time and across firms

Panels D to F in Figure 4 report the means of valuation errors (a measure of bias) from machine learning models in quintile portfolios sorted by firm characteristics. Machine learning models tend to overvalue small stocks but undervalue large stocks, as shown by the downward-sloping lines in Panel D of Figure 4, which start at about 30. The machine learning models also systematically overvalue stocks with low book-to-market ratios (with means of absolute valuation errors around 50 in Panel E of Figure 4) and undervalue those with high book-to-market ratios.

In contrast, the machine learning model bias does not appear to vary substantially across ROE quintiles. Machine learning models also outperform all traditional models when we include all firm-year observations with non-missing and non-negative valuation multiples (see Table IA.10 in the Internet Appendices). Using this sample moderately increases the valuation errors of all machine learning models by 0.5 to 2.9 percentage points.
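The quintile analysis of bias can be sketched as follows. This is a minimal illustration that assumes equal-sized groups formed by ranking firms on a single characteristic; the function name and the sample data are hypothetical.

```python
from statistics import mean


def mean_error_by_group(characteristic, errors, n_groups=5):
    """Sort firms by a characteristic and average signed errors per group."""
    if len(characteristic) < n_groups:
        raise ValueError("need at least one firm per group")
    # Indices of firms ordered from the smallest to the largest characteristic.
    order = sorted(range(len(characteristic)), key=lambda i: characteristic[i])
    n = len(order)
    group_means = []
    for q in range(n_groups):
        members = order[q * n // n_groups:(q + 1) * n // n_groups]
        group_means.append(mean([errors[i] for i in members]))
    return group_means


# Hypothetical data: bias declines as firm size increases.
size = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
errors = [30.0, 28.0, 20.0, 18.0, 10.0, 8.0, 0.0, -2.0, -10.0, -12.0]
print(mean_error_by_group(size, errors))
```

A downward-sloping sequence of group means, as in this made-up example, corresponds to overvaluation in the lowest quintile and undervaluation in the highest.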

4 Important variables in determining valuation multiples

Theoretical determinants of fundamental valuation multiples

Firms in the energy, healthcare, and telecommunications industries are more difficult to value than firms in other industries, as evidenced by the peaks in median absolute valuation error for these three industries (Panel A in Figure 5). In the upper right corner of each plot, the dots (plotted for the predicted values versus the actual values) for larger companies form a narrow line. These dots are spread out more for smaller firms in the lower left corner of each plot.

Variables related to profitability (ROE, ROA and PM) and growth (ge and g) are expected to have a positive marginal effect on all valuation multiples. In addition, the marginal effect of asset turnover (ATR, the sales-to-assets ratio) is expected to be positive for the enterprise-value-to-assets multiple (v2a) but negative for the enterprise-value-to-sales multiple (v2s). The marginal effect of variables in the categories of financial health, solvency and liquidity does not have a clear predicted sign, as these variables are related to financial leverage.

Important variables identified by GBM

Because SHAP values are additive, the SHAP value of a category of variables is the sum of all SHAP values of the variables in that category. For example, the average of the rankings of variables in the profitability category is always positively correlated with the ranking of a target variable. In contrast to the important variable categories, the SHAP values of some categories in Table 5 are always lower than 5.
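The additivity property can be illustrated with a small sketch that aggregates per-variable SHAP values into categories. The variable-to-category mapping and the numbers are hypothetical, and summarizing each category by its mean absolute value across observations is an assumption of this sketch, not necessarily the summary used in Table 5.

```python
def category_shap(shap_rows, feature_category):
    """Sum SHAP values within each category, observation by observation.

    shap_rows: list of dicts mapping variable name -> SHAP value.
    feature_category: dict mapping variable name -> category name.
    """
    result = []
    for row in shap_rows:
        totals = {}
        for feature, value in row.items():
            category = feature_category[feature]
            totals[category] = totals.get(category, 0.0) + value
        result.append(totals)
    return result


def mean_abs_by_category(category_rows):
    """Average the absolute category-level SHAP values across observations."""
    categories = sorted({c for row in category_rows for c in row})
    n = len(category_rows)
    return {
        c: sum(abs(row.get(c, 0.0)) for row in category_rows) / n
        for c in categories
    }


# Hypothetical SHAP values for two observations and three variables.
mapping = {"ROE": "profitability", "ROA": "profitability", "assets": "size"}
rows = [
    {"ROE": 0.5, "ROA": -0.25, "assets": 0.25},
    {"ROE": -0.5, "ROA": -0.5, "assets": 0.75},
]
print(mean_abs_by_category(category_shap(rows, mapping)))
```

Summing within a category before taking absolute values lets offsetting contributions from related variables cancel, which is what additivity at the observation level implies.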

Industry classification makes a moderately important contribution to predicting valuation multiples, with SHAP values ranging between 5.3 and 8.8 percentage points. The overall importance of variables in the size category also increases in the sample with small firms. SHAP values for variables in the size category range between 5.1 and 7.9 percentage points in the robustness test in Table IA.12 in the Internet Appendices, higher than the range between 2.1 and 2.7 percentage points in Table 5.

5 Valuation error and future stock returns

In the main tests (Table 4), variables in the size category are not among the top 10 most important variables in any machine learning model. Only the strategy using RRVcs model errors is robust to both FF6 risk factors and anomalies in the 13-factor models (RRVcs column in Table 7).

The return predictability of machine learning valuation errors is robust to the inclusion of small firms (Tables IA.13 and IA.14 in the Internet Appendices). First, we use accounting information in the Compustat quarterly database because we do not have access to the point-in-time data used by Bartram and Grinblatt (2018). Third, we do not winsorize the input variables because GBM is robust to outliers in the input variables, whereas Bartram and Grinblatt (2018) winsorize all input variables.

6 IPO valuation accuracy and long-run performance

IPO valuation methodology and data

IPO valuation accuracy benchmarking

IPO valuation errors and long-run performance

The alphas from buying overvalued IPOs with the value-weighted strategy (P1 in Panel B of Table 10) are always negative and highly significant, with p-values between 0.1% and 5.3%, for all machine learning models (MLlnm2b, MLlnv2a and MLlnv2s) and all benchmark models (FF3, FF5 and FF6). Investing in overvalued IPOs for a period of 36 months earns abnormal returns of between -0.54 and -0.88 percentage points per month. Value-weighted hedge portfolios constructed from valuation errors from seven of the eight traditional models produce highly significant FF3 alphas (Panel B in Table IA.18) ranging between 0.66 and 0.92 percentage points per month.

However, the abnormal returns on portfolios sorted by machine learning valuation errors are more robust to the additional risk factors in FF5 (particularly the investment and profitability factors). Hedge portfolio alphas sorted by machine learning valuation errors remain highly significant under both FF5 and FF6. However, with the inclusion of small companies (Table IA.18 in the Internet Appendices), only two of the eleven models are able to produce significant FF5 or FF6 alphas, indicating that detecting mispricing in small IPO firms is particularly difficult.
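The event-time side of the portfolio sorts can be sketched as follows. This is a minimal illustration, assuming IPOs are split into three equal-sized groups by valuation error, that a higher error (predicted value above the offer value) marks an undervalued IPO, and that returns are equal-weighted within each group; the function names and data are hypothetical.

```python
from statistics import mean


def buy_and_hold(monthly_returns):
    """Compound a sequence of monthly returns into one holding-period return."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth - 1.0


def high_minus_low(valuation_errors, return_paths):
    """Difference in mean buy-and-hold returns: top tercile minus bottom tercile."""
    if len(valuation_errors) < 3:
        raise ValueError("need at least three IPOs to form terciles")
    order = sorted(range(len(valuation_errors)), key=lambda i: valuation_errors[i])
    third = len(order) // 3
    low = order[:third]      # lowest errors: overvalued under the assumption above
    high = order[-third:]    # highest errors: undervalued under the assumption above
    bhr = [buy_and_hold(path) for path in return_paths]
    return mean([bhr[i] for i in high]) - mean([bhr[i] for i in low])


# Three hypothetical IPOs with two months of returns each.
errors = [-0.5, 0.0, 0.5]
paths = [[-0.5, 0.0], [0.0, 0.0], [0.5, 0.0]]
print(high_minus_low(errors, paths))
```

The calendar-time alphas reported in the text additionally require regressing monthly portfolio returns on factor returns, which this sketch does not attempt.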

7 Conclusion

This figure shows the median absolute estimation errors (med(|ε|)) from machine learning models over time. The estimation error ε is computed from M̂ and M, where M̂ is the predicted market value of equity and M is the actual market value of equity. MLlnm2b, MLlnv2a and MLlnv2s refer to machine learning models whose target variables are the natural logarithm of the market-to-book ratio (lnm2b), the natural logarithm of the enterprise-value-to-assets ratio (lnv2a) and the natural logarithm of the enterprise-value-to-sales ratio (lnv2s), respectively.

These figures depict the median absolute estimation errors (med(|ε|)) and median estimation errors (med(ε)) from the machine learning models across the Fama-French 12 industries. While we use the Fama-French 49 industries in all valuation models, estimation errors are summarized in 12 industries to facilitate presentation. This table reports the sample size after each step taken to obtain the final sample to be used by the machine learning models.

MLlnm2b, MLlnv2a and MLlnv2s refer to machine learning models that use lnm2b (the natural logarithm of the market-to-book multiple), lnv2a (the natural logarithm of the enterprise-value-to-assets multiple) and lnv2s (the natural logarithm of the enterprise-value-to-sales multiple) as target variables, respectively. INDm2bhm, INDv2ahm and INDv2shm refer to the industry harmonic mean approach in Liu et al. This table presents the 10 most important variables as measured by the magnitude of SHAP values for each machine learning model.

The expected sign is the sign of the theoretical marginal impact of a category of variables on the target variable. Mc is the forecast share value and M is the offering price times the number of shares outstanding in CRSP at the end of the first month of trading. The initial return is the market-adjusted return of buying a stock at its IPO price and selling it at its closing price on the first day of trading.

IPOs are grouped into three portfolios based on valuation errors from machine learning models (MLlnm2b, MLlnv2a and MLlnv2s). High minus low refers to the difference in cumulative buy-and-hold returns between the undervalued IPOs and the overvalued IPOs. IPOs are sorted into three portfolios based on valuation errors from machine learning models (MLlnm2b, MLlnv2a and MLlnv2s).

Figure 2: Train, development and test data split and rotation

Appendices

A Formula derivations

Derivation of equation (13)

Derivation of equation (14)

B Variables in machine learning models

C Summary statistics of variables used in machine learning models

D Details of variable importance in machine learning models

E Valuation error benchmarking (equal number of output across models)

Internet Appendices

This table describes the construction of variables used in the traditional valuation model based on Rhodes-Kropf et al. This table reports summary statistics of variables used in the traditional valuation model based on Rhodes-Kropf et al. This table reports means of coefficient estimates from cross-sectional regressions by month and FF49 industry based on Rhodes-Kropf et al.

This table describes the construction of the variables used in the traditional valuation model based on Bartram and Grinblatt (2018). This table reports summary statistics of the variables used in the traditional valuation model based on Bartram and Grinblatt (2018). This table shows averages of coefficient estimates from monthly cross-sectional regressions based on Bartram and Grinblatt (2018).

This table describes the construction of variables used in the traditional valuation model, based on Bhojraj and Lee (2002). This table reports the summary statistics of variables used in the traditional valuation model based on Bhojraj and Lee (2002). This table reports the means of coefficient estimates based on monthly cross-sectional regressions based on Bhojraj and Lee (2002).

This table reports summary statistics of monthly non-IPO firm valuation errors from various models. This table uses the same methodology as Table 2, except that this table includes "small" firm-month observations with assets, book equity, or sales between 0 and 10 million USD, while Table 2 excludes them. This table uses the same methodology as Table 4, except that this table includes small firm-month observations with assets, book equity, or sales between 0 and 10 million USD, while Table 4 excludes them.

This table uses the same methodology as Table 5, except that this table includes small firm-month observations with assets, book equity, or sales between 0 and 10 million USD, while Table 5 excludes them. This table uses the same methodology as Table 8, except that this table includes small firm-month observations with assets, book equity, or sales between 0 and 10 million USD, while Table 8 excludes them. This table uses the same methodology as Table 10, except that this table includes small firm-month observations with assets, book equity, or sales between 0 and 10 million USD, while Table 10 excludes them.