As financial professionals including policy-makers tend to base decisions on research performed using large machine-readable financial databases, the accuracy of the financial data provided by database companies has a direct impact on the quality of their decisions.
The objective of this study was to examine data errors in the DataGuide and KisValue databases which are both primary sources of stock prices and return data for Korea Exchange securities in Korea. This article also discussed the methodological implications of erroneous data on monthly stock returns in empirical studies on Korean financial markets.
A cross-checking technique was used in this study.
The results suggest that there are material discrepancies between the DataGuide and KisValue databases in monthly stock returns, most of which are attributable to the mishandling of split events and of missing values. The results also indicate that DataGuide provides a more reliable service than KisValue in terms of monthly stock returns.
The results show that extreme monthly returns resulting from serious data errors in the DataGuide and KisValue databases may be enough to sharply change the properties of monthly stock return distributions and to over- or underestimate long-term abnormal stock returns.
This article examines data errors in the DataGuide and KisValue databases, which are both primary sources of the stock prices and return data for Korea Exchange (KRX) securities in Korea, by using a cross-checking technique. We also discuss the methodological implications of erroneous data for monthly stock returns in empirical studies on Korean financial markets. In order to do that, we match and compare monthly stock returns for shares listed on the KRX Securities Market that are available in both the DataGuide and KisValue databases and analyse the accuracy of the data as well as the source of errors in each database, covering 15 years from January 2000 to December 2014.
As financial professionals, including policy-makers, tend to base decisions on research performed by using large machine-readable financial databases, the accuracy of the financial data provided by database companies has a direct impact on the quality of their decisions. That is, if the financial data employed by decision-makers are accurate, they can make high-quality decisions that are appropriate for research (Winkler, Kuklinski & Moser
In fact, previous studies have suggested that data errors can be a serious problem in computerised financial databases such as Center for Research in Security Prices (CRSP), Compustat and Value Line (Bennin
Since the early 1970s, when Rosenberg and Houglet first reported a few large errors in CRSP and Compustat data on monthly price relatives, researchers have also investigated potential data problems in accounting databases such as Compustat, Value Line and EDGAR Online (Chychyla & Kogan
Until recently, many studies in the United States have documented that (1) erroneous data exist in well-known commercial financial databases such as CRSP, Compustat and Value Line; and (2) a few serious errors in these databases could adversely affect research and decision-making. In Korea, however, it is very hard to find studies that examine data errors in financial information databases, except for those by Oh and Lee (
In comparison to United States studies, few studies in Korea have examined the quality of accounting data in popular financial databases. The stock returns data for KRX securities, provided by data aggregators, have never been verified by using a cross-checking technique, even though they are frequently used in empirical studies in both corporate finance and investment (Jung
The remainder of this article is organised as follows. The ‘Data and methodology’ section discusses the data and methodology. The ‘Analysis and results’ section analyses the discrepancies between the DataGuide and KisValue databases in the data on monthly stock returns. The ‘Methodological implications of data errors’ section presents and discusses the primary results, and the ‘Summary and conclusions’ section concludes this article.
It is well known that DataGuide (from FnGuide, Inc.) and KisValue (from NICE Information Service) are the primary sources of both stock return data and historical accounting data in Korea. In order to examine the quality of stock return data in these two popular databases, we matched and compared DataGuide and KisValue data on monthly returns for 729 KRX listed securities for a period of 15 years - from January 2000 to December 2014. In the study by Rosenberg and Houglet (
To be included in the sample, KRX securities had to meet the following criteria:
The data on the monthly returns of common shares should be available in both DataGuide and KisValue databases.
The data on daily and monthly stock prices should be available at the KRX website (
The annual reports and major disclosure information of the companies should be available at DART (dart.fss.or.kr), the electronic disclosure system of the Financial Supervisory Service or KIND (kind.krx.co.kr), the electronic disclosure system of KRX.
Applying these criteria, we compared a total of 109 260 firm-month data on monthly stock returns between the DataGuide and KisValue databases.
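The matching-and-comparison step can be illustrated with a minimal sketch. It is not the procedure used to build the sample: the function names, the dictionary layout keyed by (ticker, month), the handling of bucket boundaries and the toy figures are all assumptions made for illustration.

```python
from collections import Counter


def discrepancy_level(diff):
    """Bucket an absolute return difference given as a fraction (0.05 == 5%).

    Boundary handling (strict inequalities) is an assumption.
    """
    if diff > 0.20:
        return "more than 20%"
    if diff > 0.05:
        return "5% to 20%"
    if diff > 0.01:
        return "1% to 5%"
    if diff > 0.0:
        return "less than 1%"
    return "no discrepancy"


def cross_check(returns_a, returns_b, tol=1e-9):
    """Compare two {(ticker, month): return} mappings on their shared keys.

    Returns a Counter of discrepancy levels and a list of
    (key, difference) pairs whose difference exceeds 1%.
    """
    counts = Counter()
    flagged = []
    for key in sorted(returns_a.keys() & returns_b.keys()):
        diff = abs(returns_a[key] - returns_b[key])
        if diff <= tol:  # treat floating-point noise as agreement
            diff = 0.0
        counts[discrepancy_level(diff)] += 1
        if diff > 0.01:
            flagged.append((key, diff))
    return counts, flagged


# Hypothetical toy data, not taken from either database.
db_a = {("A", "2012-03"): 0.00, ("A", "2012-04"): 0.10,
        ("B", "2012-03"): 0.031, ("C", "2012-03"): 0.02}
db_b = {("A", "2012-03"): 9.00, ("A", "2012-04"): 0.10,
        ("B", "2012-03"): 0.001, ("D", "2012-03"): 0.50}

counts, flagged = cross_check(db_a, db_b)
```

Only firm-months present in both mappings are compared, mirroring the first sample criterion. Each flagged firm-month would then be checked by hand against exchange prices and disclosure filings.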
For this study, we matched and compared the monthly returns data for KRX listed securities between the DataGuide and KisValue databases by using a cross-checking technique. Rosenberg and Houglet (
According to Rosenberg and Houglet (
A monthly stock return is the change in the total value of an investment in a stock after a month per dollar of initial investment. In this study, monthly stock-return means a monthly stock-return without dividends. A monthly stock-return is therefore calculated, as shown in
where
In
To examine the quality of stock returns data in commercial financial databases in Korea, we matched and compared a total of 109 260 firm-month data on monthly stock returns between the DataGuide and KisValue databases. The data on monthly stock returns were downloaded from the websites of the two databases on the Internet (
The comparison of monthly stock returns between the DataGuide and KisValue databases.
Level of discrepancy | Number | Percentage |
---|---|---|
Monthly stock returns matched | 109 260 | 100.00 |
More than 20% | 24 | 0.02 |
More than 5%, but less than 20% | 58 | 0.05 |
More than 1%, but less than 5% | 381 | 0.35 |
Less than 1% | 2100 | 1.93 |
This table reports the numbers and percentages of discrepancies between monthly stock returns data in the DataGuide and KisValue databases by the level of discrepancy. We compared data on monthly returns for 729 KRX listed securities for a period of 15 years - from January 2000 to December 2014 - between the DataGuide and KisValue databases. In total, we compared 109,260 monthly stock returns between the two databases. The data on monthly stock returns were downloaded from the websites of these two databases on the Internet (
As shown in
Even though the discrepancies of monthly stock returns data between the DataGuide and KisValue databases could stem from a variety of sources, we categorised them into the following four types of errors: (1) mishandling of split events; (2) mishandling of missing returns; (3) misspecification of month-end dates and (4) unexplainable errors.
If you own a stock that undergoes a split event, such as a stock split, stock dividend, capital reduction, rights offering or spin-off, you should use the split-adjusted price when calculating the stock return. In other words, the last sale price,
As an example of the consequence of mishandling split events, consider the calculation of the monthly return on the common shares of LG Chemicals in April 2009. On 19 December 2008, LG Chemicals announced that it would spin off its industrial material business, now called LG Houses, on 01 April 2009. The old shareholders of LG Chemicals received 0.12 common shares of the newly established LG Houses as well as 0.88 common shares of the existing LG Chemicals in exchange for one old common share of LG Chemicals. Because of the spin-off procedure, trading in LG Chemicals common shares was suspended after they closed at 90 000 KRW on 27 March 2009, and the suspension was lifted on 20 April 2009. When the new common shares of LG Chemicals and LG Houses were relisted on the KOSPI Market of KRX on 20 April 2009, trading resumed at the beginning price of 128 000 KRW for LG Chemicals common shares. On the last trading day of April 2009, the share prices of LG Chemicals and LG Houses closed at 141 500 KRW and 115 000 KRW, respectively. In this example, the adjustment factor,
Applying the adjustment factor estimated by
However, DataGuide made a fatal error in calculating the monthly return on LG Chemicals common stock in April 2009, because it failed to take into account the effect of the spin-off event on the month-end price,
In addition, because DataGuide used the beginning price of LG Chemicals shares on 20 April 2009 as the last sale price at
Meanwhile, the error that KisValue made in calculating the monthly return on LG Chemicals common stock was quite similar to that of DataGuide in that it also failed to use the month-end price,
The only difference between the two databases is that KisValue employed the last sale price (90 000 KRW) on 27 March 2009 when KRX suspended the trading of LG Chemicals shares as a purchase price at
Another example of the consequence of mishandling split events is the case of Schnell Biopharmaceuticals in May 2009. Schnell Biopharmaceuticals conducted capital reduction without refund by consolidating 10 shares of common stock into one share on 27 May 2009. However, both DataGuide and KisValue databases failed to reflect the effect of the 1-for-10 reverse stock split on the month-end price,
If you have no valid last sale price at either month
An unexpected critical error resulting from the mishandling of missing returns is illustrated by the monthly stock-return on the common stock of Chinhung International, Inc. in March 2012. For nearly 2 months, from 24 February 2012 to 16 April 2012, KRX suspended trading in the common shares of Chinhung International Inc. because of a 1-for-10 reverse stock split for a shareholders’ equity reduction implemented on 16 March 2012 and resumed trading on 17 April 2012. Because of the trading suspension, there was no trading in the common shares of Chinhung International Inc. on KRX for the whole month of March 2012. Therefore, the monthly return on the common shares of Chinhung International Inc. in March 2012 should be missing. As a result of an error in handling the missing-return, however, a large discrepancy, as high as 900%, between DataGuide and KisValue databases was generated. Firstly, DataGuide calculated a 0% return in March 2012, considering the effect of the reverse stock split implemented on 16 March 2012, as shown in
In
This example of the monthly return on the common shares of Chinhung International Inc., suggests that the mishandling of missing returns could cause material errors in calculating monthly stock returns in the DataGuide and KisValue databases.
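The two failure modes discussed above, an unadjusted split event and a return computed for a month without trading, can be sketched as follows. This is an illustration under stated assumptions rather than either vendor's algorithm: the function name, the convention that the adjustment factor is the number of new shares held per old share, and the prices are all hypothetical.

```python
def monthly_return(prev_price, cur_price, adj_factor=1.0):
    """Monthly return without dividends; None marks a missing return.

    adj_factor (assumed convention): new shares held per old share after
    a split event, e.g. 0.1 for a 1-for-10 reverse split, 1.0 otherwise.
    """
    if prev_price is None or cur_price is None:
        # No valid last sale price at one end of the month: report the
        # return as missing instead of carrying a stale price forward.
        return None
    return cur_price * adj_factor / prev_price - 1.0


# Hypothetical prices around a 1-for-10 share consolidation.
price_before = 500.0    # last sale price before the consolidation
price_after = 5000.0    # first month-end price after it

unadjusted = monthly_return(price_before, price_after)           # 9.0
adjusted = monthly_return(price_before, price_after, 0.1)        # about 0.0
no_trading = monthly_return(price_before, None)                  # None
```

Ignoring a 1-for-10 consolidation turns an unchanged position into a spurious 900% return, which is the size of the discrepancy reported in the Chinhung International case. Returning `None` for a month without trading keeps the missing observation from being read as a 0% return.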
KisValue errors in specifying the last trading day for January 2012 caused 394 erroneous monthly stock returns, as shown in
The distribution of discrepancies greater than 1% between DataGuide and KisValue by the source and level of the discrepancy.
Level of discrepancy | Mishandling of split events | Mishandling of missing returns | Misspecification of month-end dates | Unexplainable errors | Total |
---|---|---|---|---|---|
More than 20% | 11 (2.38%) | 7 (1.51%) | 4 (0.86%) | 2 (0.43%) | 24 (5.19%) |
≥ 5% but < 20% | 5 (1.08%) | - | 49 (10.58%) | 4 (0.86%) | 58 (12.52%) |
≥ 1% but < 5% | 30 (6.48%) | - | 341 (73.65%) | 10 (2.16%) | 381 (82.29%) |
This table presents the distribution of 463 discrepancies greater than 1% between the DataGuide and KisValue databases by the source and level of discrepancy. The sources of discrepancies are categorised into the following four types of errors: (1) mishandling of split events; (2) mishandling of missing returns; (3) misspecification of month-end dates; and (4) unexplainable errors. The numbers in parentheses represent the percentages.
Consequently, KisValue underestimated the monthly return on Youngbo Chemical common stock in the month of January 2012 by more than 20% in comparison with the correct monthly return in
There were 16 discrepancies greater than 1% between the DataGuide and KisValue databases of which the source was obscure, as shown in
The distribution of material discrepancies greater than 20% between DataGuide and KisValue by the source of discrepancy.
In
The distribution of data errors resulting in discrepancies greater than 1% between monthly stock returns data in the DataGuide and KisValue by databases containing errors.
Level of discrepancy | Source of errors | DataGuide | KisValue | DataGuide and KisValue | Total |
---|---|---|---|---|---|
More than 20% | Mishandling of split events | - | 1 (0.22%) | 10 (2.16%) | 11 (2.38%) |
More than 20% | Mishandling of missing returns | - | - | 7 (1.51%) | 7 (1.51%) |
More than 20% | Misspecification of month-end dates | - | 4 (0.86%) | - | 4 (0.86%) |
More than 20% | Unexplainable errors | 1 (0.22%) | - | 1 (0.22%) | 2 (0.43%) |
More than 20% | Sub-total | 1 (0.22%) | 5 (1.08%) | 18 (3.89%) | 24 (5.18%) |
More than 5%, less than 20% | Mishandling of split events | - | 2 (0.43%) | 3 (0.65%) | 5 (1.08%) |
More than 5%, less than 20% | Mishandling of missing returns | - | - | - | - |
More than 5%, less than 20% | Misspecification of month-end dates | - | 49 (10.58%) | - | 49 (10.58%) |
More than 5%, less than 20% | Unexplainable errors | 4 (0.86%) | - | - | 4 (0.86%) |
More than 5%, less than 20% | Sub-total | 4 (0.86%) | 51 (11.01%) | 3 (0.65%) | 58 (12.52%) |
More than 1%, less than 5% | Mishandling of split events | - | 30 (6.48%) | - | 30 (6.48%) |
More than 1%, less than 5% | Mishandling of missing returns | - | - | - | - |
More than 1%, less than 5% | Misspecification of month-end dates | - | 341 (73.65%) | - | 341 (73.65%) |
More than 1%, less than 5% | Unexplainable errors | 8 (1.72%) | 1 (0.22%) | 1 (0.22%) | 10 (2.16%) |
More than 1%, less than 5% | Sub-total | 8 (1.72%) | 372 (80.35%) | 1 (0.22%) | 381 (82.29%) |
This table presents the distribution of data errors resulting in discrepancies greater than 1% between monthly stock returns data in DataGuide and KisValue by databases containing errors. In the table, ‘DataGuide and KisValue’ indicates that both DataGuide and KisValue databases made errors in calculating monthly stock returns. The sources of discrepancies are categorised into the following four types of errors: (1) mishandling of split events; (2) mishandling of missing returns; (3) misspecification of month-end dates; and (4) unexplainable errors. The numbers in parentheses represent the percentages.
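As a worked check, the sub-totals in the table can be aggregated into a per-database error rate over the 109 260 matched firm-months. The sketch below assumes that a firm-month in which both databases erred counts once against each database; the variable names are illustrative.

```python
TOTAL_COMPARISONS = 109_260

# Sub-totals from the table: (DataGuide only, KisValue only, both databases).
subtotals = {
    "more than 20%": (1, 5, 18),
    "5% to 20%": (4, 51, 3),
    "1% to 5%": (8, 372, 1),
}

dg_only = sum(row[0] for row in subtotals.values())
kv_only = sum(row[1] for row in subtotals.values())
both = sum(row[2] for row in subtotals.values())

# Assumption: an error shared by both databases is charged to each of them.
dataguide_errors = dg_only + both
kisvalue_errors = kv_only + both

dataguide_rate = 100.0 * dataguide_errors / TOTAL_COMPARISONS
kisvalue_rate = 100.0 * kisvalue_errors / TOTAL_COMPARISONS

print(f"DataGuide: {dataguide_errors} errors, {dataguide_rate:.2f}%")
print(f"KisValue:  {kisvalue_errors} errors, {kisvalue_rate:.2f}%")
```

Under this counting rule, DataGuide accounts for 35 erroneous firm-months (about 0.03%) and KisValue for 450 (about 0.41%), out of 463 discrepancies greater than 1%.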
Previous studies have already demonstrated that a few large errors can have a significant impact on the distributional properties of select financial variables, including monthly stock returns (Beedles & Simkowitz
The effect of large errors on distributional properties of monthly stock returns.
Distributional properties | DataGuide (A) | KisValue (B) | B/A |
---|---|---|---|
Sample size | 109 260 | 109 260 | - |
Minimum | −0.8584 | −0.9044 | 1.05 |
Maximum | 7.7702 | 71.0000 | 9.14 |
Mean | 0.0154 | 0.0173 | 1.12 |
Variance | 0.0312 | 0.1069 | 3.43 |
Skewness | 5.4615 | 126.0659 | 23.08 |
Kurtosis | 123.1019 | 25550.4000 | 207.55 |
This table presents the distributional properties (mean, variance, skewness and kurtosis) of monthly stock returns for two databases, DataGuide and KisValue, with different error rates. In the table, ‘B/A’ means the ratio of distributional properties in KisValue (B) to those in the DataGuide (A) database.
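The sensitivity of the higher moments to a single bad observation can be shown with a small sketch. The data are synthetic, and the use of population (biased) moment estimators is an assumption, since the estimators behind the table are not stated; only the erroneous value of 71.0 is borrowed from the KisValue maximum.

```python
def moments(xs):
    """Mean, variance, skewness and kurtosis using population moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Synthetic monthly returns (hypothetical, deterministic for reproducibility).
clean = [-0.05, 0.0, 0.02, 0.05, 0.03] * 200

# The same series with one observation replaced by an erroneous return of
# 71.0, the size of the largest KisValue return in the table.
contaminated = clean[:-1] + [71.0]

clean_stats = moments(clean)
contaminated_stats = moments(contaminated)

for name in ("mean", "variance", "skewness", "kurtosis"):
    print(name, clean_stats[name], contaminated_stats[name])
```

One erroneous value among 1 000 observations is enough to dominate the third and fourth moments, which is the pattern visible in the B/A column.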
As the frequency of large errors in the handling of split events and missing returns is higher for the KisValue database than for the DataGuide database, as shown in
Another consequence of a few erroneous extreme returns is the over- or underestimation of the long-term stock performance of the individual securities.
The effect of large errors on long-term abnormal stock returns.
Company | Date of error | CAR (erroneous data) | BHAR (erroneous data) | Source containing error | CAR (corrected data) | BHAR (corrected data) |
---|---|---|---|---|---|---|
CJ Korea Express | 5/29/2009 | −0.1513 | −0.3546 | DG and KV | −0.0545 | −0.2734 |
Yuyu Pharma, Inc. | 8/31/2000 | 1.7528 | 0.5037 | DG | 2.0489 | 1.1140 |
Namkwang Eng. and Const. | 3/30/2012 | 9.2719 | −0.9797 | DG and KV | 0.2983 | −1.1284 |
Pumyang Construction | 12/28/2012 | 69.4275 | 5.9651 | DG and KV | −1.6690 | −0.9958 |
Pumyang Construction | 1/31/2014 | −1.7378 | −1.0471 | DG and KV | −1.8401 | −1.0179 |
Chinhung International | 3/30/2012 | −1.1531 | −0.9433 | DG and KV | −1.1485 | −0.9427 |
Schnell Biopharmaceuticals | 5/31/2009 | 21.7136 | 10.2138 | DG and KV | 0.1234 | −0.6037 |
S&T Dynamics | 3/31/2003 | 0.3899 | −0.1936 | KV | −0.0961 | −0.6482 |
Tway Holdings | 4/30/2009 | 2.7088 | −0.0115 | DG and KV | −1.9669 | −1.0063 |
Chokwang Paint | 1/31/2012 | 2.0122 | 1.8872 | KV | 2.2404 | 2.1709 |
Kukdong Corporation | 1/31/2012 | 3.2059 | 6.8763 | KV | 3.4733 | 8.0661 |
Hansol Artone Paper | 7/31/2009 | −0.0747 | −0.4550 | DG and KV | −1.0392 | −0.7676 |
Hangchang Paper | 6/30/2009 | 5.1804 | 3.4290 | DG and KV | −0.0743 | −0.3950 |
Youngone Holdings | 7/31/2009 | 1.9829 | 4.2510 | DG and KV | 1.3390 | 2.1937 |
Hyundai Paint | 4/30/2014 | 24.8417 | 13.9301 | DG and KV | −0.1343 | −0.5100 |
STX Corporation | 3/31/2014 | 2.6591 | −0.7951 | DG and KV | −1.3575 | −1.0844 |
Daeyoung Packaging | 12/31/2002 | −0.4248 | −0.9571 | DG and KV | 0.3187 | −0.4916 |
Daeyoung Packaging | 2/28/2003 | 0.0487 | −0.7993 | DG and KV | 0.6452 | −0.1241 |
Youngbo Chemical | 1/31/2012 | 0.1339 | −0.0911 | KV | 0.3541 | 0.0684 |
Iljin Display | 8/31/2009 | 1.8067 | 2.5033 | DG and KV | 2.0125 | 3.7961 |
Maniker | 10/31/2002 | −0.2099 | −1.0202 | DG and KV | 0.6074 | 0.1342 |
LG Chemical | 4/30/2009 | 1.6142 | 2.8937 | DG and KV | 1.0840 | 1.5114 |
Artis | 4/30/2012 | 3.8684 | 1.2870 | DG and KV | −0.0758 | −0.6103 |
Wooridul Huebrain | 1/31/2012 | 1.8080 | −0.5707 | KV | 2.0546 | −0.4897 |
Mean | - | 6.6948 | 1.8968 | - | 0.2977 | 0.3319 |
Median | - | 1.8074 | −0.0513 | - | 0.2109 | −0.4907 |
This table presents the effect of large errors on the long-term abnormal stock returns for a sample of 24 firm-months with discrepancies greater than 20% between the DataGuide and KisValue databases. The table contrasts the long-term abnormal stock returns estimated by using erroneous monthly returns data with those estimated by using corrected monthly returns data resulting from righting 24 large errors detected in the DataGuide and/or KisValue databases. In order to measure long-term stock performance, we calculate a 36-month cumulative abnormal return (CAR) and buy-and-hold abnormal return (BHAR) using the KOSPI equally weighted market index as a return benchmark.
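For readers who want to reproduce this kind of comparison, the sketch below uses the conventional definitions of the two measures: CAR as the sum of monthly abnormal returns, and BHAR as the difference between the compounded stock return and the compounded benchmark return. These definitions are assumed here, and the return series are hypothetical rather than taken from the sample.

```python
import math


def car(stock_returns, benchmark_returns):
    """Cumulative abnormal return: sum of (stock - benchmark) by month."""
    if len(stock_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    return sum(r - m for r, m in zip(stock_returns, benchmark_returns))


def bhar(stock_returns, benchmark_returns):
    """Buy-and-hold abnormal return: compounded stock minus benchmark."""
    if len(stock_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    stock = math.prod(1.0 + r for r in stock_returns)
    bench = math.prod(1.0 + m for m in benchmark_returns)
    return stock - bench


# Hypothetical 36-month window: the stock tracks the benchmark exactly.
benchmark = [0.01] * 36
corrected = [0.01] * 36

# Same window with one erroneous 900% monthly return left in the data.
erroneous = list(corrected)
erroneous[17] = 9.0

car_corrected = car(corrected, benchmark)
car_erroneous = car(erroneous, benchmark)
bhar_corrected = bhar(corrected, benchmark)
bhar_erroneous = bhar(erroneous, benchmark)
```

A single erroneous month shifts CAR additively by the size of the error, while BHAR is scaled multiplicatively because the error compounds through every later month.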
In order to measure the long-term stock performance in
where
As shown in
However, using these statistical methods is not always a good practice, because outliers are not necessarily erroneous. For example, the maximum value of monthly stock returns in the DataGuide database, 7.7702, is not an erroneous return but a valid one, even though it is definitely an extreme value in comparison with the mean return, 0.0154, as shown in
Therefore, to ensure the reliability of empirical research on capital markets in Korea, it is necessary to minimise large errors in popular financial databases such as DataGuide and KisValue that lead to extreme values or outliers. Of course, legitimate outliers that do not result from errors should be used properly as needed. If users really want high-quality databases to protect their decisions based on financial databases from being distorted by data errors, they should use cross-checking to screen for data errors in alternative financial databases on a regular basis. As a good example of how cross-checking could be used for quality control in popular databases, Bennin (
Meanwhile, companies that provide databases, such as DataGuide and KisValue in Korea, should implement their own verification systems whereby data managers screen for errors in databases by using cross-checking periodically and correct them immediately in order to maintain high-quality databases. Additionally, as most of the large errors that generate extreme values in the DataGuide and KisValue databases result from the mishandling of split events and of missing values in calculating monthly stock returns, it is most important for the database companies to educate data managers to fully understand corporate events that affect stock returns (i.e. stock splits, stock dividends, capital reductions and spin-offs) and the statistical concept of missing values.
We examined data errors in the DataGuide and KisValue databases, which are commonly used by financial professionals in Korea, by using cross-checking. We focused mainly on comparing monthly stock returns for 729 KRX listed securities available in both the DataGuide and KisValue databases covering 15 years from January 2000 to December 2014.
We find strong evidence of material discrepancies in monthly stock returns between the DataGuide and KisValue databases, most of which are attributable to errors in handling stock split events and missing returns. Specifically, out of 109 260 comparisons, we find 2563 (2.35%) to be erroneous, including 58 (0.05%) that differ by more than 5% but less than 20% and 24 (0.02%) that differ by more than 20%. In addition, the mishandling of split events (i.e. stock splits, reverse stock splits, rights offerings and spin-offs) causes serious data errors in proportion to the split ratios, which range from 2 to 72. The results also show a DataGuide error rate of 0.03% and a KisValue error rate of 0.41%, indicating that DataGuide is a more reliable database than KisValue for monthly stock returns. Further, the results show that the extreme returns that result from large errors in the two databases can be significant enough to sharply change the properties of monthly stock return distributions and to over- or underestimate long-run abnormal stock returns. In particular, the mean 36-month CAR estimated using erroneous monthly returns data is more than 20 times larger than that estimated using corrected returns data for the 24 firm-month sample in which DataGuide and/or KisValue made serious errors.
Finance researchers in Korea already know that there are data errors in popular financial databases such as DataGuide and KisValue. They assume that outliers might be erroneous and could have a significant effect on empirical analysis. Therefore, in order to minimise the effect of outliers, they discard them or transform them using the winsorising method. However, using these statistical methods is not always a good practice, because not all outliers are erroneous. In this regard, the users of financial databases must regularly examine data errors even in highly reputed databases and ask database companies to correct the errors in order to ensure reliable financial databases in Korea. For their part, database companies should develop a more sophisticated algorithm to take into account the effect of split events in calculating monthly stock returns. Further, they also need to introduce a series of special missing-return codes, specifying the reason for missing returns as in the CRSP database. Finally, not only users but also database companies have to keep in mind that ‘the presence of erroneous data can destroy a research effort and seriously damage the management decisions based upon research’, as stated by Rosenberg and Houglet (
The authors would like to thank an anonymous referee for helpful comments. The authors also would like to thank the Pukyong National University for funding this study.
This work was supported by the Pukyong National University Research Abroad Fund in 2015 (C-D-2015-0511).
The authors declare that they have no financial or personal relationships which may have inappropriately influenced them in writing this article.
H.-C.J. and H.-J.N. conceived and designed the research. H.-J.N. undertook the primary research and H.-C.J. was in charge of analysing the data and contributed to organising all sections and critical revision. Both authors read and approved the final manuscript.
For another example, in the case of a valid current price,
Jung (