10-K Corpus (extended version)
------------------------------
http://clip.csie.org/10K/10KCorpus

Version 2.0 released April 16, 2016.

The corpus contains 10-K reports filed by US companies from 1996 to 2013.
For each report it also includes postevent volatilities, the volatilities
of stock returns for the twelve-month periods preceding and following the
report, abnormal trading volumes, and excess returns.

The data are organized by the year of the report. Note that we follow the
naming rule of the 10-K Corpus provided by Kogan et al.
(http://www.cs.cmu.edu/~ark/10K/).

For year yyyy, there are several files:

all.full.tgz:
  yyyy.full.tgz - the original 10-K reports (named key.txt)
all.mda.tgz:
  yyyy.mda.tgz - the MD&A sections from the 10-Ks (named key.mda)
all.tok.tgz:
  yyyy.tok.tgz - the tokenized MD&A sections (named key.mda)

(The files in the above tarballs are similarly named; the string up to
the . is a unique key for the report.)

all.logfama.tgz:
  yyyy.logfama.txt - maps key to postevent return volatility in the
                     following year
all.logvol.tgz:
  yyyy.logvol.-12.txt - maps key to log volatility in the preceding year
  yyyy.logvol.+12.txt - maps key to log volatility in the following year
all.abnormal.tgz:
  yyyy.abnormal.txt - maps key to abnormal trading volume in the
                      following year
all.excess.tgz:
  yyyy.excess.txt - maps key to excess return in the following year
all.meta.tgz:
  yyyy.meta.txt - maps key to a date (yyyymmdd format), URL, company name,
                  and SEC code
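
The key-to-value files listed above could be read along the following
lines. This is only a sketch: it assumes each line holds a report key and
a numeric value separated by whitespace, which this README does not
state, so check a file before relying on it. The function names
(parse_key_value_lines, load_mapping, pair_volatilities) are hypothetical
and not part of the corpus.

```python
from typing import Dict, Iterable, Tuple


def parse_key_value_lines(lines: Iterable[str]) -> Dict[str, float]:
    """Parse lines of the ASSUMED form '<key> <value>' into a dict."""
    mapping: Dict[str, float] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        mapping[parts[0]] = float(parts[1])
    return mapping


def load_mapping(path: str) -> Dict[str, float]:
    """Read one extracted mapping file, e.g. a yyyy.logvol.-12.txt file."""
    with open(path, encoding="utf-8") as handle:
        return parse_key_value_lines(handle)


def pair_volatilities(
    before: Dict[str, float], after: Dict[str, float]
) -> Dict[str, Tuple[float, float]]:
    """Join two mappings on report key, keeping keys present in both."""
    shared = before.keys() & after.keys()
    return {key: (before[key], after[key]) for key in shared}
```

For a given year, calling load_mapping on the extracted yyyy.logvol.-12.txt
and yyyy.logvol.+12.txt files and passing both results to
pair_volatilities would give, for each report key found in both files,
the preceding and following log volatility as a pair.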