According to Wikipedia, Market Capitalization “is the total value of the issued shares of a publicly traded company; it is equal to the share price times the number of shares outstanding.” For a data analysis project, I needed to determine the market cap of stocks at particular past dates. I already had the share prices for these dates, so I only needed to find the number of shares outstanding on these dates. I wrote code to scrape the web and estimate past shares outstanding.

My code estimates the number of shares outstanding on any historical date by taking the current number of shares outstanding and undoing the splits that have occurred since that date: for each such split, the current count is divided by the split ratio (for example, by 2 for a 2-for-1 split). This procedure is definitely not perfect! Splits have a big impact on the number of shares outstanding, but other factors, such as share buybacks and new share issuance, affect the number as well. Hopefully the fluctuations caused by these other factors are relatively small. In any case, historical data for them is harder to find.
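A minimal sketch of this estimate in R is shown here. The function name `estimate_past_shares`, its arguments, and the example numbers are all hypothetical, chosen only for illustration. The sketch also assumes that a split counts as "since that date" only when it falls strictly after the date in question.

```r
# Estimate shares outstanding on a past date by undoing later splits.
# split_ratios holds the multiplier of each split (2 for a 2-for-1 split,
# 0.1 for a 1-for-10 reverse split). All names here are hypothetical.
estimate_past_shares <- function(current_shares, split_dates, split_ratios, as_of) {
  later <- split_dates > as_of            # splits that happened after as_of
  current_shares / prod(split_ratios[later])  # prod() of nothing is 1
}

# Made-up example: 1,000,000 shares today, two 2-for-1 splits in the past.
current_shares <- 1000000
split_dates    <- as.Date(c("2000-06-21", "2005-02-28"))
split_ratios   <- c(2, 2)

estimate_past_shares(current_shares, split_dates, split_ratios, as.Date("2003-01-01"))
# 500000  (only the 2005 split is undone)
estimate_past_shares(current_shares, split_dates, split_ratios, as.Date("1999-01-01"))
# 250000  (both splits are undone)
```

Because an empty product is 1, a date after the most recent split simply returns the current share count.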

The current number of shares outstanding can be retrieved from Yahoo! Finance. Unfortunately, its data is incomplete; in particular, many small stocks are missing. Split histories can be found at GetSplitHistory, which is also missing many stocks.
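As a rough illustration of how a split table could be pulled out of a page, here is a sketch using R's XML package and XPath queries. The HTML is made up and parsed from a string, so the table id, the column order, and the "2 for 1" text format are assumptions, not the actual layout of either site. The helper `parse_ratio` is likewise hypothetical.

```r
library(XML)

# Made-up HTML standing in for a split-history page. The real page layout
# is not known here, so the table id and columns are assumptions.
html <- '
<html><body>
<table id="splits">
  <tr><th>Date</th><th>Ratio</th></tr>
  <tr><td>06/21/2000</td><td>2 for 1</td></tr>
  <tr><td>02/28/2005</td><td>2 for 1</td></tr>
</table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# XPath: the first and second data cell of every row in the table.
dates  <- xpathSApply(doc, "//table[@id='splits']//tr/td[1]", xmlValue)
ratios <- xpathSApply(doc, "//table[@id='splits']//tr/td[2]", xmlValue)

# Turn text like "2 for 1" into the numeric multiplier 2 / 1.
parse_ratio <- function(x) {
  parts <- as.numeric(strsplit(x, " for ", fixed = TRUE)[[1]])
  parts[1] / parts[2]
}

splits <- data.frame(
  date  = as.Date(dates, format = "%m/%d/%Y"),
  ratio = vapply(ratios, parse_ratio, numeric(1), USE.NAMES = FALSE)
)
```

For a live page, `htmlParse` would be given the downloaded page instead of a string, and the XPath expressions would need to match that page's real structure.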

My R code uses the XML package and XPath to scrape these pages. Also, because web scraping can take a long time and is particularly vulnerable to failure, I created an append.csv function. It behaves much like write.csv, but it builds the CSV file one line at a time, so that your progress is saved as you go.
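One way such a function could be written is sketched here. This is not necessarily how the original append.csv works; it is an assumed implementation built on base R's `write.table`, which writes the header only when the file does not exist yet and appends rows otherwise. The tickers and share counts in the usage loop are made up.

```r
# Append the rows of a data frame to a CSV file, creating the file
# (with a header line) on the first call. An assumed implementation.
append.csv <- function(x, file, ...) {
  new_file <- !file.exists(file)
  write.table(x, file = file, sep = ",", qmethod = "double",
              row.names = FALSE,
              col.names = new_file,   # header only for a brand-new file
              append = !new_file,     # otherwise add to the existing rows
              ...)
}

# Usage: write one row per stock, so a crash loses at most one row.
out <- tempfile(fileext = ".csv")
tickers <- c("AAA", "BBB")          # hypothetical tickers
shares  <- c(1500000, 250000)       # hypothetical share counts
for (i in seq_along(tickers)) {
  row <- data.frame(ticker = tickers[i], shares = shares[i])
  append.csv(row, out)
}
result <- read.csv(out)
```

If the scrape fails partway through, the rows already written stay on disk, and the loop can be restarted from the first ticker that is not yet in the file.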

Finally, for any past date, multiply the share price on that date by the estimated number of shares outstanding to estimate the stock’s market capitalization on that date.
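With prices and estimated share counts side by side in a data frame, this last step is a single vectorized multiplication in R. The dates, prices, and share counts below are invented for illustration.

```r
# Hypothetical prices and estimated share counts for one stock.
history <- data.frame(
  date   = as.Date(c("1999-01-04", "2003-01-02", "2010-01-04")),
  price  = c(80.00, 45.00, 30.00),
  shares = c(250000, 500000, 1000000)
)

# Market cap on each date = share price * shares outstanding.
history$market_cap <- history$price * history$shares
history$market_cap
# 20000000 22500000 30000000
```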

05 January 2013