Stock Analysis Project

Introduction

In this project, we will explore the stock prices from January 1st, 2004 to December 31st. 2016 of the following banks:

  • Bank of America
  • CitiGroup
  • Goldman Sachs
  • JPMorgan Chase
  • Morgan Stanley
  • Wells Fargo

This project is for the practice of visualization and panda skills.Not intended for financial advice.

Import Libraries

In [1]:
import pandas_datareader.data as web
import matplotlib.pyplot as plt
from matplotlib import style
import seaborn as sns
import pandas as pd
import numpy as np
import datetime as dt
%matplotlib inline
In [2]:
# Start and End Date 
start_date = dt.datetime(2004,1,1)
end_date   = dt.datetime(2016,12,31)

Obtaining Data

We use Pandas DataReader to retrieve stock data directly from the internet.

To obtain stock information about a particular bank or company, we must have its ticker name. For example, the ticker name for Bank of America is BAC. To find the ticker names of the companies you want to analyze, you can do a google search or visit a website like eoddata.com, which contains a listing of all the ticker symbols.

In [3]:
# DATAFRAMES:
# Bank of America
bank_of_america_df    = web.DataReader('BAC', 'yahoo',start_date, end_date)
# CitiGroup
citi_group_df         = web.DataReader('C', 'yahoo',start_date, end_date)
# Goldman Sachs
goldman_sachs_df      = web.DataReader('GS', 'yahoo',start_date, end_date)
# JPMorgan Chase
jpmorgan_chase_df     = web.DataReader('JPM', 'yahoo',start_date, end_date)
# Morgan Stanley
morgan_stanley_df     = web.DataReader('MS', 'yahoo', start_date, end_date)
# Wells Fargo
wells_fargo_df        = web.DataReader('WFC', 'yahoo', start_date, end_date)

The following code demonstrates an example of the information obtained for one DataFrame.

  1. Open: Refers to the price of the stock at the start of a particular date.
  2. Close: Refers to the price of the stock at the end of a certain date.
  3. Low: Refers to the minimum price of the stock at the on the specific date.
  4. High: Refers to the Highest price of the stock at the on the specific date.
In [4]:
# rice of the stock at the begining of the date.The Open column refers to the p
bank_of_america_df.head()
Out[4]:
Open High Low Close Volume Adj Close
Date
2004-01-02 79.750000 80.430000 78.910004 79.089996 16897000 29.112093
2004-01-05 79.320000 79.599998 78.720001 79.320000 14535400 29.196755
2004-01-06 79.120003 79.389999 78.800003 79.190002 15083600 29.148904
2004-01-07 79.050003 79.169998 78.809998 79.010002 13346200 29.082648
2004-01-08 79.010002 79.139999 78.750000 79.050003 13774200 29.097372

We now create a list called “tickers_list”, which holds the tickers symbols as strings.
We also create a list called “banks_df_list”, which holds the listing of bank data frames.

In [5]:
ticker_list   = ['BAC', 'C', 'GS', 'JPM', 'MS', 'WFC']
bank_df_list  = [ bank_of_america_df, citi_group_df, goldman_sachs_df, jpmorgan_chase_df, morgan_stanley_df, wells_fargo_df]

We use the pandas concat() method to concatenate the bank data frames together to a single data frame called stocks_df. Set the keys argument equal to the tickers list.

In [6]:
# The .concat() method will concatinate or unite all the bank DataFrames 
# into a single one called stocks_df.

# Since all stack are based on same dates, it is better to 
# to concatinate over the columns axis. If we concatinate over
# rows, the dataframes will stock up one after the other in the 
# order they were specified in the list. Thus datetime will repeat.
stocks_df = pd.concat( objs= bank_df_list, axis='columns', keys= ticker_list)
stocks_df.head()
Out[6]:
BAC C MS WFC
Open High Low Close Volume Adj Close Open High Low Close Low Close Volume Adj Close Open High Low Close Volume Adj Close
Date
2004-01-02 79.750000 80.430000 78.910004 79.089996 16897000 29.112093 49.050001 49.250001 48.540000 49.000001 56.779999 56.980000 3087200 38.220141 58.500000 58.889999 57.950001 58.169998 8243200 20.136270
2004-01-05 79.320000 79.599998 78.720001 79.320000 14535400 29.196755 49.480000 49.800001 49.350001 49.790000 56.759998 57.570000 4459700 38.615892 58.450001 58.689999 57.930000 58.450001 8846000 20.233197
2004-01-06 79.120003 79.389999 78.800003 79.190002 15083600 29.148904 49.530000 49.880000 49.510001 49.840000 57.020000 57.770000 4418500 38.750045 58.150002 58.450001 58.110001 58.259998 8381600 20.167425
2004-01-07 79.050003 79.169998 78.809998 79.010002 13346200 29.082648 49.810001 50.170002 49.520002 49.910001 56.830002 57.799999 4897700 38.938673 57.770000 58.000000 57.500000 57.669998 10670600 19.963189
2004-01-08 79.010002 79.139999 78.750000 79.050003 13774200 29.097372 49.950001 50.230000 49.700001 50.120002 57.630001 58.369999 4127900 39.322670 57.680000 57.830002 57.119999 57.650002 11367000 19.956267

5 rows × 36 columns

Set the title for hierchical column names

In [7]:
# Set the title for hierchical columns names
stocks_df.columns.names = ['Ticker', 'Stock_Info']
stocks_df.head()
Out[7]:
Ticker BAC C MS WFC
Stock_Info Open High Low Close Volume Adj Close Open High Low Close Low Close Volume Adj Close Open High Low Close Volume Adj Close
Date
2004-01-02 79.750000 80.430000 78.910004 79.089996 16897000 29.112093 49.050001 49.250001 48.540000 49.000001 56.779999 56.980000 3087200 38.220141 58.500000 58.889999 57.950001 58.169998 8243200 20.136270
2004-01-05 79.320000 79.599998 78.720001 79.320000 14535400 29.196755 49.480000 49.800001 49.350001 49.790000 56.759998 57.570000 4459700 38.615892 58.450001 58.689999 57.930000 58.450001 8846000 20.233197
2004-01-06 79.120003 79.389999 78.800003 79.190002 15083600 29.148904 49.530000 49.880000 49.510001 49.840000 57.020000 57.770000 4418500 38.750045 58.150002 58.450001 58.110001 58.259998 8381600 20.167425
2004-01-07 79.050003 79.169998 78.809998 79.010002 13346200 29.082648 49.810001 50.170002 49.520002 49.910001 56.830002 57.799999 4897700 38.938673 57.770000 58.000000 57.500000 57.669998 10670600 19.963189
2004-01-08 79.010002 79.139999 78.750000 79.050003 13774200 29.097372 49.950001 50.230000 49.700001 50.120002 57.630001 58.369999 4127900 39.322670 57.680000 57.830002 57.119999 57.650002 11367000 19.956267

5 rows × 36 columns

Exploratory Data Analysis (EDA)

**What is the max close price for each bank’s stock throughout the period?

In [8]:
# We use the xs() method. The key argument is the column or row from
# which we want to do a cross selection.
stocks_df.xs(key= 'Close', axis= 1, level= 'Stock_Info').max()
Out[8]:
Ticker
BAC     89.010002
C       61.090000
GS     247.919998
JPM     87.129997
MS      89.300003
WFC     73.000000
dtype: float64

New DataFrame called “returns_df”.

In [9]:
returns_df = pd.DataFrame()
In [10]:
for ticker in ticker_list:
    returns_df[ticker+'_Return'] = stocks_df[ticker]['Close'].pct_change()
returns_df.head()
Out[10]:
BAC_Return C_Return GS_Return JPM_Return MS_Return WFC_Return
Date
2004-01-02 NaN NaN NaN NaN NaN NaN
2004-01-05 0.002908 0.016122 0.000515 -0.001912 0.010355 0.004814
2004-01-06 -0.001639 0.001004 -0.005968 0.025171 0.003474 -0.003251
2004-01-07 -0.002273 0.001405 0.021014 0.014678 0.000519 -0.010127
2004-01-08 0.000506 0.004208 0.001217 0.017096 0.009862 -0.000347

Pair plot Using Seaborn of returns data frame.

In [11]:
sns.pairplot(returns_df[1:])
Out[11]:
<seaborn.axisgrid.PairGrid at 0x211edb06748>

By taking the standard deviation, we can have a look at the volatility of risk in the stocks.

In [12]:
# A low standard deviation represents steady stocks.
returns_df.std()
Out[12]:
BAC_Return    0.033925
C_Return      0.157159
GS_Return     0.023228
JPM_Return    0.024946
MS_Return     0.034105
WFC_Return    0.028404
dtype: float64

Citi Group has the riskiest stocks.

Distribution Plot Comparison of CitiGroup for Two Different Years( 2008 and 2016)

The following is a distribution plot of CitiGroup for the year 2008 when they suffer from a stock Crash.

In [13]:
sns.distplot( returns_df.ix['2008-01-01':'2008-12-31']['C_Return'],\
             bins= 60, color= 'green')
C:\Users\Diane\Anaconda3\lib\site-packages\statsmodels\nonparametric\kdetools.py:20: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  y = X[:m/2+1] + np.r_[0,X[m/2+1:],0]*1j
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x211f00522b0>

The following is another distribution of CitiGroup for the year 2016. In comparison to the year 2008, the spread was smaller.

In [14]:
sns.distplot( returns_df.ix['2016-01-01':'2016-12-31']['C_Return'],\
             bins= 60, color= 'purple')
C:\Users\Diane\Anaconda3\lib\site-packages\statsmodels\nonparametric\kdetools.py:20: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  y = X[:m/2+1] + np.r_[0,X[m/2+1:],0]*1j
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x211f17737f0>

We will stop at this point and continue In another post.