Matlab Code
WHO WANTS TO BE A MILLIONAIRE?
Joint Distribution and Correlation
The weekly rates of return for five stocks listed on the New York Stock Exchange are given in
the file Stocks.dat. Call these stock column vectors A, B, C, D and E, and let the data matrix be
X = [A B C D E]. In this assignment, you will first approximate the joint distribution of each pair
of stocks. Then you will find the covariance and correlation between each pair of stocks, both
from the approximated pdfs and directly from the data (sample covariance). The idea is to find
out which two stocks have the highest correlation and how to use this information for
investment.
Here are the main steps:
1. Load Stocks.dat into MATLAB.
2. Visualize (plot) each of the 5 stocks. OUTPUT: Stock plots.
3. Perform statistical normalization (zero mean, unit variance) of each stock data vector.
OUTPUT: Plots of all normalized stocks.
4. Approximate the joint pdf for your data. OUTPUT: Two different plots of your joint pdf.
5. Compute the covariance matrix and the correlation coefficient a) from the estimated
joint pdfs and b) directly from the sample covariance matrix.
OUTPUTS: the correlation estimated from each joint pdf, the correlation matrix
estimated from the pdfs, and the sample correlation matrix.
You should create ONLY one technical report (in pdf) containing: comments, discussion,
MATLAB script and all outputs (plots, etc.) of your assignment.
Assignment Details and Programming Hints
Load the data file Stocks.dat into the data matrix X, whose columns are the stock vectors:
X = [A B C D E]. Plot all columns in a single figure, each in a different color, and observe.
Which pair of stocks do you think has the highest correlation?
1) Perform statistical normalization: zero mean + unit variance
First run the MATLAB command: Z = zscore(X);
This command shifts and scales each column vector to be zero-mean and unit-variance.
Verify this using the mean() and var() commands. Also plot the columns of the Z matrix and
discuss what you see. Is each column really zero mean and unit variance?
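The verification asked for above can be sketched as follows. This is only an illustration under stated assumptions: the matrix X here is synthetic stand-in data (not the contents of Stocks.dat), and the normalization is written out by hand to show what zscore(X) does with its default settings.

```matlab
% Synthetic stand-in for the stock matrix (assumption: NOT the real Stocks.dat data)
t = (1:200)';
X = [0.02*sin(0.37*t) + 0.001, 0.05*cos(1.13*t) - 0.004, 0.01*sin(2.1*t + 1)];
N = size(X, 1);

% Manual z-scoring: subtract each column mean and divide by each column's
% sample standard deviation (N-1 normalization, the default of std)
Z = (X - repmat(mean(X), N, 1)) ./ repmat(std(X), N, 1);

% Verification: column means should be ~0 and column variances ~1,
% up to floating-point round-off
colMeans = mean(Z);
colVars  = var(Z);
disp(colMeans);
disp(colVars);
```

The printed means will generally not be exactly zero but of the order of machine precision, which is worth mentioning in the discussion.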
2) Compute and plot the joint pdf
Recall that we can use histograms to “approximate” the true pdf. Here, rather than relying on
a built-in 2D histogram command, use the following code to compute the 2D histogram first
and then to approximate the joint pdf, say between the 1st and 2nd stocks. Turn this into a
MATLAB function so that you can use it to approximate all the other joint pdfs (1-3, …, 4-5).
This code uses MATLAB's hist command to perform most of the real work, and just applies it
repeatedly. The for loop segregates the samples of the first stock (X1) by the X1-value bins.
Then, for each subset, it creates a 1-D histogram of the corresponding samples of the second
stock (X2) and puts it into the appropriate row of the 2-D histogram. It requires you to set
bins, which is the number of bins on each axis. I used bins = 16.
bins = 16; % my choice.. feel free to change it..
stocks = Z'; % make them row vectors..
% The estimation and plotting of the 2D pdf
% 1. Use hist on all data to find bins and bin sizes.
[n, x1] = hist(stocks(1,:), bins);
[n, x2] = hist(stocks(2,:), bins);
delta_x1 = x1(2) - x1(1);
delta_x2 = x2(2) - x2(1);
% 2. Initialize a 2-D matrix for the 2-D histogram
n2d = zeros(length(x1), length(x2));
% 3. For each row, find the indices of the X_1 stocks
% which fall into that row.
% Compute a histogram for the X_2 stocks, and put it in the 2-D
% histogram for that row.
for i = 1:length(x1)
    lo_edge = x1(i) - delta_x1/2;
    if i == 1, lo_edge = -Inf; end % keep the minimum sample in the first bin
    ind = find((stocks(1,:) > lo_edge) & ...
               (stocks(1,:) <= x1(i) + delta_x1/2));
    n2d(i,1:length(x2)) = hist(stocks(2,ind), x2);
end
Next, as for the marginal pdf, we need to normalize the histogram so that the joint pdf sums
up to one:
pdf = n2d./(sum(sum(n2d)));
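As a cross-check on the loop-based histogram, the same kind of 2-D histogram can be built without a loop. The sketch below is an alternative technique, not the code given above: it assigns each sample a bin index and counts pairs with accumarray. The two input vectors are synthetic stand-ins (an assumption), and samples that fall exactly on a bin boundary may be assigned differently than by hist.

```matlab
% Synthetic, normalized stand-ins for two stocks (assumption: NOT the real data)
t = (1:500)';
a = sin(0.37*t);
b = 0.8*a + 0.6*cos(1.13*t);
a = (a - mean(a)) / std(a);
b = (b - mean(b)) / std(b);

bins = 16;

% Equal-width bins spanning [min, max] of each stock
w1 = (max(a) - min(a)) / bins;
w2 = (max(b) - min(b)) / bins;
x1 = min(a) + w1 * ((1:bins) - 0.5);   % bin centers, stock 1
x2 = min(b) + w2 * ((1:bins) - 0.5);   % bin centers, stock 2

% Bin index of every sample; min(...) keeps the maximum sample in the last bin
i1 = min(bins, 1 + floor((a - min(a)) / w1));
i2 = min(bins, 1 + floor((b - min(b)) / w2));

% 2-D histogram: rows follow stock-1 bins, columns follow stock-2 bins
n2d = accumarray([i1 i2], 1, [bins bins]);

% Normalize the counts so that the entries sum to one
pdf = n2d ./ sum(n2d(:));

% Marginals: summing out one variable should give a 1-D pdf of the other
p1 = sum(pdf, 2);   % marginal of stock 1 (column vector)
p2 = sum(pdf, 1);   % marginal of stock 2 (row vector)
```

Checking that sum(n2d(:)) equals the number of samples is a quick way to confirm that no sample was dropped at the bin edges.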
Finally, plot the pdf using one of the following options, so that the pdf is clearly visible.
Since the rows of pdf follow x1 and the columns follow x2, pdf is transposed here so that x1
runs along the horizontal axis.
1. Image Plot: h = imagesc(x1, x2, pdf'); colorbar;
2. Surface Plot: h = surf(x1, x2, pdf'); view(20,30);
3. Mesh Plot: h = mesh(x1, x2, pdf'); view(20,30);
4. Contour Plot: [c, h] = contour(x1, x2, pdf'); colorbar;
Do not forget to grid and label your plots. Insert and discuss two different joint pdf plots in
your report. Pick whichever two you think best show the features of your pdf.
3) Estimate the Covariance/Correlation from the joint pdf
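Now that you have the approximated joint pdf, you can compute the covariance for any
pair of stocks using the well-known formula, which for zero-mean variables (as is the case
after normalization) reads:
$$\mathrm{Cov}(X_1, X_2) = \sum_{x_1} \sum_{x_2} x_1 \, x_2 \, p(x_1, x_2)$$
You can use the following code to implement this summation: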
CovXY = 0;
for x = 1:bins
    for y = 1:bins
        CovXY = CovXY + (x1(x)*x2(y)*pdf(x,y));
    end
end
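The double loop is a weighted sum over all bin pairs, so it can also be written as a single matrix product. The sketch below uses a small hand-made 3x3 joint pdf (hypothetical values chosen only for illustration, not stock data) to show that the two forms agree, and also shows the more general form that subtracts the means.

```matlab
% Toy joint pdf on a 3x3 grid (hypothetical values, for illustration only)
x1  = [-1 0 1];                 % bin centers of the first variable
x2  = [-1 0 1];                 % bin centers of the second variable
pdf = [0.20 0.05 0.00;
       0.05 0.40 0.05;
       0.00 0.05 0.20];         % rows follow x1, columns follow x2; sums to 1

% Loop form, as in the code above
CovLoop = 0;
for x = 1:length(x1)
    for y = 1:length(x2)
        CovLoop = CovLoop + x1(x)*x2(y)*pdf(x,y);
    end
end

% Equivalent vectorized form: (1x3) * (3x3) * (3x1) gives a scalar
CovVec = x1 * pdf * x2';

% General form that does not assume zero means: E[XY] - E[X]E[Y]
mu1 = x1 * sum(pdf, 2);         % mean of the first variable from its marginal
mu2 = sum(pdf, 1) * x2';        % mean of the second variable from its marginal
CovGeneral = CovVec - mu1*mu2;
```

For this toy pdf both marginals have zero mean, so all three values coincide at 0.4. With binned real data the means computed from the pdf are only approximately zero, so the general form can differ slightly from the zero-mean form.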
Now, since you statistically normalized the data at the beginning, what can you say about the
relationship between covariance and correlation? Repeat steps 2-3 for the other stock pairs
(1,3), …, (4,5). Which pair of stocks gives you the highest correlation? Put all the
cross-correlation results into a 5x5 matrix Cxy, where the (i,j)th and (j,i)th elements both take
the correlation value between the ith and jth stocks. The diagonal elements should all be 1
(no calculation needed), since the correlation of a signal with itself is always 1.
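One possible way to organize the repetition over all pairs is sketched below. It is an illustration under assumptions, not a reference solution: the five columns are synthetic stand-ins for the stocks, the joint pdf is built with an accumarray-based binning instead of the hist loop, and the covariance of the normalized data is used directly as the correlation estimate.

```matlab
% Synthetic stand-in for the five stocks (assumption: NOT the real data)
t = (1:500)';
s = sin(0.37*t);
c = cos(1.13*t);
X = [s, 0.8*s + 0.6*c, c, sin(2.3*t + 0.5), cos(3.1*t)];
N = size(X, 1);
Z = (X - repmat(mean(X), N, 1)) ./ repmat(std(X), N, 1);   % zero mean, unit variance

bins = 16;
M = size(Z, 2);
Cxy = eye(M);                          % diagonal fixed to 1, as described above

for p = 1:M-1
    for q = p+1:M
        a = Z(:, p);
        b = Z(:, q);
        % Bin widths, bin centers and per-sample bin indices
        w1 = (max(a) - min(a)) / bins;
        w2 = (max(b) - min(b)) / bins;
        x1 = min(a) + w1 * ((1:bins) - 0.5);
        x2 = min(b) + w2 * ((1:bins) - 0.5);
        i1 = min(bins, 1 + floor((a - min(a)) / w1));
        i2 = min(bins, 1 + floor((b - min(b)) / w2));
        % Joint pdf estimate for this pair
        n2d = accumarray([i1 i2], 1, [bins bins]);
        pdf = n2d ./ sum(n2d(:));
        % Sum of x1*x2*p(x1,x2) over all bins, stored symmetrically
        Cxy(p, q) = x1 * pdf * x2';
        Cxy(q, p) = Cxy(p, q);
    end
end

% Locate the pair with the largest off-diagonal entry
offDiag = Cxy;
offDiag(logical(eye(M))) = -Inf;       % exclude the diagonal from the search
[bestVal, k] = max(offDiag(:));
[bestI, bestJ] = ind2sub([M M], k);
```

In this synthetic example the second column is constructed from the first, so the pair (1,2) comes out on top; with the real data the winning pair has to be read off the computed matrix.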
4) Calculate sample Covariance/Correlation matrices {Bonus}
Each row of the data matrix S = stocks is a statistically normalized vector. Therefore, the
sample 5x5 covariance matrix Cov(S) can be computed as follows:
$$\mathrm{Cov}(\mathbf{S}) = \frac{\mathbf{S}\mathbf{S}^T}{N - 1}$$
where N is the length of each stock data vector. Can you also find the sample correlation
matrix, Cor(S)? Explain how.
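A minimal sketch of the sample computation follows, again on synthetic stand-in data (an assumption, not Stocks.dat). It forms Cov(S) with the formula above and then rescales it by the standard deviations taken from its diagonal, which is one common way to obtain a correlation matrix. MATLAB's corrcoef is used only as an independent check.

```matlab
% Synthetic stand-in for the stock matrix (assumption: NOT the real data)
t = (1:300)';
X = [sin(0.37*t), 0.8*sin(0.37*t) + 0.6*cos(1.13*t), cos(2.3*t + 0.5)];
N = size(X, 1);

% Normalize the columns, then put the stocks in rows as in S = stocks
Z = (X - repmat(mean(X), N, 1)) ./ repmat(std(X), N, 1);
S = Z';

% Sample covariance matrix of the zero-mean rows of S
CovS = (S * S') / (N - 1);

% Correlation matrix: divide each entry by the product of the two
% standard deviations (square roots of the diagonal of CovS)
d = sqrt(diag(CovS));
CorS = CovS ./ (d * d');

% Independent check against the built-in correlation of the raw columns
R = corrcoef(X);
maxDiff = max(abs(CorS(:) - R(:)));
```

Because the rows of S already have unit variance, the diagonal of CovS is 1 up to round-off, so the rescaling step changes almost nothing here. It matters when the data have not been normalized first.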
Now examine the Cor(S) matrix. What do you see in the diagonal elements? Why?
Compare the correlation between stocks 1 and 2 estimated in the previous section with
the element at (1,2) or (2,1) of the Cor(S) matrix. Are they identical? If not, why not? Which
one is a better estimate of the cross-correlation? Why?
According to the Cor(S) matrix, which pair of stocks has the highest correlation? Are the
Cor(S) and Cxy matrices identical? If not, which one do you think is a better estimate of the
cross-correlation?
Finally, how can you use the “stock correlation” information for investment (to reduce risk
or to maximize profit) and to become a millionaire? Briefly discuss.