Write Python Code Using the Anaconda Navigator

Resource Information
In this assignment, you will work with the books.csv file. This file contains detailed information about books scraped from Goodreads. The dataset was downloaded from the Kaggle website: https://www.kaggle.com/jealousleopard/goodreadsbooks/downloads/goodreadsbooks.zip/6

Each row in the file includes ten columns. A detailed description of each column is provided below; a short loading-and-inspection sketch follows the list.

bookID: A unique identification number for each book.
title: The name under which the book was published.
authors: Names of the authors of the book. Multiple authors are delimited with "-".
average_rating: The average rating the book received overall.
isbn: Another unique number that identifies the book, the International Standard Book Number.
isbn13: A 13-digit ISBN that identifies the book, in place of the standard 10-digit ISBN.
language_code: The primary language of the book.
num_pages: Number of pages the book contains.
ratings_count: Total number of ratings the book received.
text_reviews_count: Total number of written text reviews the book received.
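
Before starting the tasks, it may help to load the file and confirm that the columns match this description. The sketch below is only illustrative: it assumes books.csv sits in the working directory and that the column names appear exactly as listed above (check books.columns to be sure).

```python
# Illustrative sketch (not part of the required deliverable).
# Assumes books.csv is in the working directory and columns are named as described above.
import pandas as pd

books = pd.read_csv("books.csv", index_col="bookID")

print(books.columns)   # confirm the column names listed above (bookID is now the index)
print(books.dtypes)    # isbn/isbn13 are identifiers, not quantities to average
print(books["authors"].str.split("-").head())  # multiple authors are "-"-delimited

# If pandas reports malformed rows in this file, one option (pandas >= 1.3) is:
# books = pd.read_csv("books.csv", index_col="bookID", on_bad_lines="skip")
```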
Task
Write code for the following steps:
Use pandas to read the file as a DataFrame named books. The bookID column should be the index of the DataFrame.
Use books.head() to see the first 5 rows of the DataFrame.
Use books.shape to find the number of rows and columns in the DataFrame.
Use books.describe() to summarize the data.
Use books['authors'].describe() to find the number of unique authors in the dataset and the most frequent author.
Use OLS regression to test whether the average rating of a book depends on the number of pages, the number of ratings, and the total number of written text reviews the book received (a sketch of this step follows the list).
Summarize your findings in a Word file.
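
A minimal sketch of the regression step is given below, following the sm.OLS pattern used in the module notebook further down. It assumes the DataFrame books from the earlier steps and the column names from the description above; adjust the names if they differ in your file.

```python
# Minimal sketch, assuming `books` was created as in the earlier steps and
# that the columns are named exactly as in the dataset description.
import statsmodels.api as sm

y = books["average_rating"]
X = books[["num_pages", "ratings_count", "text_reviews_count"]]
X = sm.add_constant(X)                       # add an intercept term

result = sm.OLS(y, X, missing="drop").fit()  # drop rows with missing values
print(result.summary())                      # coefficients and p-values for the write-up
```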
Instructions
Please follow these directions carefully.
Please type your code in a Jupyter Notebook file and your summary in a Word document, each named as follows:
HW6YourFirstNameYourLastName.

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python Basics (Instructor: Dr. Milad Baghersad)\n", "## Module 6: Data Analysis with Python Part 1\n", "\n", "- Reference: McKinney, Wes (2018) Python for data analysis: Data wrangling with Pandas, NumPy, and IPython, Second Edition, O'Reilly Media, Inc. ISBN-13: 978-1491957660 ISBN-10: 1491957662\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "___\n", "___\n", "___\n", "### review: Numpy (https://www.numpy.org/)\n", "NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python.\n", "\n", "One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large datasets in Python. Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the\n", "equivalent operations between scalar elements." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = np.array([[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14]])\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(b)\n", "type(b)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(b.sum(axis=0)) # sum of each column" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.ones( (5,4) )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1)\n", "np.random.rand(4,2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "---\n", "---\n", "# pandas (https://pandas.pydata.org/)\n", "\n", "- Developed by Wes McKinney.\n", "- pandas contains data structures and data manipulation tools designed to make data cleaning and analysis fast and easy in Python.\n", "- While pandas adopts many coding idioms from NumPy, the biggest difference is that pandas is designed for working with tabular or heterogeneous data. \n", "- NumPy, by contrast, is best suited for working with homogeneous numerical array data.\n", "- Can be used to collect data from different sources such as Yahoo Finance!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_data = np.random.rand(4,2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(my_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### change the array to a pandas dataframe:\n", "A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_data_df = pd.DataFrame(my_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_data_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(my_data_df)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_data_df.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#assign columns name\n", "my_data_df = pd.DataFrame(my_data,columns=[\"first column\", \"Second column\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_data_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#assign rows name\n", "my_data_df = pd.DataFrame(my_data,columns=[\"first column\", \"Second column\"],index=['a', 'b', 'c', 'd'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_data_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#There are many ways to construct a DataFrame, though one of the most common is\n", "# from a dict of equal-length lists or NumPy arrays:\n", "data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],\n", " 'year': [2000, 2001, 2002, 2001, 2002, 2003],\n", " 'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t = pd.DataFrame(data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#For large DataFrames, the head method selects only the first five rows:\n", "data_t.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t.tail()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#If you specify a sequence of columns, the DataFrame’s columns will be arranged in that order:\n", "pd.DataFrame(data, columns=['year', 'state', 'pop'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df2 = pd.DataFrame(data, columns=['year', 'state', 'pop'])\n", "df2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df2.set_index('year',inplace=True)\n", "df2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#If you pass a column that isn’t contained in the dict, it will appear with missing values in the result:\n", "data_t2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],\n", " index=['one', 'two', 'three', 'four', 'five', 'six'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#retrieving a column by dict-like notation \n", 
"data_t2[\"state\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or by attribute:\n", "data_t2.state" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Rows can be retrieved by position or name with the special loc attribute:\n", "data_t2.loc['three']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Columns can be modified by assignment. \n", "data_t2['debt'] = 16.5" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2['debt'] = np.arange(6.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "val = pd.DataFrame([2, 4, 5],index=['two', 'four', 'five'])\n", "val" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2['debt'] = val" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2['state'] == 'Ohio'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2['eastern'] = data_t2['state'] == 'Ohio'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#The del method can then be used to remove this column:\n", "del data_t2['eastern']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Index objects are immutable and thus can’t be modified by the user:\n", "data_t2.index" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.index[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.index[0] = 0 #TypeError" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],\n", " 'year': [2000, 2001, 2002, 2001, 2002, 2003],\n", " 'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data3 = pd.DataFrame(data, index=data[\"year\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "del data3['year']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data3.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "'state' in data3.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "'state' in data3.index" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "2003 in 
data3.index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Reindexing:\n", "An important method on pandas objects is reindex, which means to create a new object with the data conformed to a new index. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame = pd.DataFrame(np.arange(9).reshape((3, 3)),\n", " index=['a', 'c', 'd'],\n", " columns=['Ohio', 'Texas', 'California'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame2 = frame.reindex(['a', 'b', 'c','d'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame2.drop('b') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indexing, Selection, and Filtering:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#column selection\n", "data_t2['year']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#row selection: using either axis labels (loc) \n", "data_t2.loc[\"two\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#row selection: using either axis integers (iloc) \n", "data_t2.iloc[1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.iloc[0,0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.loc[\"one\",\"year\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.iloc[0,0:3]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.iloc[0,0] = 2010" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.at[\"two\", \"state\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.at[\"two\", \"state\"] = \"Florida\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.loc[\"two\", \"state\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Sorting and Ranking" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame = pd.DataFrame(np.arange(8).reshape((2, 4)),\n", " index=['three', 'one'],\n", " columns=['d', 'a', 'b', 'c'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame.sort_index()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame.sort_index(axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"frame.sort_index().sort_index(axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame.sort_index(axis=1, ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame = pd.DataFrame({'rating': [4.3, 5, 1, 2]}, index=['R1','R2','R3','R4'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame.rank(ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame.sort_values(\"rating\", ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summarizing and Computing Descriptive Statistics with pandas (good for handling missing data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],\n", " [np.nan, np.nan], [0.75, -1.3]],\n", " index=['a', 'b', 'c', 'd'],\n", " columns=['one', 'two'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Calling DataFrame’s sum method returns column sums:\n", "df.sum()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.sum(axis='columns')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#note: NA values are excluded.\n", "df.mean(axis='columns')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#This can be disabled with the skipna option:\n", "df.mean(axis='columns', skipna=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### important: describe()\n", "describe provides multiple summary statistics:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### pandas-datareader (https://pandas-datareader.readthedocs.io/en/latest/)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "example: stock prices and volumes obtained from Yahoo! Finance using the add-on pandas-datareader package." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pip install pandas-datareader" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas_datareader.data as web" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "web.get_data_yahoo('AAPL')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "all_data = {ticker: web.get_data_yahoo(ticker) for ticker in ['AAPL', 'IBM', 'MSFT', 'GOOG']}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(all_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "price = pd.DataFrame({ticker: data['Adj Close'] for ticker, data in all_data.items()})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "price" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "volume = pd.DataFrame({ticker: data['Volume'] for ticker, data in all_data.items()})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "volume" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Compute percent changes of the prices\n", "returns = price.pct_change()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns['MSFT'].corr(returns['IBM'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns['MSFT'].cov(returns['IBM'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns.corr()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns.cov()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns.corrwith(volume)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: can you make money with buying and selleing APPLE stock by buying at the opening and selling at the closing?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(all_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data = pd.DataFrame(all_data[\"AAPL\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data['Close-Open'] = apple_data['Close'] - apple_data['Open']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib notebook" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data[\"Close-Open\"].plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data.plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data[['High','Low']].plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data.plot(x='High',y='Low', kind='scatter')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "___\n", "___\n", "___\n", "___\n", "### performing regression using statsmodels library" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import statsmodels.api as sm" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y = returns['AAPL']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = returns[['IBM', 'MSFT', 'GOOG']]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = sm.OLS(y,x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = sm.OLS(y,x, missing='drop')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = model.fit()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(result.summary())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "___\n", "___\n", "___\n", "___\n", "### read_html\n", "reads tables in a html address as a list" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list = pd.read_html('https://en.wikipedia.org/wiki/List_of_largest_companies_by_revenue')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(my_list)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(my_list)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df = pd.DataFrame(my_list)" ] }, { "cell_type": "code", "execution_count": null, "metadata": 
{}, "outputs": [], "source": [ "my_list_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(my_list[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df = pd.DataFrame(my_list[0])\n", "my_list_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df.set_index('Rank',inplace=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df['Revenue(USD millions)']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df['Revenue(USD millions)']= my_list_df['Revenue(USD millions)'].replace('[\\$,]', '', regex=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df['Revenue(USD millions)']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df['Revenue(USD millions)']= my_list_df['Revenue(USD millions)'].astype(float)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df['Revenue(USD millions)']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df['Country'].describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df['Country'].unique()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#filtering companies in United States:\n", "indices = my_list_df['Country']=='United States'\n", "US_companies = my_list_df.loc[indices,:]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "US_companies" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "US_companies.plot(kind='hist',bins=20,alpha=0.8)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "revenue_US = US_companies[\"Revenue(USD millions)\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "revenue_US.plot(kind='hist',bins=30,alpha=0.8)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(revenue_US)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df.groupby(['Country']).count()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df.groupby(['Country']).count().sort_values(\"Name\", ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df.groupby(['Industry']).count().sort_values(\"Name\", ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, 
"outputs": [], "source": [ "my_list_df.tail()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export data\n", "my_list_df.to_csv(\"BestCompanies.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Another example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "dealership_data = pd.read_csv(\"dealership.csv\", delimiter=\",\")\n", "dealership_data.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data = pd.read_csv(\"dealership.csv\", delimiter=\",\", index_col=0)\n", "dealership_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data = pd.read_csv(\"dealership.csv\", delimiter=\",\")\n", "dealership_data.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data['Profit']= dealership_data['Profit'].replace('[\\$,]', '', regex=True).astype('float')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data['Location'].describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data['Location'].unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Question: Are there any statistically significant differences between the means of profits earned in different locations?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#filtering profit based on location:\n", "indices = dealership_data['Location']=='Tionesta'\n", "Tionesta = dealership_data.loc[indices,:]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "indices = dealership_data['Location']=='Sheffield'\n", "Sheffield = dealership_data.loc[indices,:]\n", "indices = dealership_data['Location']=='Kane'\n", "Kane = dealership_data.loc[indices,:]\n", "indices = dealership_data['Location']=='Olean'\n", "Olean = dealership_data.loc[indices,:]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#plot them\n", "%matplotlib inline\n", "Tionesta['Profit'].plot(kind='hist',bins=15,alpha=0.8, color = 'red')\n", "Sheffield['Profit'].plot(kind='hist',bins=15,alpha=0.8, color = 'green')\n", "Kane['Profit'].plot(kind='hist',bins=15,alpha=0.8, color = 'blue')\n", "Olean['Profit'].plot(kind='hist',bins=15,alpha=0.8, color = 'yellow')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data.boxplot('Profit',by='Location')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import statsmodels.api as sm\n", "from statsmodels.formula.api import ols" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = ols('Profit ~ Location', data = dealership_data).fit()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ANOVA_table = sm.stats.anova_lm(model,typ=2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(ANOVA_table)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }
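
As a bridge back to the assignment: the formula interface (statsmodels.formula.api.ols) used in the dealership ANOVA example above can also express the books regression. The snippet below is only a sketch under the same column-name assumptions as before, not the required solution.

```python
# Sketch only: formula-API version of the assignment regression, assuming the
# column names from the dataset description (num_pages, ratings_count, text_reviews_count).
import pandas as pd
from statsmodels.formula.api import ols

books = pd.read_csv("books.csv", index_col="bookID")
model = ols("average_rating ~ num_pages + ratings_count + text_reviews_count",
            data=books).fit()
print(model.summary())
```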
