18/10/2021 Client: muhammad11 Deadline: 2 Day

Write Python Code in Anaconda Navigator

Resource Information
In this assignment, you will work with the books.csv file, which contains detailed information about books scraped from Goodreads. The dataset was downloaded from the Kaggle website: https://www.kaggle.com/jealousleopard/goodreadsbooks/downloads/goodreadsbooks.zip/6
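Here is a minimal sketch of reading the file the way the Task below asks. It assumes `books.csv` is in the notebook's working directory. Two defensive choices are assumptions, not documented properties of the file. First, `on_bad_lines="skip"` (pandas 1.3 and later) drops any row whose field count does not match the header instead of raising a tokenizing error. Second, `str.strip()` removes stray spaces from column names. The function name `load_books` is hypothetical, and the demo reads a made-up in-memory sample instead of the real file.

```python
import io

import pandas as pd


def load_books(source) -> pd.DataFrame:
    """Read the Goodreads books CSV with bookID as the index.

    `source` can be a path such as "books.csv" or any file-like object.
    """
    # Skip rows whose field count does not match the header (pandas >= 1.3)
    books = pd.read_csv(source, on_bad_lines="skip")
    # Defensive: remove any stray spaces around column names
    books.columns = books.columns.str.strip()
    return books.set_index("bookID")


# Hypothetical stand-in for books.csv: the ten documented columns, a header
# with deliberate stray spaces, and one malformed row with an extra field
sample = io.StringIO(
    "bookID,title,authors,average_rating,isbn,isbn13,language_code,"
    "  num_pages,ratings_count,text_reviews_count\n"
    "1,Book A,Author One,4.5,0000000001,9780000000001,eng,652,2095,275\n"
    "2,Book B,Author Two,3.9,0000000002,9780000000002,eng,870,2153,292\n"
    "3,Bad,Row,with,one,field,too,many,here,x,y\n"
)
books = load_books(sample)
print(books.head())  # first rows; bookID is now the index
print(books.shape)   # (2, 9) for this sample: the bad row is skipped
```

With the real file, `books = load_books("books.csv")` replaces the sample. Skipped rows are dropped silently, so compare `books.shape[0]` against the expected row count as a sanity check.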

Each row in the file has ten columns, described below:

bookID: A unique identification number for each book.
title: The name under which the book was published.
authors: Names of the authors of the book. Multiple authors are delimited with -.
average_rating: The average rating the book received.
isbn: Another unique number to identify the book, the International Standard Book Number.
isbn13: A 13-digit ISBN that identifies the book, instead of the older 10-digit ISBN.
language_code: The primary language of the book, given as a language code.
num_pages: Number of pages the book contains.
ratings_count: Total number of ratings the book received.
text_reviews_count: Total number of written text reviews the book received.
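`books['authors'].describe()` treats each full author string as one value. A book credited to two people therefore counts as a third, distinct "author". The sketch below contrasts the two counts by splitting on the documented `-` delimiter. That delimiter is an assumption worth checking against the raw file. A `-` delimiter would also split hyphenated names such as Jean-Paul. The function name `author_summary` and the sample names are hypothetical.

```python
import pandas as pd


def author_summary(authors: pd.Series, sep: str = "-") -> dict:
    """Compare whole author strings with individual author names.

    describe() on the raw column treats "A-B" as one value; splitting on
    `sep` (the delimiter documented for this dataset) counts people instead.
    """
    raw = authors.describe()  # count, unique, top, freq for a text column
    individual = authors.str.split(sep).explode().str.strip()
    return {
        "unique_author_strings": int(raw["unique"]),
        "most_frequent_string": raw["top"],
        "unique_individual_authors": int(individual.nunique()),
        "most_frequent_individual": individual.value_counts().idxmax(),
    }


# Hypothetical sample values, not taken from books.csv
authors = pd.Series(["Ann Lee-Bo Chen", "Ann Lee", "Bo Chen", "Ann Lee"])
print(author_summary(authors))
```

On the real data, call `author_summary(books["authors"])`. The first two entries match what the Task's `describe()` call reports.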
Task
Write code for the following steps:
Use pandas to read the file as a dataframe named books. The bookID column should be the index of the dataframe.
Use books.head() to see the first 5 rows of the dataframe.
Use books.shape to find the number of rows and columns in the dataframe.
Use books.describe() to summarize the data.
Use books['authors'].describe() to find the number of unique authors in the dataset and the most frequent author.
Use OLS regression to test whether the average rating of a book depends on the number of pages, the number of ratings, and the total number of written text reviews the book received.
Summarize your findings in a Word file.
Instructions
Please follow these directions carefully.
Please type your code in a Jupyter Notebook file and your summary in a Word document, both named as follows:
HW6YourFirstNameYourLastName.

Python, is one of the most important foundational packages for numerical computing in Python.\n", "\n", "One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large datasets in Python. Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the\n", "equivalent operations between scalar elements." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = np.array([[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14]])\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(b)\n", "type(b)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(b.sum(axis=0)) # sum of each column" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.ones( (5,4) )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1)\n", "np.random.rand(4,2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "---\n", "---\n", "# pandas (https://pandas.pydata.org/)\n", "\n", "- Developed by Wes McKinney.\n", "- pandas contains data structures and data manipulation tools designed to make data cleaning and analysis fast and easy in Python.\n", "- While pandas adopts many coding idioms from NumPy, the biggest difference is that pandas is designed for working with tabular or heterogeneous data. \n", "- NumPy, by contrast, is best suited for working with homogeneous numerical array data.\n", "- Can be used to collect data from different sources such as Yahoo Finance!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_data = np.random.rand(4,2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(my_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### change the array to a pandas dataframe:\n", "A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_data_df = pd.DataFrame(my_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_data_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(my_data_df)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_data_df.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#assign columns name\n", "my_data_df = pd.DataFrame(my_data,columns=[\"first column\", \"Second column\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_data_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#assign rows name\n", "my_data_df = pd.DataFrame(my_data,columns=[\"first column\", \"Second column\"],index=['a', 'b', 'c', 'd'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_data_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#There are many ways to 
construct a DataFrame, though one of the most common is\n", "# from a dict of equal-length lists or NumPy arrays:\n", "data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],\n", " 'year': [2000, 2001, 2002, 2001, 2002, 2003],\n", " 'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t = pd.DataFrame(data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#For large DataFrames, the head method selects only the first five rows:\n", "data_t.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t.tail()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#If you specify a sequence of columns, the DataFrame’s columns will be arranged in that order:\n", "pd.DataFrame(data, columns=['year', 'state', 'pop'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df2 = pd.DataFrame(data, columns=['year', 'state', 'pop'])\n", "df2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df2.set_index('year',inplace=True)\n", "df2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#If you pass a column that isn’t contained in the dict, it will appear with missing values in the result:\n", "data_t2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],\n", " index=['one', 'two', 'three', 'four', 'five', 'six'])" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#retrieving a column by dict-like notation \n", "data_t2[\"state\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or by attribute:\n", "data_t2.state" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Rows can be retrieved by position or name with the special loc attribute:\n", "data_t2.loc['three']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Columns can be modified by assignment. \n", "data_t2['debt'] = 16.5" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2['debt'] = np.arange(6.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "val = pd.DataFrame([2, 4, 5],index=['two', 'four', 'five'])\n", "val" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2['debt'] = val" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2['state'] == 'Ohio'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2['eastern'] = data_t2['state'] == 'Ohio'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [], "source": [ "#The del method can then be used to remove this column:\n", "del data_t2['eastern']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Index objects are immutable and thus can’t be modified by the user:\n", "data_t2.index" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.index[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.index[0] = 0 #TypeError" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],\n", " 'year': [2000, 2001, 2002, 2001, 2002, 2003],\n", " 'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data3 = pd.DataFrame(data, index=data[\"year\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "del data3['year']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data3.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "'state' in data3.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "'state' in data3.index" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "2003 in data3.index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Reindexing:\n", "An important method on pandas objects 
is reindex, which creates a new object with the data conformed to a new index. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame = pd.DataFrame(np.arange(9).reshape((3, 3)),\n", " index=['a', 'c', 'd'],\n", " columns=['Ohio', 'Texas', 'California'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame2 = frame.reindex(['a', 'b', 'c','d'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame2.drop('b') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indexing, Selection, and Filtering:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#column selection\n", "data_t2['year']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#row selection by axis label (loc)\n", "data_t2.loc[\"two\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#row selection by integer position (iloc)\n", "data_t2.iloc[1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.iloc[0,0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.loc[\"one\",\"year\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.iloc[0,0:3]" ] }, { "cell_type": "code", "execution_count": null,
"metadata": {}, "outputs": [], "source": [ "data_t2.iloc[0,0] = 2010" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.at[\"two\", \"state\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.at[\"two\", \"state\"] = \"Florida\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_t2.loc[\"two\", \"state\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Sorting and Ranking" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame = pd.DataFrame(np.arange(8).reshape((2, 4)),\n", " index=['three', 'one'],\n", " columns=['d', 'a', 'b', 'c'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame.sort_index()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame.sort_index(axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame.sort_index().sort_index(axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame.sort_index(axis=1, ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame = pd.DataFrame({'rating': [4.3, 5, 1, 2]}, index=['R1','R2','R3','R4'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"frame.rank(ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame.sort_values(\"rating\", ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summarizing and Computing Descriptive Statistics with pandas (good for handling missing data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],\n", " [np.nan, np.nan], [0.75, -1.3]],\n", " index=['a', 'b', 'c', 'd'],\n", " columns=['one', 'two'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Calling DataFrame’s sum method returns column sums:\n", "df.sum()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.sum(axis='columns')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#note: NA values are excluded.\n", "df.mean(axis='columns')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#This can be disabled with the skipna option:\n", "df.mean(axis='columns', skipna=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### important: describe()\n", "describe provides multiple summary statistics:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### pandas-datareader (https://pandas-datareader.readthedocs.io/en/latest/)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "example: stock prices and volumes obtained from Yahoo! Finance using the add-on pandas-datareader package." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pip install pandas-datareader" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas_datareader.data as web" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "web.get_data_yahoo('AAPL')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "all_data = {ticker: web.get_data_yahoo(ticker) for ticker in ['AAPL', 'IBM', 'MSFT', 'GOOG']}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(all_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "price = pd.DataFrame({ticker: data['Adj Close'] for ticker, data in all_data.items()})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "price" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "volume = pd.DataFrame({ticker: data['Volume'] for ticker, data in all_data.items()})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "volume" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Compute percent changes of the prices\n", "returns = price.pct_change()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns['MSFT'].corr(returns['IBM'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns['MSFT'].cov(returns['IBM'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": 
{}, "outputs": [], "source": [ "returns.corr()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns.cov()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns.corrwith(volume)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: can you make money by buying Apple stock at the open and selling it at the close?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(all_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data = pd.DataFrame(all_data[\"AAPL\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data['Close-Open'] = apple_data['Close'] - apple_data['Open']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib notebook" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data[\"Close-Open\"].plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data.plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data[['High','Low']].plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "apple_data.plot(x='High',y='Low', kind='scatter')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "___\n", "___\n", "___\n", "___\n", "###
Performing regression using the statsmodels library" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import statsmodels.api as sm" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "returns.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y = returns['AAPL']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = returns[['IBM', 'MSFT', 'GOOG']]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = sm.OLS(y,x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = sm.OLS(y,x, missing='drop')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = model.fit()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(result.summary())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "___\n", "___\n", "___\n", "___\n", "### read_html\n", "reads the tables on an HTML page into a list of DataFrames" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list = pd.read_html('https://en.wikipedia.org/wiki/List_of_largest_companies_by_revenue')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(my_list)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(my_list)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df = pd.DataFrame(my_list)" ] }, { "cell_type": "code",
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(my_list[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df = pd.DataFrame(my_list[0])\n", "my_list_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df.set_index('Rank',inplace=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df['Revenue(USD millions)']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df['Revenue(USD millions)']= my_list_df['Revenue(USD millions)'].replace('[\\$,]', '', regex=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df['Revenue(USD millions)']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df['Revenue(USD millions)']= my_list_df['Revenue(USD millions)'].astype(float)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df['Revenue(USD millions)']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df['Country'].describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df['Country'].unique()" ] 
}, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#filtering companies in United States:\n", "indices = my_list_df['Country']=='United States'\n", "US_companies = my_list_df.loc[indices,:]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "US_companies" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "US_companies.plot(kind='hist',bins=20,alpha=0.8)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "revenue_US = US_companies[\"Revenue(USD millions)\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "revenue_US.plot(kind='hist',bins=30,alpha=0.8)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(revenue_US)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df.groupby(['Country']).count()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df.groupby(['Country']).count().sort_values(\"Name\", ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df.groupby(['Industry']).count().sort_values(\"Name\", ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list_df.tail()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export data\n", "my_list_df.to_csv(\"BestCompanies.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Another example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", 
"dealership_data = pd.read_csv(\"dealership.csv\", delimiter=\",\")\n", "dealership_data.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data = pd.read_csv(\"dealership.csv\", delimiter=\",\", index_col=0)\n", "dealership_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data = pd.read_csv(\"dealership.csv\", delimiter=\",\")\n", "dealership_data.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data['Profit']= dealership_data['Profit'].replace('[\\$,]', '', regex=True).astype('float')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data['Location'].describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data['Location'].unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Question: Are there any statistically significant differences between the means of profits earned in different locations?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#filtering profit based on location:\n", "indices = dealership_data['Location']=='Tionesta'\n", "Tionesta = dealership_data.loc[indices,:]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "indices = dealership_data['Location']=='Sheffield'\n", "Sheffield = dealership_data.loc[indices,:]\n", "indices = dealership_data['Location']=='Kane'\n", "Kane = dealership_data.loc[indices,:]\n", "indices = dealership_data['Location']=='Olean'\n", "Olean = dealership_data.loc[indices,:]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#plot them\n", "%matplotlib inline\n", "Tionesta['Profit'].plot(kind='hist',bins=15,alpha=0.8, color = 'red')\n", "Sheffield['Profit'].plot(kind='hist',bins=15,alpha=0.8, color = 'green')\n", "Kane['Profit'].plot(kind='hist',bins=15,alpha=0.8, color = 'blue')\n", "Olean['Profit'].plot(kind='hist',bins=15,alpha=0.8, color = 'yellow')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dealership_data.boxplot('Profit',by='Location')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import statsmodels.api as sm\n", "from statsmodels.formula.api import ols" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = ols('Profit ~ Location', data = dealership_data).fit()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ANOVA_table = sm.stats.anova_lm(model,typ=2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(ANOVA_table)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, 
"language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2