Python Project

For the third project, you will be creating a suite of functions to process and analyze a Twitter data set derived from one of the data sets used in the fivethirtyeight article https://fivethirtyeight.com/features/the-worst-tweeter-in-politics-isnt-trump/. The data file you have is a subset and was pre-processed, so do not use the original data.

For the first part of the project, you will write 2 functions and correct 1 function. The second part of the project is to write a short main program that uses your functions.

You are provided with two text files, sen_tweets_edited_2.csv (yes, it has a .csv extension, but is also a text file) and test_file.txt, which is a small file in the format of sen_tweets_edited_2.csv that you can use for testing. You are provided with two Python skeleton files, Project_3.py for your functions and Project_3_Main.py for your main program. The other files provided are related to the optional extra credit.

About Twitter data and Tweets

Tweets are submissions to the social media platform Twitter. They are 140 characters in length or less. Tweets often contain hashtags, which are words or phrases with the prefix #, for example, #avocadotoast. Tweets may reference other Twitter users, as indicated by the @, for example, @realDonaldTrump. Users may re-post a tweet posted by another user – this is referred to as re-tweeting or a retweet. Users may reply directly to a tweet posted by another individual, which is called a reply. Users may also indicate that they like a tweet by making it a favorite.

Each function description has a bulleted list of key points. These will answer many of your questions and provide hints, suggestions, and smaller subtasks if you do not know how to start or get stuck.

Keep the instructions open on your desktop while you work on the project and refer to them often.

Part One: Processing and analyzing tweets

All of the code for Part 1 should be submitted in the Project_3.py file. You will fill in your functions under the definition lines.

Task 1: process_hashes (tweet)

process_hashes has one required argument: a single string tweet that is a tweet. This function extracts hashtags from the tweet and returns them in a list. The function should work as follows:

• The function takes a string as input and returns a list. Think carefully about when you should convert this string to a list and which steps below are performed on the string, and which on the list. WRITE THIS OUT ON PAPER FIRST!

• All of the text should be put into upper case.

• Punctuation should be removed, including hashtags and at signs. Think carefully about what should get removed when and how hashtags are identified. Specifically, you need to remove: ?.,!_:;#@

• Delete trailing 's on hashtags. This means if the hashtag is: #POTUS's, after processing (including step above) it should be: POTUS

• The function returns a list that contains the hashtags found. This list will be empty if there were no hashtags. The list might contain the same hashtag more than once if it appears in the tweet more than once. (Do not remove duplicates.)

• You should not be reading in from a file anywhere in this function. The input is a string called tweet. Use tweet for processing.

• We haven’t discussed regular expressions. If you use them, you may get none of the points regardless of what the internet tells you to do. (You can do the extra credit with regular expressions.)
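As a rough illustration of the clean-up steps listed above, here is a minimal sketch that processes a single word rather than a whole tweet. The helper name clean_token is hypothetical and is not part of the skeleton files; it only shows one possible order for the upper-casing, punctuation removal, and trailing 's steps, using plain string methods and no regular expressions.

```python
PUNCTUATION = "?.,!_:;#@"  # the characters the instructions say to remove


def clean_token(word):
    """Hypothetical helper: clean ONE word already known to be a hashtag."""
    cleaned = word.upper()
    for ch in PUNCTUATION:
        cleaned = cleaned.replace(ch, "")
    # After upper-casing, a trailing 's appears as 'S.
    if cleaned.endswith("'S"):
        cleaned = cleaned[:-2]
    return cleaned


print(clean_token("#POTUS's"))  # POTUS
print(clean_token("#Alaska"))   # ALASKA
```

Note that whether a word is a hashtag has to be decided before the # is removed; once the punctuation is gone, a hashtag looks like any other word.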

Here are some examples. Note that this is not actual code, and words like “Input tweet” should not print when your function runs. Please also note that these tweets were chosen because of the text they contain; they are not a political statement.

Input tweet: ".@realDonaldTrump's #SwampCabinet must be held accountable. I will hold them to account even if @SenateGOP won't. "

process_hashes returns:

['SWAMPCABINET']

Input tweet: 'Why are Republicans asking the Supreme Court to raise taxes on Alaska families? #ACAWorks for #Alaska http://t.co/lHeHUoXRQq'

process_hashes returns:

['ACAWORKS', 'ALASKA']

Task 2: popular_tweets(filename, how = 'retweet', cutoff = 100, counts = False) correction

You have been given code for a function called popular_tweets(filename, how = 'retweet', cutoff = 100, counts = False) which does not work properly. Correct the code to meet the following specifications.

popular_tweets(filename, how = 'retweet', cutoff = 100, counts = False) returns a dictionary where the keys are strings corresponding to Twitter usernames and the values are either 1) a list of tweets (strings) by that user or 2) integers representing a count of tweets by that user. Whether the value is a list or an integer depends on the optional argument counts. More details of how the function works are presented below.

popular_tweets takes one required input filename, a string that is the name of a file containing tab-delimited Twitter data. The file specified by filename should be in the format (the spaces below represent tabs, \t, NOT spaces):

ID tweet_text replies retweets favorites username party state

For example, one line in the file might be:

179162 @DrNordal, it was nice meeting with you. Thanks for stopping by. 1 3 0 SenDeanHeller R NV

The ID is 179162. Next is the actual tweet. 1 is the number of replies. 3 is the number of retweets. 0 is the number of favorites. The username is SenDeanHeller. Party is R (Republican). State is NV (Nevada). You have been given two files in this format: test_file.txt is a small file you may wish to use for testing your code; sen_tweets_edited_2.csv is a larger data set. Pay close attention to what data is in which column.
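To make the column layout concrete, here is a small sketch that splits the example line on tabs. The helper name parse_line is hypothetical (it does not appear in the code you were given); the point is only to show which index holds which piece of data.

```python
def parse_line(line):
    """Hypothetical helper: split one tab-delimited line into its 8 fields."""
    return line.rstrip("\n").split("\t")


line = ("179162\t@DrNordal, it was nice meeting with you. "
        "Thanks for stopping by.\t1\t3\t0\tSenDeanHeller\tR\tNV\n")
fields = parse_line(line)

print(len(fields))     # 8
print(fields[5])       # SenDeanHeller
print(int(fields[3]))  # 3
```

Every field comes back as a string, so the reply, retweet, and favorite columns need int() before they can be compared to a number.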

The optional arguments how and cutoff determine which tweets will be included in the final output dictionary. The function is looking for tweets that are popular based on either how many replies they received, how many times they were retweeted, or how many times they were favorited. For the tweet shown above, there was 1 reply, 3 retweets, and 0 favorites.

The argument how tells whether to determine popularity based on replies, retweets or favorites. It will always be a string with default value 'retweet'. The only other possible values for how are 'reply' and 'favorite'. Each option corresponds to a column in the original data file.

The argument cutoff tells how many replies/retweets/favorites the tweet must have to be included in the output. If a tweet has fewer replies/retweets/favorites than the value of cutoff, it will not appear in the output dictionary. The default value of cutoff is 100.
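The way how and cutoff work together can be sketched as follows. The names COLUMN_FOR and meets_cutoff are hypothetical and are not the names used in the code you were given; the column positions are taken from the file format described above, and the row is one of the fabricated example lines shown below.

```python
# Hypothetical mapping from each value of `how` to a column index,
# based on: ID, tweet_text, replies, retweets, favorites, username, party, state
COLUMN_FOR = {"reply": 2, "retweet": 3, "favorite": 4}


def meets_cutoff(fields, how="retweet", cutoff=100):
    """Return True if the chosen count is at least `cutoff`."""
    return int(fields[COLUMN_FOR[how]]) >= cutoff


row = ["286443", "Congratulations to @TeamCoachBuzz",
       "5", "30", "268", "timkaine", "D", "VA"]

print(meets_cutoff(row))                  # False (30 retweets < 100)
print(meets_cutoff(row, cutoff=10))       # True  (30 retweets >= 10)
print(meets_cutoff(row, how="favorite"))  # True  (268 favorites >= 100)
```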

In the examples below, the original data is the same for all 3 cases, but the output is different. The original data consists of five lines in a file (this data is fabricated but based on real data):

185791 Bump fire stocks allow 88 151 293 SenF D CA

286443 Congratulations to @TeamCoachBuzz 5 30 268 timkaine D VA

25697 I'm grateful for #Arkansas 0 4 4 JohnBoozman R AR

286523 Sea level threatens Hampton Roads 69 473 1819 timkaine D VA

251370 I also stand ready to work 8 11 30 SenShelby R AL

• Popularity based on retweets, how = 'retweet', cutoff = 100; function returns:

{'SenF': ['Bump fire stocks allow'], 'timkaine': ['Sea level threatens Hampton Roads']}

Notice that each key is a username and the values are lists containing tweets that had at least 100 retweets. There are no entries in the dictionary for JohnBoozman (4 retweets) or SenShelby (11 retweets), and timkaine’s first tweet does not appear (only 30 retweets).

• Popularity based on retweets, how = 'retweet', cutoff = 10; function returns:

{'SenF': ['Bump fire stocks allow'], 'timkaine': ['Congratulations to @TeamCoachBuzz', 'Sea level threatens Hampton Roads'], 'SenShelby': ['I also stand ready to work']}

Notice that since the cutoff is lower, timkaine’s first tweet is now included (30 retweets) as well as SenShelby’s tweet (11 retweets).

• Popularity based on replies, how = 'reply', cutoff = 100; function returns:

{}

Notice that the output is an empty dictionary because NONE of the tweets had 100 replies or more.

• Popularity based on replies, how = 'reply', cutoff = 10; function returns:

{'SenF': ['Bump fire stocks allow'], 'timkaine': ['Sea level threatens Hampton Roads']}

Now with a lower cutoff, we get the tweet from SenF (88 replies) and one tweet from timkaine (69 replies).

• Popularity based on favorites, how = 'favorite', cutoff = 100; function returns:

{'SenF': ['Bump fire stocks allow'], 'timkaine': ['Congratulations to @TeamCoachBuzz', 'Sea level threatens Hampton Roads']}

Notice that we get both tweets for timkaine since both had at least 100 favorites. We also get the tweet from SenF which had 293 favorites.

The examples above are all cases where the argument counts is equal to False – this produces a dictionary with lists of tweets as values. When counts is True, the values in the dictionary are integers that represent how many tweets each user had that met the popularity cutoff. Here are the same examples from above with counts = True.

• how = 'retweet', cutoff = 100, counts = True; function returns:

{'SenF': 1, 'timkaine': 1}

Notice that the keys are the same as in the example above, but instead of the text of the tweets as values, we have a count of how many tweets there were. You will see this in all of the examples.

• how = 'retweet', cutoff = 10, counts = True; function returns:

{'SenF': 1, 'timkaine': 2, 'SenShelby': 1}

• how = 'reply', cutoff = 100, counts = True; function returns:

{}

• how = 'reply', cutoff = 10, counts = True; function returns:

{'SenF': 1, 'timkaine': 1}

• how = 'favorite', cutoff = 100, counts = True; function returns:

{'SenF': 1, 'timkaine': 2}
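The relationship between the two kinds of output can be seen in a short sketch: each count is simply the length of that user's list of tweets. The function name to_counts is hypothetical, and the code you were given may perform this conversion differently; the input is the retweet example with cutoff = 10 from above.

```python
def to_counts(tweets_by_user):
    """Hypothetical helper: replace each list of tweets with its length."""
    counts_by_user = {}
    for username in tweets_by_user:
        counts_by_user[username] = len(tweets_by_user[username])
    return counts_by_user


tweets_by_user = {
    'SenF': ['Bump fire stocks allow'],
    'timkaine': ['Congratulations to @TeamCoachBuzz',
                 'Sea level threatens Hampton Roads'],
    'SenShelby': ['I also stand ready to work'],
}

print(to_counts(tweets_by_user))
# {'SenF': 1, 'timkaine': 2, 'SenShelby': 1}
```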

Debug the code you have been given to make the function work as described in the specifications. You may not add or delete lines, and you must correct the lines in place without introducing dramatic changes. Plausible changes include switching the position of two lines, adding some code to lines, changing code to use correct variables and/or indices that are incorrect, and changing indentation. You should not be creating new objects or variables, adding lines, deleting lines, or re-writing lines to use completely different structures (e.g. adding list comprehension when there is no list comprehension).

Key points for debugging: (these will make more sense if you look at the code first)

• Use the comments in the code to guide you. Use the debugging print lines, but we suggest you only use them with test_file.txt; otherwise they will print more output than may be useful.

• The purpose of the info dictionary is to map each possible value for the parameter how to the appropriate column in the text file. You should ensure this mapping is correct.

• The output of this function is a dictionary with strings as keys and lists of strings (tweets) as values OR integers as values depending on whether counts is False or True. Make the dictionary every time with the username mapped to a list of tweets first. Then deal with converting this to counts after it is complete. The information used to make the dictionary comes from the file.

• When working with dictionaries with lists as the values, it is important to determine whether you are creating a new entry in the dictionary OR adding to an existing entry. These are two separate cases that the code needs to deal with.

• You must understand what each and every variable in the function does in order to debug correctly. Randomly guessing at things will make the code worse. It is suggested that you write down each variable name (result, line, file, info, data, how, cutoff, counts, item) and what it stands for BEFORE making any significant changes to the code. For example:

result – a dictionary that will store each username mapped to either a list of tweets OR a count of tweets for that user

line – a string that is a single line read in from a text file in the format described in the instructions

You fill in the rest!
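The two dictionary cases mentioned in the key points (creating a new entry versus adding to an existing one) can be illustrated on their own. This is a generic sketch of the pattern, not the code you were given; the function name add_tweet is hypothetical.

```python
def add_tweet(result, username, text):
    """Hypothetical helper: store `text` under `username` in a dict of lists."""
    if username in result:
        # Existing entry: add to the list that is already there.
        result[username].append(text)
    else:
        # New entry: start a list containing just this tweet.
        result[username] = [text]
    return result


result = {}
add_tweet(result, 'timkaine', 'Congratulations to @TeamCoachBuzz')
add_tweet(result, 'SenShelby', 'I also stand ready to work')
add_tweet(result, 'timkaine', 'Sea level threatens Hampton Roads')
print(result)
```

Assigning result[username] = [text] in both cases would silently throw away earlier tweets by the same user, which is why the two cases must be handled separately.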

Task 3: graph_usage(tweets, cutoff = 10)

Write a function called graph_usage(tweets, cutoff = 10) which displays a graph showing how often hashtags occur in the input data. This function takes one required argument tweets, which is a list of tweet strings. For example, tweets could be:

['This is a cool #tweet about #tweets', 'Sometimes people write #tweets on Twitter', 'Hey @username, did you #tweet that?','RT @madeup This person tweeted to @username about #tweets', 'Hey @username and @fakeuser, here is something #new about #tweets']

You MUST use your process_hashes function (see suggestions below) to extract the hashtags/mentions from the data. That will result in hashtags that are in all uppercase and have the # removed. Here is the output graph for the data above:

#tweets appeared 4 times in the list of tweets, #tweet appeared 2 times, and #new appeared once. Notice that on the y-axis of the graph each hashtag is in all caps and there is no #.

If you implement things correctly using what we have discussed you will have to exert minimal effort to generate these plots. They are the natural output of using a FreqDist object from NLTK and calling its plot function. If you are writing many lines of code to try to generate such values you are doing something wrong.

The optional argument cutoff determines how many hashtags or mentions to include. For example, if cutoff is 2, the output graph will only contain the 2 most common hashtags (don’t worry about ties, let NLTK handle that). Here is the plot using the same data with cutoff = 2:

So that you can see what this looks like when there are many hashtags, here is output from a larger data set. This is the output using cutoff = 10:

This is the output using cutoff = 5:

How to implement this function: Read this, it tells you exactly what to do

• The input to this function is a list called tweets. If you are trying to read in from a file anywhere in this function, you are doing it wrong.

• The first step in this function is to extract the hashtags from the text in tweets. Consider that tweets is a list of strings. You need to look at each item in tweets individually and use your process_hashes function to get the hashtags or mentions. Store all of these together in a new list.

• The NLTK functions/objects will automagically generate the plot you need. Make a FreqDist object from your list. Use the plot method, it will accept the cutoff argument directly. If you are not using NLTK, something is wrong.

• The function should return the FreqDist object you made. You will need to write the code to draw the plot, and then the last line of your function should return the FreqDist object.

• If you run in a notebook and don’t see a plot, make sure you have executed: %matplotlib inline. Do not put that in the code that you submit – it will crash at the command line.
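NLTK's FreqDist is built on Python's collections.Counter, so the counting step can be previewed without NLTK or a plot window. The sketch below uses Counter as a stand-in (an assumption made so the example runs anywhere); the hashtag list is what the five example tweets above contain after processing. In the actual function you would build a FreqDist and call its plot method, as the instructions say.

```python
from collections import Counter

# Hashtags from the five example tweets, upper-cased and with # removed.
hashtags = ["TWEET", "TWEETS", "TWEETS", "TWEET", "TWEETS", "NEW", "TWEETS"]

freq = Counter(hashtags)      # stand-in for a FreqDist built from the same list
print(freq["TWEETS"])         # 4
print(freq.most_common(2))    # [('TWEETS', 4), ('TWEET', 2)]
```

The list passed to most_common plays the same role as cutoff: it limits the result to the most frequent hashtags.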

Part 2: Main Program

Your main program will make use of the files of tweets you were given and the functions you wrote in the previous sections. Unless otherwise specified, you MUST use your functions. You should not repeat code from your functions inside of the main program.

You have also been given a helper function called join_tweets. The join_tweets function takes a dictionary as input and returns a list. The input dictionary will have lists of strings as values. For example, for the input:

{'SenF': ['Bump fire stocks allow'], 'timkaine': ['Congratulations to @TeamCoachBuzz', 'Sea level threatens Hampton Roads'], 'SenShelby': ['I also stand ready to work']}

The output for join_tweets would be:

['Bump fire stocks allow', 'Congratulations to @TeamCoachBuzz', 'Sea level threatens Hampton Roads', 'I also stand ready to work']

Notice that it took the values from every dictionary item and put them all in one list to produce a list of all of the tweets in the dictionary. This function is correct. Do not change it, but you will need to use it.
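For readers who want to see the flattening behavior spelled out, here is a sketch of a function with the same input and output as described above. It is not the join_tweets code you were given, and the name flatten_values is hypothetical; it only illustrates how a dictionary of lists becomes a single list.

```python
def flatten_values(tweets_by_user):
    """Hypothetical stand-in: collect every tweet from every user's list."""
    all_tweets = []
    for tweet_list in tweets_by_user.values():
        for tweet in tweet_list:
            all_tweets.append(tweet)
    return all_tweets


tweets_by_user = {
    'SenF': ['Bump fire stocks allow'],
    'timkaine': ['Congratulations to @TeamCoachBuzz',
                 'Sea level threatens Hampton Roads'],
    'SenShelby': ['I also stand ready to work'],
}
print(flatten_values(tweets_by_user))
```

The usernames are dropped along the way, which is what graph_usage needs: a plain list of tweet strings.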

Your main program should do the following using the sen_tweets_edited_2.csv file as the input file for popular_tweets. If this file crashes your computer, you may use the test file instead, just indicate that in your write-up.

• Create a dictionary that has the most popular tweets based on being retweeted at least 1000 times. Your dictionary should contain the actual text of the tweets. (Hint: use popular_tweets) Print out ONLY the usernames for this dictionary.

• Plot the 10 most common hashtags from the most popular tweets based on being retweeted at least 1000 times. You will need to process your dictionary using join_tweets first. If you are using your dictionary as input to graph_usage, you are doing it wrong.

• Create a dictionary that has the most popular tweets based on having at least 500 replies. Your dictionary should contain counts of how many tweets for each user, NOT the actual tweets. (Hint: use popular_tweets) Print out ONLY the usernames for this dictionary.
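Printing only the usernames means printing only the keys of the dictionary. A minimal sketch, using a made-up dictionary in place of a real popular_tweets result (the helper name usernames_of is hypothetical):

```python
def usernames_of(popular):
    """Hypothetical helper: return just the keys (usernames) as a list."""
    return list(popular)


# Made-up stand-in for a dictionary returned by popular_tweets.
popular = {'SenF': 1, 'timkaine': 1}

for username in usernames_of(popular):
    print(username)
```

The same loop works whether the values are lists of tweets or counts, since the values are never touched.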

Extra credit opportunities: (Worth up to 3%)

1) Write the process_hashes function using list comprehension

2) Write a function called process_hashes_regex that uses regular expressions. Fill in your function under the def line provided.

3) Re-write the join_tweets function using list comprehension – think very carefully about how to do this. Fill in your function under the def line provided for join_tweets_lc.

SUBMISSION EXPECTATIONS

Project_3.py Your implementations/correction of the functions in Part 1. This MUST include the code given to you for join_tweets. It may also contain any extra credit you completed.

Project_3_main.py Your main program for Part 2. The first line in this file should be:

from Project_3 import *

Project_3.pdf A PDF document containing your reflections on the project including any extra credit opportunities you chose to pursue. You must also cite any sources you use. Please be aware that you can consult sources, but all code written must be your own. Programs copied in part or wholesale from the web or other sources will result in reporting of an Honor Code violation.

SUGGESTED COMPLETION SCHEDULE

This describes when to start each task to have a complete checkpoint submission.

After lecture 10/9 or 10/10: process_hashes and first task of main program

After lecture 10/16 or 10/17: popular_tweets and tasks 2 and 4 of main program

After lecture 10/18 or 10/20: graph_usage and task 3 of main program

POINT VALUES AND GRADING RUBRIC

-process_hashes (27.5 points)

-popular_tweets correction (30 pts)

-graph_usage (27.5 pts)

-Main program (12.5 pts)
