Python Project

CSCI 140 Project 3

Checkpoint deadline Wednesday, October 23 by 1700

Project Due Sunday, October 27 by 1700

For the third project, you will be creating a suite of functions to process and analyze a Twitter data set derived from one of the data sets used in the fivethirtyeight article https://fivethirtyeight.com/features/the-worst-tweeter-in-politics-isnt-trump/. The data file you have is a subset and was pre-processed, so do not use the original data.

For the first part of the project, you will write 2 functions and correct 1 function. The second part of the project is to write a short main program that uses your functions.

You are provided with two text files: sen_tweets_edited_2.csv (yes, it has a .csv extension, but it is also a text file) and test_file.txt, a small file in the same format as sen_tweets_edited_2.csv that you can use for testing. You are also provided with two Python skeleton files: Project_3.py for your functions and Project_3_Main.py for your main program. The other files provided are related to the optional extra credit.

About Twitter data and Tweets

Tweets are submissions to the social media platform Twitter. They are 140 characters in length or less. Tweets often contain hashtags, which are words or phrases with the prefix #, for example, #avocadotoast. Tweets may reference other Twitter users, as indicated by the @, for example, @realDonaldTrump. Users may re-post a tweet posted by another user – this is referred to as re-tweeting or a retweet. Users may reply directly to a tweet posted by another individual, which is called a reply. Users may also indicate that they like a tweet by making it a favorite.

Each function description has a bulleted list of key points. These will answer many of your questions and provide hints, suggestions, and smaller subtasks if you do not know how to start or get stuck.

Keep the instructions open on your desktop while you work on the project and refer to them often.

Part 1: Processing and analyzing tweets

All of the code for Part 1 should be submitted in the Project_3.py file. You will fill in your functions under the definition lines.

Task 1: process_hashes (tweet)

process_hashes has one required argument: a single string tweet that is a tweet. This function extracts hashtags from the tweet and returns them in a list. The function should work as follows:

• The function takes a string as input and returns a list. Think carefully about when you should convert this string to a list and which steps below are performed on the string, and which on the list. WRITE THIS OUT ON PAPER FIRST!

• All of the text should be put into upper case.

• Punctuation should be removed, including hash signs (#) and at signs (@). Think carefully about what should get removed when and how hashtags are identified. Specifically, you need to remove: ?.,!_:;#@

• Delete trailing 's on hashtags. This means if the hashtag is: #POTUS's, after processing (including step above) it should be: POTUS

• The function returns a list that contains the hashtags found. This list will be empty if there were no hashtags. The list might contain the same hashtag more than once if it appears in the tweet more than once. (Do not remove duplicates.)

• You should not be reading in from a file anywhere in this function. The input is a string called tweet. Use tweet for processing.

• We haven’t discussed regular expressions. If you use them, you may get none of the points regardless of what the internet tells you to do. (You can do the extra credit with regular expressions.)

Here are some examples. Note that this is not actual code, and words like “Input tweet” should not print when your function runs. Please also note that these tweets were chosen because of the text they contain; they are not a political statement.

Input tweet: ".@realDonaldTrump's #SwampCabinet must be held accountable. I will hold them to account even if @SenateGOP won't. "

process_hashes returns:

['SWAMPCABINET']

Input tweet: 'Why are Republicans asking the Supreme Court to raise taxes on Alaska families? #ACAWorks for #Alaska http://t.co/lHeHUoXRQq'

process_hashes returns:

['ACAWORKS', 'ALASKA']
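The ordering of the steps above can be illustrated with a minimal sketch. It is only one possible reading of the specification, not the reference solution: the name extract_hashtags_sketch and the constant PUNCTUATION are hypothetical, and it assumes that a hashtag is any whitespace-separated word that begins with # and that the trailing 's uses a straight apostrophe.

```python
# Characters the specification says to remove.
PUNCTUATION = "?.,!_:;#@"


def extract_hashtags_sketch(tweet):
    """Return the upper-cased hashtags in tweet, without # or trailing 'S."""
    hashtags = []
    # Upper-case on the string, then split into a list of words.
    for word in tweet.upper().split():
        # Identify hashtags BEFORE stripping the # character.
        if not word.startswith("#"):
            continue
        for ch in PUNCTUATION:
            word = word.replace(ch, "")
        # After upper-casing, a trailing 's has become 'S.
        if word.endswith("'S"):
            word = word[:-2]
        if word:
            hashtags.append(word)
    return hashtags


print(extract_hashtags_sketch("Thanks to #POTUS's team, #thanks!"))
# ['POTUS', 'THANKS']
```

Note that the hashtag check happens before punctuation is removed; once the # is stripped, there is no longer any way to tell a hashtag from an ordinary word.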

Task 2: popular_tweets(filename, how = 'retweet', cutoff = 100, counts = False) correction

You have been given code for a function called popular_tweets(filename, how = 'retweet', cutoff = 100, counts = False) which does not work properly. Correct the code to meet the following specifications.

popular_tweets(filename, how = 'retweet', cutoff = 100, counts = False) returns a dictionary where the keys are strings corresponding to Twitter usernames and the values are either 1) a list of tweets (strings) by that user or 2) integers representing a count of tweets by that user. Whether the value is a list or an integer depends on the optional argument counts. More details of how the function works are presented below.

popular_tweets takes one required input filename, a string that is the name of a file containing tab-delimited Twitter data. The file specified by filename should be in the format (the spaces below represent tabs, \t, NOT spaces):

ID tweet_text replies retweets favorites username party state

For example, one line in the file might be:

179162 @DrNordal, it was nice meeting with you. Thanks for stopping by. 1 3 0 SenDeanHeller R NV

The ID is 179162. Next is the actual tweet. 1 is the number of replies. 3 is the number of retweets. 0 is the number of favorites. The username is SenDeanHeller. Party is R (Republican). State is NV (Nevada). You have been given two files in this format: test_file.txt is a small file you may wish to use for testing your code; sen_tweets_edited_2.csv is a larger data set. Pay close attention to what data is in which column.
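As a small illustration of the column layout, the example line can be split on tabs as follows. This is a sketch only: the variable names are chosen for illustration, and the line is typed here with explicit \t characters where the file has tabs.

```python
# The example line, written with explicit tab characters.
line = (
    "179162\t@DrNordal, it was nice meeting with you. "
    "Thanks for stopping by.\t1\t3\t0\tSenDeanHeller\tR\tNV\n"
)

# Strip the newline, then split on tabs (NOT on spaces, since the
# tweet text itself contains spaces).
fields = line.rstrip("\n").split("\t")

tweet_id, text, replies, retweets, favorites, username, party, state = fields

# Every field is read as a string; the counts must be converted
# before they can be compared with a numeric cutoff.
replies, retweets, favorites = int(replies), int(retweets), int(favorites)

print(username, replies, retweets, favorites)  # SenDeanHeller 1 3 0
```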

The optional arguments how and cutoff determine which tweets will be included in the final output dictionary. The function is looking for tweets that are popular based on either how many replies they received, how many times they were retweeted, or how many times they were favorited. For the tweet shown above, there was 1 reply, 3 retweets, and 0 favorites.

The argument how tells whether to determine popularity based on replies, retweets or favorites. It will always be a string with default value 'retweet'. The only other possible values for how are 'reply' and 'favorite'. Each option corresponds to a column in the original data file.

The argument cutoff tells how many replies/retweets/favorites the tweet must have to be included in the output. If a tweet has fewer replies/retweets/favorites than the value of cutoff, it will not appear in the output dictionary. The default value of cutoff is 100.

In the examples below, the original data is the same for all 3 cases, but the output is different. The original data consists of five lines in a file (this data is fabricated but based on real data):

185791 Bump fire stocks allow 88 151 293 SenF D CA

286443 Congratulations to @TeamCoachBuzz 5 30 268 timkaine D VA

25697 I'm grateful for #Arkansas 0 4 4 JohnBoozman R AR

286523 Sea level threatens Hampton Roads 69 473 1819 timkaine D VA

251370 I also stand ready to work 8 11 30 SenShelby R AL

• Popularity based on retweets, how = 'retweet', cutoff = 100; function returns:

{'SenF': ['Bump fire stocks allow'], 'timkaine': ['Sea level threatens Hampton Roads']}

Notice that each key is a username and the values are lists containing tweets that had at least 100 retweets. There are no entries in the dictionary for JohnBoozman (4 retweets) or SenShelby (11 retweets), and timkaine’s first tweet does not appear (only 30 retweets).

• Popularity based on retweets, how = 'retweet', cutoff = 10; function returns:

{'SenF': ['Bump fire stocks allow'], 'timkaine': ['Congratulations to @TeamCoachBuzz', 'Sea level threatens Hampton Roads'], 'SenShelby': ['I also stand ready to work']}

Notice that since the cutoff is lower, timkaine’s first tweet is now included (30 retweets) as well as SenShelby’s tweet (11 retweets).

• Popularity based on replies, how = 'reply', cutoff = 100; function returns:

{}

Notice that the output is an empty dictionary because NONE of the tweets had 100 replies or more.

• Popularity based on replies, how = 'reply', cutoff = 10; function returns:

{'SenF': ['Bump fire stocks allow'], 'timkaine': ['Sea level threatens Hampton Roads']}

Now with a lower cutoff, we get the tweet from SenF (88 replies) and one tweet from timkaine (69 replies).

• Popularity based on favorites, how = 'favorite', cutoff = 100; function returns:

{'SenF': ['Bump fire stocks allow'], 'timkaine': ['Congratulations to @TeamCoachBuzz', 'Sea level threatens Hampton Roads']}

Notice that we get both tweets for timkaine since both had at least 100 favorites. We also get the tweet from SenF which had 293 favorites.

The examples above are all cases where the argument counts is equal to False – this produces a dictionary with lists of tweets as values. When counts is True, the values in the dictionary are integers that represent how many tweets each user had that met the popularity cutoff. Here are the same examples from above with counts = True.

• how = 'retweet', cutoff = 100, counts = True; function returns:

{'SenF': 1, 'timkaine': 1}

Notice that the keys are the same as in the example above, but instead of the text of the tweets as values, we have a count of how many tweets there were. You will see this in all of the examples.

• how = 'retweet', cutoff = 10, counts = True; function returns:

{'SenF': 1, 'timkaine': 2, 'SenShelby': 1}

• how = 'reply', cutoff = 100, counts = True; function returns:

{}

• how = 'reply', cutoff = 10, counts = True; function returns:

{'SenF': 1, 'timkaine': 1}

• how = 'favorite', cutoff = 100, counts = True; function returns:

{'SenF': 1, 'timkaine': 2}
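The behavior shown in these examples can be reproduced with a short sketch. It is not the provided code that you are asked to debug: the name popular_from_lines_sketch and the constant COLUMN_FOR are hypothetical, and the sketch takes a list of lines rather than a filename so that it can be run on the five fabricated lines directly.

```python
# Hypothetical mapping from each value of how to its column index.
COLUMN_FOR = {"reply": 2, "retweet": 3, "favorite": 4}


def popular_from_lines_sketch(lines, how="retweet", cutoff=100, counts=False):
    """Map usernames to their popular tweets (or to a count of them)."""
    result = {}
    for line in lines:
        data = line.rstrip("\n").split("\t")
        # Keep the tweet only if it meets the cutoff in the chosen column.
        if int(data[COLUMN_FOR[how]]) >= cutoff:
            username = data[5]
            if username not in result:
                result[username] = [data[1]]  # new entry
            else:
                result[username].append(data[1])  # existing entry
    # Build the lists first, then convert them to counts if requested.
    if counts:
        for username in result:
            result[username] = len(result[username])
    return result


sample_lines = [
    "185791\tBump fire stocks allow\t88\t151\t293\tSenF\tD\tCA\n",
    "286443\tCongratulations to @TeamCoachBuzz\t5\t30\t268\ttimkaine\tD\tVA\n",
    "25697\tI'm grateful for #Arkansas\t0\t4\t4\tJohnBoozman\tR\tAR\n",
    "286523\tSea level threatens Hampton Roads\t69\t473\t1819\ttimkaine\tD\tVA\n",
    "251370\tI also stand ready to work\t8\t11\t30\tSenShelby\tR\tAL\n",
]

print(popular_from_lines_sketch(sample_lines, how="favorite", counts=True))
# {'SenF': 1, 'timkaine': 2}
```

The two branches inside the loop correspond to the two separate cases of creating a new dictionary entry and adding to an existing one.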

Debug the code you have been given to make the function work as described in the specifications. You may not add or delete lines, and you must correct the lines in place without introducing dramatic changes. Plausible changes include switching the position of two lines, adding some code to lines, changing code to use correct variables and/or indices that are incorrect, and changing indentation. You should not be creating new objects or variables, adding lines, deleting lines, or re-writing lines to use completely different structures (e.g. adding list comprehension when there is no list comprehension).

Key points for debugging: (these will make more sense if you look at the code first)

• Use the comments in the code to guide you. You can use the debugging print lines, but we suggest you only use them with test_file.txt; otherwise they will print more output than may be useful.

• The purpose of the info dictionary is to map each possible value for the parameter how to the appropriate column in the text file. You should ensure this mapping is correct.

• The output of this function is a dictionary with strings as keys and lists of strings (tweets) as values OR integers as values depending on whether counts is True or False. Make the dictionary every time with the username mapped to a list of tweets first. Then deal with converting this to counts after it is complete. The information used to make the dictionary comes from the file.

• When working with dictionaries with lists as the values, it is important to determine whether you are creating a new entry in the dictionary OR adding to an existing entry. These are two separate cases that the code needs to deal with.

• You must understand what each and every variable in the function does in order to debug correctly. Randomly guessing at things will make the code worse. It is suggested that you write down each variable name (result, line, file, info, data, how, cutoff, counts, item) and what it stands for BEFORE making any significant changes to the code. For example:

result – a dictionary that will store each username mapped to either a list of tweets OR a count of tweets for that user

line – a string that is a single line read in from a text file in the format described in the instructions

You fill in the rest!

Task 3: graph_usage(tweets, cutoff = 10)

Write a function called graph_usage(tweets, cutoff = 10) which displays a graph showing how often hashtags occur in the input data. This function takes one required argument tweets, which is a list of tweet strings. For example, tweets could be:

['This is a cool #tweet about #tweets', 'Sometimes people write #tweets on Twitter', 'Hey @username, did you #tweet that?','RT @madeup This person tweeted to @username about #tweets', 'Hey @username and @fakeuser, here is something #new about #tweets']

You MUST use your process_hashes function (see suggestions below) to extract the hashtags/mentions from the data. That will result in hashtags that are in all uppercase and have the # removed. Here is the output graph for the data above:

#tweets appeared 4 times in the list of tweets, #tweet appeared 2 times, and #new appeared once. Notice that on the y-axis of the graph each hashtag is in all caps and there is no #.
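The counts above, and the effect of the cutoff parameter of graph_usage, can be checked with a short sketch. It uses collections.Counter from the standard library as a stand-in, since NLTK's FreqDist is a subclass of Counter and supports the same most_common method; no plot is drawn here. The list hashtags_per_tweet is written out by hand and assumes process_hashes returns these hashtags for the five example tweets.

```python
from collections import Counter

# Assumed output of process_hashes for each of the five example tweets.
hashtags_per_tweet = [
    ["TWEET", "TWEETS"],
    ["TWEETS"],
    ["TWEET"],
    ["TWEETS"],
    ["NEW", "TWEETS"],
]

# Collect every hashtag into one flat list, keeping duplicates.
all_hashtags = []
for tags in hashtags_per_tweet:
    all_hashtags.extend(tags)

frequencies = Counter(all_hashtags)

# A cutoff of 2 keeps only the 2 most common hashtags.
print(frequencies.most_common(2))  # [('TWEETS', 4), ('TWEET', 2)]
```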

If you implement things correctly using what we have discussed you will have to exert minimal effort to generate these plots. They are the natural output of using a FreqDist object from NLTK and calling its plot function. If you are writing many lines of code to try to generate such values you are doing something wrong.

The optional argument cutoff determines how many hashtags or mentions to include. For example, if cutoff is 2, the output graph will only contain the 2 most common hashtags (don’t worry about ties, let NLTK handle that). Here is the plot using the same data with cutoff = 2:

So that you can see what this looks like when there are many hashtags, here is output from a larger data set. This is the output using cutoff = 10:

This is the output using cutoff = 5:

How to implement this function: Read this, it tells you exactly what to do

• The input to this function is a list called tweets. If you are trying to read in from a file anywhere in this function, you are doing it wrong.

• The first step in this function is to extract the hashtags from the text in tweets. Consider that tweets is a list of strings. You need to look at each item in tweets individually and use your process_hashes function to get the hashtags or mentions. Store all of these together in a new list.

• The NLTK functions/objects will automagically generate the plot you need. Make a FreqDist object from your list. Use the plot method, it will accept the cutoff argument directly. If you are not using NLTK, something is wrong.

• The function should return the FreqDist object you made. You will need to write the code to draw the plot, and then the last line of your function should return the FreqDist object.

• If you run in a notebook and don’t see a plot, make sure you have executed: %matplotlib inline. Do not put that line in the code that you submit – it will crash at the command line.

Part 2: Main Program

Your main program will make use of the files of tweets you were given and the functions you wrote in the previous sections. Unless otherwise specified, you MUST use your functions. You should not repeat code from your functions inside of the main program.

You have also been given a helper function called join_tweets. The join_tweets function takes a dictionary as input and returns a list. The input dictionary will have lists of strings as values. For example, for the input:

{'SenF': ['Bump fire stocks allow'], 'timkaine': ['Congratulations to @TeamCoachBuzz', 'Sea level threatens Hampton Roads'], 'SenShelby': ['I also stand ready to work']}

The output for join_tweets would be:

['Bump fire stocks allow', 'Congratulations to @TeamCoachBuzz', 'Sea level threatens Hampton Roads', 'I also stand ready to work']

Notice that it took the values from every dictionary item and put them all in one list to produce a list of all of the tweets in the dictionary. This function is correct. Do not change it, but you will need to use it.
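For reference, the behavior described here is equivalent to the following sketch. This is not the join_tweets code you were given, only an illustration of what it does; the name join_tweets_sketch is hypothetical.

```python
def join_tweets_sketch(tweet_dict):
    """Flatten a dictionary of tweet lists into a single list of tweets."""
    all_tweets = []
    # Dictionaries preserve insertion order, so tweets come out
    # grouped by username in the order the usernames were added.
    for tweets in tweet_dict.values():
        all_tweets.extend(tweets)
    return all_tweets


example = {
    "SenF": ["Bump fire stocks allow"],
    "timkaine": ["Congratulations to @TeamCoachBuzz", "Sea level threatens Hampton Roads"],
    "SenShelby": ["I also stand ready to work"],
}

for tweet in join_tweets_sketch(example):
    print(tweet)
```

The result is a plain list of strings, which is the kind of input that graph_usage expects.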

Your main program should do the following using the sen_tweets_edited_2.csv file as the input file for popular_tweets. If this file crashes your computer, you may use the test file instead, just indicate that in your write-up.

• Create a dictionary that has the most popular tweets based on being retweeted at least 1000 times. Your dictionary should contain the actual text of the tweets. (Hint: use popular_tweets) Print out ONLY the usernames for this dictionary.

• Plot the 10 most common hashtags from the most popular tweets based on being retweeted at least 1000 times. You will need to process your dictionary using join_tweets first. If you are using your dictionary as input to graph_usage, you are doing it wrong.

• Create a dictionary that has the most popular tweets based on having at least 500 replies. Your dictionary should contain counts of how many tweets for each user, NOT the actual tweets. (Hint: use popular_tweets) Print out ONLY the usernames for this dictionary.

Extra credit opportunities: (Worth up to 3%)

1) Write the process_hashes function using list comprehension

2) Write a function called process_hashes_regex that uses regular expressions. Fill in your function under the def line provided.

3) Re-write the join_tweets function using list comprehension – think very carefully about how to do this. Fill in your function under the def line provided for join_tweets_lc.

SUBMISSION EXPECTATIONS

Project_3.py Your implementations/correction of the functions in Part 1. This MUST include the code given to you for join_tweets. It may also contain any extra credit you completed.

Project_3_main.py Your main program for Part 2. The first line in this file should be:

from Project_3 import *

Project_3.pdf A PDF document containing your reflections on the project including any extra credit opportunities you chose to pursue. You must also cite any sources you use. Please be aware that you can consult sources, but all code written must be your own. Programs copied in part or wholesale from the web or other sources will result in reporting of an Honor Code violation.

SUGGESTED COMPLETION SCHEDULE

This describes when to start each task to have a complete checkpoint submission.

After lecture 10/9 or 10/10: process_hashes and first task of main program

After lecture 10/16 or 10/17: popular_tweets and tasks 2 and 4 of main program

After lecture 10/18 or 10/20: graph_usage and task 3 of main program

POINT VALUES AND GRADING RUBRIC

- process_hashes (27.5 pts)

- popular_tweets correction (30 pts)

- graph_usage (27.5 pts)

- Main program (12.5 pts)

- Writeup (2.5 pts)
