DS540: Advanced Python for Data Science
Project proposal 1: Exam performance analysis
You are requested to analyse a dataset of students’ exam performance using the Python programming language. The dataset contains students’ scores in three subjects (math, reading, and writing). Write a Python program to perform the following tasks:
Task 1: Load the dataset in a suitable format and show its properties, e.g. number of records, number of features and their types (hint: use the Pandas library to read the data). Inspect the dataset and perform data cleaning (e.g. removing duplicate records and fixing missing data).
Task 2: Provide descriptive statistics of the dataset and perform an exploratory data analysis (EDA) to answer the following analysis questions:
• Compare students’ exam scores across the subjects (math, reading, writing). What trend did you find?
• Who performed better in each subject, male or female students?
• Show any attributes (features) that are correlated with exam scores (e.g. Does parental level of education affect children’s exam scores? Does test preparation influence students’ performance?) (hint: use the corr() method in Pandas).
(You are encouraged to pose other analysis questions based on any trend you notice in the dataset.)
Task 3: Show a visual representation of your analysis (hint: use data visualization packages such as Matplotlib and Seaborn).
Task 4: Build a machine learning model to predict a student’s exam performance in each subject given the following attributes: gender, race/ethnicity, parental level of education, lunch, and test preparation course.
Download the dataset from the following link: Students exam performance data
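As a rough illustration of Tasks 1 and 2 of this proposal, the sketch below cleans a small table and relates a categorical attribute to the exam scores. It is only one possible approach. The score column names ("math score" and so on), the choice to fill missing scores with the column median, and the toy rows are all assumptions made for this example; with the real data the frame would come from pd.read_csv on the downloaded file.

```python
import pandas as pd

# Assumed column names for the three score columns (not stated in the brief).
SCORE_COLS = ["math score", "reading score", "writing score"]


def clean(df):
    """Drop exact duplicate rows, then fill missing scores with the column median."""
    out = df.drop_duplicates().copy()
    for col in SCORE_COLS:
        out[col] = out[col].fillna(out[col].median())
    return out.reset_index(drop=True)


def score_correlations(df, feature):
    """Correlate each level of a categorical feature (one-hot encoded) with each score."""
    dummies = pd.get_dummies(df[feature], dtype=float)
    combined = pd.concat([dummies, df[SCORE_COLS]], axis=1)
    return combined.corr().loc[dummies.columns, SCORE_COLS]


# Toy rows invented purely for illustration; they are not taken from the dataset.
toy = pd.DataFrame({
    "gender": ["female", "male", "male", "female", "male"],
    "test preparation course": ["completed", "none", "none", "completed", "none"],
    "math score": [72, 65, 65, None, 80],
    "reading score": [80, 60, 60, 90, 70],
    "writing score": [78, 58, 58, 88, 68],
})

cleaned = clean(toy)
print(cleaned.shape)    # number of records and features after cleaning
print(cleaned.dtypes)   # feature types
print(cleaned[SCORE_COLS].describe())                 # descriptive statistics
print(cleaned.groupby("gender")[SCORE_COLS].mean())   # scores by gender
print(score_correlations(cleaned, "test preparation course"))
```

One-hot encoding is used here because corr() works on numeric columns only; an ordinal encoding of parental level of education would be another reasonable choice.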
Project proposal 2: Tweets sentiment analysis
You are requested to perform natural language processing on users’ tweets using the Python programming language. The dataset contains textual data obtained from Twitter users. Write a Python program to perform the following tasks:
Task 1: Load the dataset in a suitable format and show its properties, e.g. number of records, number of features and their types (hint: use the Pandas library to read the data). Inspect the dataset and perform data cleaning (e.g. removing duplicate records and fixing missing data).
Task 2: Pre-process the textual data and extract features using NLP techniques as follows:
• Pre-processing steps:
1. Convert to lowercase.
2. Remove stop words.
3. Normalise the text (punctuation removal, spelling correction, stemming).
4. Tokenisation.
• Extract the following features:
1. Word count per tweet.
2. Average word length per tweet.
3. Special character count per tweet.
4. Tweet sentiment (hint: use the TextBlob library to obtain tweets’ sentiments).
5. N-grams.
6. TF-IDF.
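The pre-processing steps and several of the features listed above can be sketched with the standard library alone. This is a simplified illustration, not a prescribed solution: the stop-word list is a tiny hand-written assumption, spelling correction and stemming are left out, the sentiment feature is omitted because it needs the TextBlob library, and the TF-IDF formula is the plain textbook variant (term frequency times log(N / document frequency)). Library implementations such as scikit-learn's TfidfVectorizer apply smoothing and normalisation, so their numbers will differ.

```python
import math
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption); NLP libraries ship fuller ones.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "it", "this", "i"}


def preprocess(tweet):
    """Lowercase, strip punctuation, tokenise on whitespace, drop stop words."""
    lowered = tweet.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


def count_features(tweet):
    """Word count, average word length and special-character count of a raw tweet."""
    words = tweet.split()
    word_count = len(words)
    avg_len = sum(len(w) for w in words) / word_count if word_count else 0.0
    # "Special" here means anything that is neither alphanumeric nor whitespace.
    special = sum(1 for ch in tweet if not ch.isalnum() and not ch.isspace())
    return {"word_count": word_count, "avg_word_length": avg_len, "special_chars": special}


def ngrams(tokens, n=2):
    """Return all contiguous n-token tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """docs is a list of token lists; returns one {term: score} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Made-up example tweets, for illustration only.
tweets = ["I love this course!", "The weather is awful today :("]
tokenised = [preprocess(t) for t in tweets]
for raw, tokens in zip(tweets, tokenised):
    print(tokens, count_features(raw), ngrams(tokens))
print(tf_idf(tokenised))
```

The count features are computed on the raw tweet so that special characters are still present; whether to count words before or after stop-word removal is a design decision worth stating in the report.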
Task 3: Using visual representations, show the following: the most commonly used words in tweets (using WordCloud), the number of positive, negative, and neutral tweets, and the word count distribution among different sentiments (hint: use data visualization packages such as Matplotlib and Seaborn).
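One of the charts requested in Task 3, the number of tweets per sentiment class, might be drawn roughly as follows with Matplotlib. The function name plot_sentiment_counts and the example labels are hypothetical; in the project the labels would come from the sentiment feature computed in Task 2.

```python
from collections import Counter

import matplotlib.pyplot as plt


def plot_sentiment_counts(labels):
    """Draw a bar chart of how many tweets fall into each sentiment class."""
    counts = Counter(labels)
    order = ["positive", "neutral", "negative"]  # fixed order keeps charts comparable
    heights = [counts.get(name, 0) for name in order]
    fig, ax = plt.subplots()
    ax.bar(order, heights)
    ax.set_xlabel("Sentiment")
    ax.set_ylabel("Number of tweets")
    ax.set_title("Tweets per sentiment class")
    return fig, ax


# Made-up labels, for illustration only.
example_labels = ["positive", "negative", "negative", "neutral", "positive", "negative"]
fig, ax = plot_sentiment_counts(example_labels)
# In a Jupyter Notebook the figure renders inline; in a script, call plt.show().
```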
Task 4: Perform sentiment analysis using machine learning techniques to classify tweets into positive, negative, or neutral sentiments given the following features: word count per tweet, average word length per tweet, and special character count per tweet.
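To make the classification task concrete, here is a minimal sketch that labels a tweet from the three numeric features using k-nearest neighbours. The brief does not prescribe a model; k-nearest neighbours is chosen here only because it fits in a few lines of plain Python, and a library classifier (for example from scikit-learn) would be the more usual choice. The training rows are invented for illustration, and real features would normally be standardised before computing distances.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean distance)."""
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Each row: [word count, average word length, special character count].
# Values and labels are made up for this example.
train_X = [
    [5, 4.0, 1], [6, 4.2, 0],     # labelled positive
    [12, 5.0, 4], [11, 5.5, 5],   # labelled negative
    [8, 3.0, 2], [9, 3.2, 2],     # labelled neutral
]
train_y = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

print(knn_predict(train_X, train_y, [5, 4.1, 1]))    # nearest neighbours are mostly positive
print(knn_predict(train_X, train_y, [12, 5.2, 5]))   # nearest neighbours are mostly negative
```

In the project itself, the model should be evaluated on tweets held out from training, for example with a train/test split, and the report should state the resulting accuracy.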
Download the dataset from the following link: Sentiment analysis Dataset
Project guidelines
The report should provide the following information:
• A written description of data with relevant spreadsheets.
• Explanation of how you analysed your data (hint: which Python packages/functions did you use?).
• Explanation of what data you analysed, followed by relevant visualizations.
• The results of your analysis, followed by relevant visualizations, with important results highlighted.
• Details of your machine learning model development.
Notes:
1. Follow the attached report template.
2. Your report must not exceed 10 pages, including any references.
3. You must work in groups of 1-2 students.
4. The submission deadline is Saturday of Week 13 (28/11/2020).
5. You must submit your Jupyter Notebook along with the report.