Data Analyst Questionnaire
Within this document are four different questions. Each question is structured in the following manner:
1) Premise - Contains any needed background information
2) Request - The actual question, what you are to solve
3) Notes - A space if you feel like including notes of any kind for the given question
Please place your answer for each question in a separate file, following this naming convention:
FINRA_Qn.docx, where n = the question number (e.g., 1, 2, ...). So the file for the first question should be named ‘FINRA_Q1.docx’.
When complete, please package everything together and email your responses to the designated POCs.
Premise: You have a table named “TRADES” with the following six columns:
Column Name | Data Type     | Description
Date        | DATE          | The calendar date on which the trade took place.
Firm        | VARCHAR(255)  | A symbol representing the Broker/Dealer who conducted the trade.
Symbol      | VARCHAR(10)   | The security traded.
Side        | VARCHAR(1)    | Denotes whether the trade was a buy (purchase) or a sell (sale) of a security.
Quantity    | BIGINT        | The number of shares involved in the trade.
Price       | DECIMAL(18,8) | The dollar price per share traded.
You write a query looking for all trades in the month of August 2019. The query returns the following:
DATE      | FIRM | SYMBOL | SIDE | QUANTITY | PRICE
8/5/2019  | ABC  | 123    | B    | 200      | 41
8/5/2019  | CDE  | 456    | B    | 601      | 60
8/5/2019  | ABC  | 789    | S    | 600      | 70
8/5/2019  | CDE  | 789    | S    | 600      | 70
8/5/2019  | FGH  | 456    | B    | 200      | 62
8/6/2019  | 3CDE | 456    | X    | 300      | 61
8/8/2019  | ABC  | 123    | B    | 300      | 40
8/9/2019  | ABC  | 123    | S    | 300      | 30
8/9/2019  | FGH  | 789    | B    | 2100     | 71
8/10/2019 | CDE  | 456    | S    | 1100     | 63
Questions:
1) Conduct an analysis of the data set returned by your query. Write a paragraph describing your analysis. Please also note any questions or assumptions made about this data.
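The result set contains 10 trades totaling 6,301 shares. More than half of that volume (3,500 shares) traded on August 9 and 10, and the highest price per share (71) also occurred on August 9. CDE traded the most shares in total (2,301), one share more than FGH (2,300), while FGH had the largest average trade size (1,150 shares). Questions about the data: the 8/6/2019 row has firm ‘3CDE’ and side ‘X’, neither of which matches the other rows, and 8/10/2019 falls on a Saturday; I assume these need to be confirmed with the data source before they are included in any summary.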
2) Your business user asks you to show them a table output that includes an additional column categorizing the TRADES data into volume-based Tiers, with a column named ‘Tier’. Quantities between 0-250 will be considered ‘Small’, quantities greater than ‘Small’ but less than or equal to 500 will be considered ‘Medium’, quantities greater than ‘Medium’ but less than or equal to 500 will be considered ‘Large’, and quantities greater than ‘Tier 3’ will be considered ‘Very Large’ .
a. Please write the SQL query you would use to add the column to the table output.
b. Please show the exact results you expect based on your SQL query.
3) Your business user asks you to show them a table output summarizing the TRADES data (Buy and Sell) on week-by-week basis.
a. Please write the SQL query you would use to query this table.
b. Please show the exact results you expect based on your SQL query.
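For requests 2 and 3 above, the following is a minimal sketch rather than a definitive answer. It uses Python with the standard-library sqlite3 module only so that the SQL can be run end to end; the questionnaire does not name a database. The ten rows are copied from the premise, with dates rewritten in ISO format so SQLite's date functions can parse them. Two points are assumptions and not part of the premise: the ‘Large’ tier is capped at 1,000 shares, because the request repeats 500 for both ‘Medium’ and ‘Large’, and weeks are taken to run Monday through Sunday. The helper names load_trades and run are hypothetical.

```python
import sqlite3

# The ten rows returned by the August 2019 query in the premise.
# Dates are rewritten as ISO strings so SQLite date functions can parse them.
TRADES = [
    ("2019-08-05", "ABC", "123", "B", 200, 41),
    ("2019-08-05", "CDE", "456", "B", 601, 60),
    ("2019-08-05", "ABC", "789", "S", 600, 70),
    ("2019-08-05", "CDE", "789", "S", 600, 70),
    ("2019-08-05", "FGH", "456", "B", 200, 62),
    ("2019-08-06", "3CDE", "456", "X", 300, 61),
    ("2019-08-08", "ABC", "123", "B", 300, 40),
    ("2019-08-09", "ABC", "123", "S", 300, 30),
    ("2019-08-09", "FGH", "789", "B", 2100, 71),
    ("2019-08-10", "CDE", "456", "S", 1100, 63),
]

# ASSUMPTION: 'Large' ends at 1000 shares. The request repeats 500 for
# both 'Medium' and 'Large', so the real cutoff must be confirmed.
TIER_SQL = """
SELECT "Date", Firm, Symbol, Side, Quantity, Price,
       CASE
           WHEN Quantity <= 250  THEN 'Small'
           WHEN Quantity <= 500  THEN 'Medium'
           WHEN Quantity <= 1000 THEN 'Large'
           ELSE 'Very Large'
       END AS Tier
FROM TRADES
ORDER BY "Date", Firm, Symbol
"""

# ASSUMPTION: weeks run Monday to Sunday. 'weekday 0' moves forward to
# Sunday (or stays put on a Sunday); '-6 days' steps back to that week's Monday.
# Rows with an unexpected Side (such as 'X') are kept as their own group
# so they stay visible instead of being silently dropped.
WEEKLY_SQL = """
SELECT date("Date", 'weekday 0', '-6 days') AS Week_Start,
       Side,
       COUNT(*)              AS Trades,
       SUM(Quantity)         AS Total_Quantity,
       SUM(Quantity * Price) AS Total_Value
FROM TRADES
GROUP BY Week_Start, Side
ORDER BY Week_Start, Side
"""


def load_trades():
    """Create an in-memory TRADES table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE TRADES ("Date" TEXT, Firm TEXT, Symbol TEXT, '
        "Side TEXT, Quantity INTEGER, Price NUMERIC)"
    )
    conn.executemany("INSERT INTO TRADES VALUES (?, ?, ?, ?, ?, ?)", TRADES)
    return conn


def run(sql):
    """Run one query against the sample data and return all rows."""
    conn = load_trades()
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run(TIER_SQL):
        print(row)
    for row in run(WEEKLY_SQL):
        print(row)
```

Under these assumptions, the tier query labels the 200-share trades ‘Small’, the 300-share trades ‘Medium’, the 600- and 601-share trades ‘Large’, and the 1,100- and 2,100-share trades ‘Very Large’. All ten trades fall in the week starting Monday 8/5/2019, so the weekly query returns one row per Side value: B (5 trades, 3,401 shares), S (4 trades, 2,600 shares), and the unexplained X (1 trade, 300 shares).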
Notes:
Premise: You need to describe in writing how to accomplish a task. Your audience has never completed this task before.
Question: In a few paragraphs, please describe how to complete a task of your choice. You may choose a task of your own liking or one of the sample tasks below:
1) How to make a peanut butter and jelly sandwich
2) How to get leaves off a lawn
3) How to make a cup of tea
Notes:
Let's consider the task “How to make a cup of tea”.
First, gather the items needed to make tea: tea leaves, milk, sugar, and water. Then follow the steps below:
1) Pour water into a kettle or saucepan.
2) Heat the water. The ideal brewing temperature depends on the type of tea.
3) Add the tea leaves to the hot water. For 1 cup of tea, use 1 tablespoon of loose tea leaves.
4) Steep the tea for the time recommended for the tea type.
5) Strain out the tea leaves.
6) If you want milk, add it after pouring the tea into the cup and stir gently.
7) Add sugar to taste.
The tea is now ready.
Any task is easier to complete when it is broken into a step-by-step procedure.
Premise: Below is a snapshot of data from two tables: “Orders” and “Customers”, taken on 02/05/2016. You find the following documentation:
· The ORDERS table gets updated at the end of every day
· The CUSTOMERS table gets updated at the end of every week
ORDERS Table
Field Name   | Description
ORDER_DT     | Date the order was placed.
ORDER_ID     | A unique identifier for each order.
ORDER_STATUS | The status of an order.
CUSTOMER_ID  | Identifies a unique customer.
CUSTOMERS table
Field Name      | Description
CUSTOMER_ID     | The unique identifier of the Customer trading in the market
CUSTOMER_STATUS | The Customer's account status. It should be ‘Active’ in order to be eligible for Order processing.
CUSTOMER_FNAME  | First name of a customer.
CUSTOMER_MNAME  | Middle name of a customer.
CUSTOMER_LNAME  | Last name of a customer.
GENDER          | Gender of a customer.
AGE             | Age of a customer.
Table Name: ORDERS
ORDER_DT | ORDER_ID | ORDER_STATUS | ORDER_STATUS_CD | CUSTOMER_ID
2/1/2016 | 1000002  | Completed    | S               | 4
2/2/2016 | 2000008  | Processing   | P               | 6
2/2/2016 | 2000009  | Completed    | S               | 7
2/2/2016 | 2000010  | Completed    | S               | 7
2/3/2016 | 3000008  | Processing   | P               | 6
2/3/2016 | 3000009  | Cancelled    | C               | 6
2/3/2016 | 3000010  | Cancelled    | C               | 4
2/3/2016 | 3000011  | On Hold      | H               | 3
2/3/2016 | 3000012  | Processing   | P               | 7
2/4/2016 | 4000005  | Completed    | S               | 6
Table Name: CUSTOMERS
CUSTOMER_ID | STATUS   | FNAME     | MNAME   | LNAME    | GENDER | AGE
1           | Active   | John      |         | Smith    | M      | 70
2           | Active   | James     | Emitt   | Madison  | M      | 68
3           | Active   | Joe       | Anthony | Diggs    | M      | 55
4           | Inactive | Adam      |         | Lambert  | M      | 40
5           | Active   | Marcus    |         | Dallas   | M      | 81
6           | Active   | Steve     | Eugene  | Bullock  | M      | 62
7           | Active   | Naomi     |         | Patel    | F      | 33
8           | Active   | Alexander |         | Pope     | M      | 29
9           | Inactive | Peter     |         | Chandler | M      | 36
Any coding language can be used to query the data.
Question:
1) Your business user asks you to combine the details from these two tables in one table output, without any duplicated columns.
A. Please write the query you would use to query this (note which language you are using).
-- Language: SQL
-- CUSTOMER_ID is taken from CUSTOMERS so that customers without orders still show an ID.
SELECT CUSTOMERS.CUSTOMER_ID, ORDERS.ORDER_DT, ORDERS.ORDER_ID, ORDERS.ORDER_STATUS_CD,
CUSTOMERS.STATUS, CUSTOMERS.FNAME, CUSTOMERS.MNAME, CUSTOMERS.LNAME, CUSTOMERS.GENDER, CUSTOMERS.AGE
FROM CUSTOMERS
FULL OUTER JOIN ORDERS ON CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID;
B. Please show the exact results you expect based on your SQL query.
C. If you make assumptions to complete the task, please document them.
2) Through an investigation, your business user has learnt that there has been an order that was processed successfully by mistake.
A. Please write the query you would use to validate (or disprove) this finding (note which language you are using).
B. Please show the exact results you expect based on your SQL query.
C. If you make assumptions to complete the task, please document them.
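The two requests above can be sketched as follows, again in Python with the standard-library sqlite3 module so the SQL can be executed; this is one possible approach, not the expected answer. Column names follow the data snapshot (STATUS, FNAME, ...) rather than the field descriptions, and dates are rewritten in ISO format. Several points are assumptions: blank middle names are loaded as NULL; a LEFT JOIN from CUSTOMERS stands in for FULL OUTER JOIN, which older SQLite builds lack, and gives the same rows here because every order's CUSTOMER_ID exists in CUSTOMERS; and “processed successfully by mistake” is read as a Completed order whose customer is not Active. The helper names load_tables and run are hypothetical.

```python
import sqlite3

# Snapshot rows from the premise (dates rewritten in ISO format).
ORDERS = [
    ("2016-02-01", 1000002, "Completed", "S", 4),
    ("2016-02-02", 2000008, "Processing", "P", 6),
    ("2016-02-02", 2000009, "Completed", "S", 7),
    ("2016-02-02", 2000010, "Completed", "S", 7),
    ("2016-02-03", 3000008, "Processing", "P", 6),
    ("2016-02-03", 3000009, "Cancelled", "C", 6),
    ("2016-02-03", 3000010, "Cancelled", "C", 4),
    ("2016-02-03", 3000011, "On Hold", "H", 3),
    ("2016-02-03", 3000012, "Processing", "P", 7),
    ("2016-02-04", 4000005, "Completed", "S", 6),
]

# ASSUMPTION: customers shown without a middle name have MNAME = NULL.
CUSTOMERS = [
    (1, "Active", "John", None, "Smith", "M", 70),
    (2, "Active", "James", "Emitt", "Madison", "M", 68),
    (3, "Active", "Joe", "Anthony", "Diggs", "M", 55),
    (4, "Inactive", "Adam", None, "Lambert", "M", 40),
    (5, "Active", "Marcus", None, "Dallas", "M", 81),
    (6, "Active", "Steve", "Eugene", "Bullock", "M", 62),
    (7, "Active", "Naomi", None, "Patel", "F", 33),
    (8, "Active", "Alexander", None, "Pope", "M", 29),
    (9, "Inactive", "Peter", None, "Chandler", "M", 36),
]

# Request 1: one combined output; CUSTOMER_ID appears only once.
COMBINED_SQL = """
SELECT c.CUSTOMER_ID, c.STATUS, c.FNAME, c.MNAME, c.LNAME, c.GENDER, c.AGE,
       o.ORDER_DT, o.ORDER_ID, o.ORDER_STATUS, o.ORDER_STATUS_CD
FROM CUSTOMERS AS c
LEFT JOIN ORDERS AS o ON o.CUSTOMER_ID = c.CUSTOMER_ID
ORDER BY c.CUSTOMER_ID, o.ORDER_ID
"""

# Request 2: Completed orders placed by customers who are not Active.
MISTAKE_SQL = """
SELECT o.ORDER_ID, o.ORDER_DT, o.ORDER_STATUS, c.CUSTOMER_ID, c.STATUS
FROM ORDERS AS o
JOIN CUSTOMERS AS c ON c.CUSTOMER_ID = o.CUSTOMER_ID
WHERE o.ORDER_STATUS = 'Completed' AND c.STATUS <> 'Active'
"""


def load_tables():
    """Create in-memory ORDERS and CUSTOMERS tables from the snapshot."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ORDERS (ORDER_DT TEXT, ORDER_ID INTEGER, "
        "ORDER_STATUS TEXT, ORDER_STATUS_CD TEXT, CUSTOMER_ID INTEGER)"
    )
    conn.execute(
        "CREATE TABLE CUSTOMERS (CUSTOMER_ID INTEGER, STATUS TEXT, FNAME TEXT, "
        "MNAME TEXT, LNAME TEXT, GENDER TEXT, AGE INTEGER)"
    )
    conn.executemany("INSERT INTO ORDERS VALUES (?, ?, ?, ?, ?)", ORDERS)
    conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?, ?, ?)", CUSTOMERS)
    return conn


def run(sql):
    """Run one query against the snapshot and return all rows."""
    conn = load_tables()
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run(COMBINED_SQL):
        print(row)
    for row in run(MISTAKE_SQL):
        print(row)
```

On the snapshot, the combined query returns 15 rows: the 10 orders plus one row each for customers 1, 2, 5, 8, and 9, who have no orders. The validation query returns a single row, order 1000002 (Completed on 2/1/2016) for customer 4, whose status is Inactive. Because CUSTOMERS is refreshed only weekly, the customer may still have been Active when the order was placed, so this result flags a candidate for review rather than proving the mistake.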
Notes:
Premise: The following are stand-alone SQL questions. If you are unable to answer a question, please document your approach and proposed next steps. For each of the below, please show the exact results that you expect based on your SQL query.
Question:
1) Is this a valid SQL statement?
SELECT CUSTOMERS.GENDER, count(DISTINCT CUSTOMERS.CUSTOMER_ID), count(*), sum(DISTINCT CUSTOMERS.CUSTOMER_ID)
FROM CUSTOMERS
GROUP BY CUSTOMERS.GENDER;
2) Is this a valid SQL statement?
SELECT CUSTOMERS.GENDER, count(DISTINCT CUSTOMERS.CUSTOMER_ID), count(*), count(DISTINCT CUSTOMERS.AGE)
FROM CUSTOMERS
GROUP BY CUSTOMERS.GENDER;
Notes:
1. Yes, it is a valid statement. It counts the customers of each gender. Because CUSTOMER_ID values are unique, applying DISTINCT to the ID has no effect, and summing the IDs is syntactically valid but not meaningful.
2. This is also a valid statement. It groups by gender and counts the customers and the number of distinct ages in each group.
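The premise also asks for exact results. As a hedged check, the sketch below runs both statements unchanged against the CUSTOMERS snapshot from the previous question, using Python's standard-library sqlite3 module. It shows only that SQLite accepts them; other database engines may differ. The table is trimmed to the three columns the statements use, and the helper name run is hypothetical.

```python
import sqlite3

# CUSTOMERS snapshot, trimmed to (CUSTOMER_ID, GENDER, AGE).
CUSTOMERS = [
    (1, "M", 70), (2, "M", 68), (3, "M", 55),
    (4, "M", 40), (5, "M", 81), (6, "M", 62),
    (7, "F", 33), (8, "M", 29), (9, "M", 36),
]

STATEMENT_1 = """
SELECT CUSTOMERS.GENDER, count(DISTINCT CUSTOMERS.CUSTOMER_ID), count(*), sum(DISTINCT CUSTOMERS.CUSTOMER_ID)
FROM CUSTOMERS
GROUP BY CUSTOMERS.GENDER;
"""

STATEMENT_2 = """
SELECT CUSTOMERS.GENDER, count(DISTINCT CUSTOMERS.CUSTOMER_ID), count(*), count(DISTINCT CUSTOMERS.AGE)
FROM CUSTOMERS
GROUP BY CUSTOMERS.GENDER;
"""


def run(sql):
    """Run one statement against the trimmed snapshot and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE CUSTOMERS (CUSTOMER_ID INTEGER, GENDER TEXT, AGE INTEGER)"
        )
        conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?)", CUSTOMERS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Neither statement has an ORDER BY, so sort before printing.
    print(sorted(run(STATEMENT_1)))
    print(sorted(run(STATEMENT_2)))
```

On this snapshot, statement 1 returns F: 1, 1, 7 and M: 8, 8, 38, and statement 2 returns F: 1, 1, 1 and M: 8, 8, 8.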