Instructions:
In this assignment you will use the churn data: churn_data.txt
Read data into a data frame using the function read.csv() with the following options:
header=T, stringsAsFactors=F
Assume that you saved the file churn_data.txt in the C:/Datasets folder. You can then read the file into a data frame as follows:
file="C:/Datasets/churn_data.txt"
churnData=read.csv(file, stringsAsFactors = FALSE,header = TRUE)
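The read step can be tried out without the real file. The sketch below is only an illustration: it writes a tiny, invented stand-in for churn_data.txt to a temporary file (the three rows and their values are made up) and reads it back with the same options.

```r
# Write a tiny, invented stand-in for churn_data.txt to a temporary file.
file <- tempfile(fileext = ".txt")
writeLines(c(
  "State,Area.Code,CustServ.Calls,Night.Charge",
  "CA,415,1,11.01",
  "NY,408,4,9.45",
  "CA,510,0,7.32"
), file)

# Read it with the options named in the instructions.
churnData <- read.csv(file, header = TRUE, stringsAsFactors = FALSE)

# With stringsAsFactors = FALSE, text columns stay character, not factor.
str(churnData)
```

For the real data, point file at the actual location of churn_data.txt instead of the temporary file.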
A) Print the names of the columns.
Hint: colnames() function.
B) Print the number of rows and columns
Hint: dim()
C) Count the number of calls per state.
Hint: table() function.
D) Find the mean, median, standard deviation, and variance of nightly charges, the column Night.Charge in the data.
The R functions to be used are mean(), median(), sd(), var().
E) Find the maximum and minimum values of international charges (Intl.Charge), customer service calls (CustServ.Calls), and daily charges (Day.Charge).
F) Use summary() function to print information about the distribution of the following features:
"Eve.Charge", "Night.Mins", "Night.Calls", "Night.Charge", "Intl.Mins", "Intl.Calls"
What are the min and max values printed by the summary() function for these features?
Check textbook page 34 for a sample.
G) Use unique() function to print the distinct values of the following columns:
State, Area.Code, and Churn.
H) Extract the subset of data for the churned customers (i.e., Churn=True). How many rows are in the subset?
Hint: Use subset() function. Check lecture notes and textbook for samples.
I) Extract the subset of data for customers that made more than 3 customer service calls (CustServ.Calls). How many rows are in the subset?
J) Extract the subset of churned customers with no international plan (Int.l.Plan) and no voice mail plan (VMail.Plan). How many rows are in the subset?
K) Extract the data for customers from California (i.e., State is CA) who did not churn but made more than 2 customer service calls.
L) What is the mean of customer service calls for the customers that did not churn (i.e., Churn=False)?
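The hinted functions in tasks A) to L) can be sketched on a small toy data frame. This is an illustration under stated assumptions, not the expected answer: the six rows are invented, Churn is assumed to be stored as the strings "True"/"False", and the plan columns as "yes"/"no". Run unique() on the real columns first to see how the file actually encodes them.

```r
# Invented toy rows using column names from the assignment; not real data.
# Assumption: Churn holds "True"/"False" and the plan columns hold "yes"/"no".
churnData <- data.frame(
  State          = c("CA", "CA", "NY", "TX", "CA", "NY"),
  Area.Code      = c(415, 408, 510, 415, 408, 415),
  Int.l.Plan     = c("no", "no", "yes", "no", "no", "no"),
  VMail.Plan     = c("no", "yes", "no", "no", "no", "yes"),
  CustServ.Calls = c(3, 1, 4, 5, 0, 2),
  Night.Charge   = c(11.0, 9.5, 7.3, 8.9, 10.2, 6.1),
  Churn          = c("False", "False", "True", "True", "False", "False"),
  stringsAsFactors = FALSE
)

colnames(churnData)               # A) column names
dim(churnData)                    # B) number of rows, then columns
table(churnData$State)            # C) number of rows per state

# D) centre and spread of one numeric column
c(mean   = mean(churnData$Night.Charge),
  median = median(churnData$Night.Charge),
  sd     = sd(churnData$Night.Charge),
  var    = var(churnData$Night.Charge))

range(churnData$CustServ.Calls)   # E) minimum and maximum in one call
summary(churnData[, c("Night.Charge", "CustServ.Calls")])  # F) distribution
unique(churnData$State)           # G) distinct values

# H) churned customers only
churned <- subset(churnData, Churn == "True")
nrow(churned)

# I) more than 3 customer service calls
manyCalls <- subset(churnData, CustServ.Calls > 3)

# J) churned, with neither plan; conditions are combined with &
churnedNoPlans <- subset(churnData,
                         Churn == "True" & Int.l.Plan == "no" & VMail.Plan == "no")

# K) California, did not churn, more than 2 customer service calls
caStayed <- subset(churnData,
                   State == "CA" & Churn == "False" & CustServ.Calls > 2)

# L) mean customer service calls among customers who did not churn
meanCallsStayed <- mean(subset(churnData, Churn == "False")$CustServ.Calls)
```

If read.csv() has converted Churn to a logical column, the comparisons would use TRUE and FALSE instead of the quoted strings.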
Question 2 (related to the above)
In this assignment, we will explore the churn data using graphics and visualization. One of the primary reasons for performing exploratory data analysis (EDA) is to investigate the variables, examine the distributions of the categorical variables, look at the histograms of the numeric variables, and explore the relationships among sets of variables.
Although we are not going to develop any models for this project, in a real-world project our task is to identify patterns in the data that will help to reduce the proportion of churners.
We will use the same data set we had in the Week 2 assignment:
Data file: churn_data.txt
All graphics in this assignment must be plotted using the ggplot2 library, so you need to install it first:
install.packages("ggplot2")
Before using any functions from the library, you need to load it into your R session using
library(ggplot2)
Here is how you can read data into a data frame named churnData:
churnData <- read.csv(filePath, stringsAsFactors = FALSE,header = TRUE)
where filePath is the location of the churn_data.txt file. For example, if you saved the file in C:/tmp, then you should use C:/tmp/churn_data.txt
The variables in the file churn_data.txt are
State: Categorical, for the 50 states and the District of Columbia.
Account length: Integer-valued, how long account has been active.
Area code: Categorical
Phone number: Essentially a surrogate for customer ID.
International plan: Dichotomous categorical, yes or no.
Voice mail plan: Dichotomous categorical, yes or no.
Number of voice mail messages: Integer-valued.
Total day minutes: Continuous, minutes customer used service during the day.
Total day calls: Integer-valued.
Total day charge: Continuous, perhaps based on above two variables.
Total eve minutes: Continuous, minutes customer used service during the evening.
Total eve calls: Integer-valued.
Total eve charge: Continuous, perhaps based on above two variables.
Total night minutes: Continuous, minutes customer used service during the night.
Total night calls: Integer-valued.
Total night charge: Continuous, perhaps based on above two variables.
Total international minutes: Continuous, minutes customer used service to make international calls.
Total international calls: Integer-valued.
Total international charge: Continuous, perhaps based on above two variables.
Number of calls to customer service: Integer-valued.
Churn: Target. Indicator of whether the customer has left the company (true or false).
Part 1. Bar Charts
A bar chart is a histogram for discrete data: it records the frequency of every value of a categorical variable.
1.) Vertical Bar Charts
Plot the bar charts of State, Area.Code, Int.l.Plan, VMail.Plan, CustServ.Calls, and Churn.
Use the theme() function to change the text size, location, color, etc. (An example is given in the textbook on page 61.)
The following is the bar chart for State. As an example, the x-axis labels are bold and rotated 90 degrees, which can be set in the theme() function using
axis.text.x = element_text(face="bold", angle=90, vjust=0.5, size=11)
Similarly, the parameter colour="#990000" is used for the color of the x-axis title. So, the following options for axis.title.x and axis.text.x in the theme() function display the title and text of the x-axis as shown in the figure below:
axis.title.x = element_text(face="bold", colour="#990000", size=12), axis.text.x = element_text(face="bold", angle=90, vjust=0.5, size=11)
[Figure: bar chart of State]
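The theme() options above could be combined with geom_bar() roughly as follows. This is a sketch on invented data: the toy State values are made up, and the object name stateBar is only illustrative.

```r
library(ggplot2)

# Invented toy rows; only the State column is needed for this chart.
churnData <- data.frame(
  State = c("CA", "CA", "NY", "TX", "CA", "NY"),
  stringsAsFactors = FALSE
)

# geom_bar() counts the rows for each distinct State value.
stateBar <- ggplot(churnData, aes(x = State)) +
  geom_bar() +
  theme(
    axis.title.x = element_text(face = "bold", colour = "#990000", size = 12),
    axis.text.x  = element_text(face = "bold", angle = 90, vjust = 0.5, size = 11)
  )

print(stateBar)
```

For a column stored as numbers, such as Area.Code or CustServ.Calls, wrapping it in factor() inside aes() makes ggplot2 treat each value as a separate category.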
2.) Horizontal Bar Charts
Create the horizontal bar chart of CustServ.Calls.
Hint: Textbook page 49.
[Figure: horizontal bar chart]
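One common way to turn a vertical bar chart into a horizontal one is coord_flip(). The sketch below uses invented CustServ.Calls values and may differ from the textbook's approach on page 49.

```r
library(ggplot2)

# Invented toy values standing in for the CustServ.Calls column.
churnData <- data.frame(CustServ.Calls = c(3, 1, 4, 1, 0, 2, 1, 3))

# factor() makes each call count its own category; coord_flip() swaps the axes
# so the bars run horizontally.
callsBar <- ggplot(churnData, aes(x = factor(CustServ.Calls))) +
  geom_bar() +
  coord_flip() +
  labs(x = "CustServ.Calls", y = "count")

print(callsBar)
```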
3.) Horizontal Bar Charts with Sorted Categories
Create a horizontal bar chart of CustServ.Calls in which the categories are sorted by the number of calls.
Hint: Textbook pages 50-51
[Figure: horizontal bar chart with sorted categories]
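Sorting the bars usually means counting first and then ordering the categories by that count. The following is one possible sketch on invented data, using table(), reorder(), and geom_col(); the helper name callCounts is hypothetical, and the textbook (pages 50-51) may take a different route.

```r
library(ggplot2)

# Invented toy values standing in for the CustServ.Calls column.
churnData <- data.frame(CustServ.Calls = c(3, 1, 4, 1, 0, 2, 1, 3))

# Count the rows for each value of CustServ.Calls.
callCounts <- as.data.frame(table(churnData$CustServ.Calls),
                            stringsAsFactors = FALSE)
names(callCounts) <- c("CustServ.Calls", "count")

# Order the category levels by their count so the bars appear sorted.
callCounts$CustServ.Calls <- reorder(callCounts$CustServ.Calls, callCounts$count)

# geom_col() draws bars whose lengths are the precomputed counts.
sortedBar <- ggplot(callCounts, aes(x = CustServ.Calls, y = count)) +
  geom_col() +
  coord_flip()

print(sortedBar)
```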
Part 2: Histograms and Density Plots
The histogram and the density plot are two visualizations that help you quickly examine the distribution of a numerical variable.
A basic histogram bins a variable into fixed-width buckets and returns the number of data points that fall into each bucket. You can think of a density plot as a continuous histogram of a variable, except the area under the density plot is equal to 1.
1.) Plot the histograms of Account.Length, VMail.Message, Day.Mins, and Intl.Calls.
Based on the histograms, comment on whether any of them have outliers, are close to the Normal distribution, are multi-modal, or are skewed.
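A histogram and a density plot for one numeric column can be sketched as below. The Day.Mins values here are randomly generated stand-ins, not the real data, and the choice of 20 bins is arbitrary. The last lines check the two properties described above: histogram counts add up to the number of rows, and the area under a density estimate is close to 1.

```r
library(ggplot2)

# Invented values standing in for Day.Mins; the real column comes from the file.
set.seed(1)
churnData <- data.frame(Day.Mins = rnorm(200, mean = 180, sd = 50))

# Histogram: fixed-width bins, bar height = number of rows in each bin.
dayHist <- ggplot(churnData, aes(x = Day.Mins)) +
  geom_histogram(bins = 20)

# Density plot: a smoothed version of the same distribution.
dayDensity <- ggplot(churnData, aes(x = Day.Mins)) +
  geom_density()

print(dayHist)
print(dayDensity)

# The bin counts sum to the number of rows.
totalCount <- sum(layer_data(dayHist)$count)

# Trapezoid-rule area under a base R density estimate; should be close to 1.
d <- density(churnData$Day.Mins)
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
```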