Running head: LOGISTIC REGRESSION
Logistic Regression
Student Name
Institution
Course
Instructor
Date
Question (a)
Categorical variables are useful for classifying data into a limited set of distinct groups rather than measuring them on a numeric scale. An example is comparing the ages of different individuals based on the gender of the participants. A categorical variable with n levels is entered into a model as n-1 dummy (indicator) variables, and each of these is limited to the value one or zero, which eases the classification process (Bühlmann & Dezeure, 2016). A dummy variable takes the value 1 when the observation belongs to that category and 0 when it does not. The level left without a dummy variable serves as the reference category and is identified by all n-1 dummy variables being 0. For a categorical variable with n levels, the n-1 dummy variables are therefore the only ones needed, since together they identify every category using only the values 1 and 0.
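The n-1 coding described above can be illustrated with a minimal sketch. The function name `dummy_encode` and the colour data are hypothetical, chosen only for illustration; the first level in sorted order is assumed to be the reference category.

```python
def dummy_encode(values, drop_first=True):
    """Encode categorical values as 0/1 indicator columns.

    With drop_first=True the first level (in sorted order) becomes the
    reference category, leaving n-1 columns for n levels.
    """
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


# Hypothetical data: three levels, so two dummy columns are produced.
colors = ["red", "green", "blue", "green"]
columns, encoded = dummy_encode(colors)

print(columns)   # ['green', 'red']
print(encoded)   # [[0, 1], [1, 0], [0, 0], [1, 0]]
```

The row for "blue" is all zeros, which is how the reference category is recognised without a column of its own.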
When information is classified using categorical variables, coding all n variables tends to cause problems that can prolong the classification process and make it more difficult. The main problem is that using n variables results in multicollinearity (Guo & Berkhahn, 2016). It arises because the variables are interconnected: the value of any one of the n variables can be predicted exactly from the others, which makes the results difficult to interpret. Another problem is that coding with n variables relies on intuition, meaning that the choice of which variable to interpret may reflect the interests of the researcher rather than the data. Lastly, the nth variable is redundant, that is, it adds no information beyond what the other n-1 variables already provide.
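The redundancy behind the multicollinearity problem can be checked directly. The sketch below uses hypothetical data and shows that, with all n indicator columns, every row sums to 1 and the last column can be reconstructed from the others.

```python
# Hypothetical categorical data with three levels.
values = ["a", "b", "c", "a", "c"]
levels = sorted(set(values))

# Full coding: one 0/1 column per level (n columns for n levels).
full = [[1 if v == level else 0 for level in levels] for v in values]

# Each observation belongs to exactly one level, so each row sums to 1.
row_sums = [sum(row) for row in full]

# The last column is therefore determined by the remaining n-1 columns.
last_column = [row[-1] for row in full]
rebuilt_last = [1 - sum(row[:-1]) for row in full]

print(row_sums)                      # [1, 1, 1, 1, 1]
print(rebuilt_last == last_column)   # True
```

Because the columns always sum to the same constant as an intercept term, a model that includes both an intercept and all n columns has no unique set of coefficients.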
Question (b)
In statistics, logistic regression is used to classify a variable that takes one of two forms, such as a positive or a negative outcome. It models a binary dependent variable using one or more independent variables, which can be placed at their different levels (van Smeden et al., 2016). Logistic regression predicts the probability that the dependent variable takes the value 1 rather than 0. It is used to describe the data and to explain the relationship between one binary dependent variable and one or more independent variables, which may be nominal. For instance, financial institutions can use logistic regression to classify borrowers as likely defaulters or non-defaulters. Logistic regression is a supervised machine learning method, since it is fitted to data whose classes are already known. The data are classified using the sigmoid function, an S-shaped curve that maps any real value to a value between zero and one, which is interpreted as a probability.
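A minimal sketch of how the sigmoid function turns a weighted sum of inputs into a probability follows. The feature meanings, weights, and bias are hypothetical values chosen for illustration and are not fitted to any data; the 0.5 cut-off is also an assumption.

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_default_probability(features, weights, bias):
    """Logistic regression prediction: sigmoid of a weighted sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients (illustration only, not estimated from data).
weights = [0.8, -1.2]   # e.g. debt ratio, years of credit history
bias = -0.5

p = predict_default_probability([1.5, 0.5], weights, bias)
label = 1 if p >= 0.5 else 0   # 1 = predicted defaulter, assumed cut-off 0.5

print(round(p, 3), label)
```

In practice the weights would be estimated from labelled data, which is what makes the method supervised.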
Question (c)
The Receiver Operating Characteristic (ROC) curve plays a significant role in evaluating the appropriateness of a classification model. To assess performance, the ROC curve varies the classification threshold across different settings and records how accurate the classification is at each one. The curve plots the true positive rate against the false positive rate at each threshold (Goksuluk et al., 2016). The ROC curve is useful in selecting the most appropriate test or threshold, favoring those with a high true positive rate and a low false positive rate. The analysis works with the score that the model assigns to every observation rather than only the binary prediction. An example of ROC curve analysis is in the medical field, where a threshold is set on a diagnostic score to classify whether a disease is present, and performance is assessed across many thresholds rather than at the single cut-off implied by a present-or-absent prediction. To measure the appropriateness of the classification, the area under the curve (AUC) is used (Su, Yuan, & Zhu, 2015). An AUC of 1 indicates perfect classification, while an AUC of 0.5 indicates performance no better than chance. The area under the receiver operating characteristic curve (AUROC) is commonly calculated with the trapezoidal rule, applied to the plotted true positive and false positive rates.
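The threshold sweep and the trapezoidal calculation of the AUC can be sketched as follows. The function names and the four labelled scores are hypothetical, and the sketch assumes both classes are present in the data.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points obtained
    by lowering the threshold through every distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical labels (1 = disease present) and model scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
auc = auc_trapezoid(points)

print(points)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc)     # 0.75
```

Each point corresponds to one threshold setting, so the curve summarises the trade-off between true positives and false positives across all of them.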
Question (d)
The probability of an event measures how likely the event is to occur and ranges between zero and one. In this situation the probability of the event occurring is 0.4. Finding the odds first requires the probability of the event not occurring, which is the difference between 1 and the probability of the event occurring. Therefore, the solution starts by subtracting 0.4 from 1.
1-0.4=0.6
Therefore, 0.6 is the probability that the event will not occur. The odds are the probability that the event occurs divided by the probability that it does not occur:
Odds = 0.4 / 0.6 = 2/3 ≈ 0.667, and thus the odds are 2:3
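To find the log odds for this problem, where the probability of the event occurring is 0.4, take the logarithm of the odds. Because the odds are less than 1, the log odds are negative. Using the base-10 logarithm:
log10(0.4 / 0.6) = -0.176091259
Using the natural logarithm, which is the convention in logistic regression, ln(0.4 / 0.6) ≈ -0.405465.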
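The arithmetic can be checked with a short sketch. The variable names are illustrative; both the base-10 and the natural logarithm are computed, and because the odds of 2:3 are below 1, both log odds are negative.

```python
import math

p = 0.4                      # probability that the event occurs
q = 1 - p                    # probability that the event does not occur

odds = p / q                 # 0.4 / 0.6 = 2/3
log_odds_base10 = math.log10(odds)
log_odds_natural = math.log(odds)

# Applying the sigmoid (inverse logit) to the natural log odds
# recovers the original probability.
recovered_p = 1.0 / (1.0 + math.exp(-log_odds_natural))

print(round(odds, 4))              # 0.6667
print(round(log_odds_base10, 6))   # -0.176091
print(round(log_odds_natural, 6))  # -0.405465
print(round(recovered_p, 6))       # 0.4
```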
References
Bühlmann, P., & Dezeure, R. (2016). Discussion on ‘regularized regression for categorical data (Tutz and Gertheiss)’. Statistical Modelling, 16(3), 205-211.
Goksuluk, D., Korkmaz, S., Zararsiz, G., & Karaagaoglu, A. E. (2016). easyROC: An interactive web-tool for ROC curve analysis using R language environment. The R Journal, 8(2), 213-230.
Guo, C., & Berkhahn, F. (2016). Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737.
Su, W., Yuan, Y., & Zhu, M. (2015, September). A relationship between the average precision and the area under the ROC curve. In Proceedings of the 2015 International Conference on The Theory of Information Retrieval (pp. 349-352). ACM.
van Smeden, M., de Groot, J. A., Moons, K. G., Collins, G. S., Altman, D. G., Eijkemans, M. J., & Reitsma, J. B. (2016). No rationale for 1 variable per 10 events criterion for binary logistic regression analysis. BMC Medical Research Methodology, 16(1), 163.