BDAT 1008 Spark Schemas Worksheet Essay Assignment
Assignment: Loading Data with Schema
INSTRUCTIONS
In the lecture on the Spark Structured API, we did not specify the schema of our dataset. Instead, we relied on the Spark engine's schema inference, which may not always be accurate. A schema can be created explicitly as an object of the StructType class, which consists of an array of StructField objects.
More details on Spark schemas can be found at this link: https://sparkbyexamples.com/spark/spark-schema-exp…
The code that loads the YouTube dataset used in the lectures with a schema has been provided as a guide. Once you are familiar with how to create schemas, load the stocks dataset into Spark.
You can get the stocks dataset by running the following wget command:
wget https://www.dropbox.com/s/ia779cdcjfctd84/stocks
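As a rough illustration of the StructType approach, the PySpark sketch below builds a schema from a list of column descriptions and uses it to read the downloaded file. It rests on several assumptions that the instructions do not confirm: the column names, their order and their types are hypothetical placeholders, the file is assumed to be comma-delimited with no header row, and it is assumed to be saved locally as "stocks". The names ASSUMED_COLUMNS, build_schema, and load_stocks are illustrative only.

```python
# ASSUMPTION: the assignment does not list the columns of the stocks file.
# The names, order, and types below are hypothetical placeholders; adjust them
# after inspecting the downloaded file (for example with `head stocks`).
ASSUMED_COLUMNS = [
    ("exchange", "string"),
    ("symbol", "string"),
    ("date", "string"),  # kept as a string, so no date parsing is needed
    ("open", "double"),
    ("high", "double"),
    ("low", "double"),
    ("close", "double"),
    ("volume", "long"),
    ("adj_close", "double"),
]


def build_schema(columns):
    """Build a StructType from (column name, type name) pairs."""
    # Imported inside the function so the column list can be checked
    # on a machine without Spark installed.
    from pyspark.sql.types import (
        DoubleType, LongType, StringType, StructField, StructType,
    )

    type_by_name = {
        "string": StringType(),
        "double": DoubleType(),
        "long": LongType(),
    }
    # The third StructField argument marks the column as nullable.
    return StructType(
        [StructField(name, type_by_name[kind], True) for name, kind in columns]
    )


def load_stocks(spark, path="stocks", sep=","):
    """Read the file at `path` as a DataFrame using the explicit schema."""
    # ASSUMPTION: a delimited text file without a header row.
    return (
        spark.read
        .schema(build_schema(ASSUMED_COLUMNS))
        .option("sep", sep)
        .option("header", "false")
        .csv(path)
    )
```

Given an active SparkSession named spark, calling load_stocks(spark).printSchema() should list the declared columns and types rather than inferred ones. If the file turns out to be tab-separated, passing sep="\t" would be the corresponding change.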
Note that Spark recognizes dates only when they are in a specific format, so for simplicity you can treat dates as strings.
Once you have loaded the stocks dataset with the correct schema in Spark, answer ONE of the following query questions:
- Find the top 5 stocks with the maximum average trading volume
- Find the top 5 stocks with the maximum closing price
- Find the top 5 stocks with the highest price change during any trading day
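The three questions share one shape: group by stock, aggregate one metric, sort in descending order, and keep five rows. The PySpark sketch below shows one possible way to express that shape. It assumes a DataFrame with columns named symbol, volume, close, high, and low, which the instructions do not confirm, and it reads "price change during any trading day" as the day's high minus its low, which is only one interpretation. The names QUERIES and top_stocks are hypothetical.

```python
# Aggregate expression for each of the three questions, written in Spark SQL
# syntax. ASSUMPTIONS: the DataFrame has columns named symbol, volume, close,
# high, and low (the source does not list them), and "price change during a
# trading day" is read as high minus low.
QUERIES = {
    "avg_volume": "avg(volume)",
    "max_close": "max(close)",
    "max_day_change": "max(high - low)",
}


def top_stocks(df, query, n=5):
    """Return the n symbols with the largest value of the chosen metric."""
    # Imported here so the query table can be inspected without Spark.
    from pyspark.sql import functions as F

    # Parse the SQL fragment into a Column and name it after the query key.
    metric = F.expr(QUERIES[query]).alias(query)
    return (
        df.groupBy("symbol")
        .agg(metric)
        .orderBy(F.desc(query))
        .limit(n)
    )
```

With a loaded DataFrame named stocks, top_stocks(stocks, "avg_volume").show() would display the result for the first question. Only one of the three keys is needed for the submission.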
DELIVERABLES
Submit your code (creating the schema, loading the data as a DataFrame, and the corresponding query) as a text file. Along with your code, submit screenshots of your code being executed in a separate file.
RUBRIC
Excellent Quality (95-100%)
Introduction (45-41 points): The background and significance of the problem and a clear statement of the research purpose are provided. The search history is mentioned.
Literature Support (91-84 points): The background and significance of the problem and a clear statement of the research purpose are provided. The search history is mentioned.
Methodology (58-53 points): Content is well organized, with headings for each slide and bulleted lists to group related material as needed. Use of font, color, graphics, effects, etc. to enhance readability and presentation content is excellent. The length requirement of 10 slides/pages or fewer is met.
Average Score (50-85%)
Introduction (40-38 points): More depth or detail for the background and significance is needed, or the research detail is not clear. No search history information is provided.
Literature Support (83-76 points): Review of relevant theoretical literature is evident, but there is little integration of studies into concepts related to the problem. Review is partially focused and organized. Supporting and opposing research are included. A summary of the information presented is included. Conclusion may not contain a biblical integration.
Methodology (52-49 points): Content is somewhat organized, but no structure is apparent. The use of font, color, graphics, effects, etc. occasionally detracts from the presentation content. Length requirements may not be met.
Poor Quality (0-45%)
Introduction (37-1 points): The background and/or significance are missing. No search history information is provided.
Literature Support (75-1 points): Review of relevant theoretical literature is evident, but there is no integration of studies into concepts related to the problem. Review is partially focused and organized. Supporting and opposing research are not included in the summary of information presented. Conclusion does not contain a biblical integration.
Methodology (48-1 points): There is no clear or logical organizational structure. No logical sequence is apparent. The use of font, color, graphics, effects, etc. often detracts from the presentation content. Length requirements may not be met.