Impact Of Merging the Domains into A Single Cluster
- Given the following set of retrieved documents with relevance judgments:
- Calculate a new query using a factor of 1/2 for positive feedback and 1/4 for negative feedback.
- Determine which documents would be retrieved by the original and by the new query
- Discuss the differences in documents retrieved by the original versus the new query.
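The feedback step in this exercise can be sketched as follows. The exercise's document set is not reproduced here, so the query and document vectors below are made up for illustration; only the 1/2 and 1/4 factors come from the question (a Rocchio-style modification: add half the weight of each relevant document, subtract a quarter of the weight of each non-relevant one).

```python
# Rocchio-style relevance feedback sketch with made-up vectors.
# New query = old query + 1/2 * (relevant docs) - 1/4 * (non-relevant docs).

POS, NEG = 0.5, 0.25  # feedback factors from the exercise

def feedback_query(query, relevant, non_relevant):
    """Return the modified query vector."""
    new_q = list(query)
    for doc in relevant:
        for i, w in enumerate(doc):
            new_q[i] += POS * w
    for doc in non_relevant:
        for i, w in enumerate(doc):
            new_q[i] -= NEG * w
    # Negative term weights are usually clamped to zero.
    return [max(0.0, w) for w in new_q]

q = [1.0, 0.0, 1.0]           # original query (hypothetical)
rel = [[2.0, 1.0, 0.0]]       # judged relevant (hypothetical)
nonrel = [[0.0, 4.0, 0.0]]    # judged non-relevant (hypothetical)
print(feedback_query(q, rel, nonrel))  # [2.0, 0.0, 1.0]
```

Rerunning retrieval with the new vector, and comparing which documents clear the similarity threshold before and after, answers the second and third bullets.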
- Given the following documents, determine which documents will be returned by the query ( and ).
- How would you define an item on the Internet with respect to a search statement and similarity function?
- Assume clustering has been completed on two different domains. Discuss the impact of merging the domains into a single cluster for both term clustering and item clustering. What factors will affect the amount of work required to merge the clusters together? (HINT: consider the steps in clustering.)
- Which of the guidelines and additional decisions can be incorporated into an automatic statistical thesaurus construction program? Describe how they would be implemented and the risks of their implementation. Justify your selection of the guidelines and decisions that cannot be automated.
- Prove that a term cannot appear in multiple clusters when using the single link technique.
- Describe what effect increasing and decreasing the threshold value has on the creation of classes and under what condition you would make the change.
- Given the following Term-Term matrix:
- Determine the Term Relationship matrix using a threshold of 10 or higher
- Determine the clusters using the clique technique
- Determine the clusters using the single link technique
- Determine the clusters using the star technique, where the term selected as the new seed for the next star is the lowest-numbered term not already part of a class.
- Discuss the differences between the single link, the clique and the star clusters. What are the characteristics of the items that would suggest which technique to use?
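The mechanics of the term-term exercises can be sketched in code. The exercise's actual matrix is not reproduced here, so the 5x5 matrix below is made up; the steps (threshold at 10 to get the term relationship matrix, then group terms into classes) follow the question.

```python
# Sketch: threshold a made-up 5x5 term-term matrix at 10, then form
# single-link classes (connected components of the relationship graph).

THRESHOLD = 10

term_term = [
    [ 0, 12,  4,  0,  3],
    [12,  0, 11,  2,  0],
    [ 4, 11,  0,  1,  0],
    [ 0,  2,  1,  0, 15],
    [ 3,  0,  0, 15,  0],
]

n = len(term_term)
# Binary term relationship matrix: 1 where similarity >= threshold.
related = [[1 if term_term[i][j] >= THRESHOLD else 0 for j in range(n)]
           for i in range(n)]

def single_link(rel):
    """Single link: any chain of relationships merges terms into one class."""
    seen, classes = set(), []
    for start in range(len(rel)):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            t = stack.pop()
            if t in comp:
                continue
            comp.add(t)
            stack.extend(u for u in range(len(rel)) if rel[t][u] and u not in comp)
        seen |= comp
        classes.append(sorted(comp))
    return classes

print(single_link(related))  # [[0, 1, 2], [3, 4]]
```

Note how the chain 0-1-2 collapses into one single-link class even though terms 0 and 2 are not directly related; the clique technique would instead produce the overlapping classes {0,1} and {1,2}, which is the contrast the proof and discussion questions above are probing.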
- a. Using the document/document relationship matrix for similarity values, and the following 4 clusters as a starting point, combine the clusters until you reach 2 clusters using the four HACM techniques described in the book (single link, complete link, average link (not Ward's method), and group average link):
CL1 = D1, D3, D4
CL2 = D2, D6, D8
CL3 = D5, D7, D12
CL4 = D9, D10, D11
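One agglomerative merge step under different linkage rules can be sketched as follows. The book's document/document matrix and the D1-D12 clusters above are not reproduced here, so the 6-document similarity matrix and clusters below are made up for illustration.

```python
# Sketch: pick the next pair of clusters to merge under three linkage rules,
# using a made-up symmetric 6-document similarity matrix.
import itertools

S = [
    [1.0, 0.8, 0.1, 0.2, 0.3, 0.0],
    [0.8, 1.0, 0.2, 0.1, 0.4, 0.1],
    [0.1, 0.2, 1.0, 0.7, 0.0, 0.3],
    [0.2, 0.1, 0.7, 1.0, 0.2, 0.6],
    [0.3, 0.4, 0.0, 0.2, 1.0, 0.5],
    [0.0, 0.1, 0.3, 0.6, 0.5, 1.0],
]

clusters = [[0, 1], [2, 3], [4, 5]]

def link_sim(a, b, rule):
    """Cluster-cluster similarity under a given linkage rule."""
    pairs = [S[i][j] for i in a for j in b]
    if rule == "single":    # most similar pair of members
        return max(pairs)
    if rule == "complete":  # least similar pair of members
        return min(pairs)
    return sum(pairs) / len(pairs)  # group average over all member pairs

def best_merge(clusters, rule):
    """Indices of the two clusters an HACM would merge next."""
    return max(itertools.combinations(range(len(clusters)), 2),
               key=lambda p: link_sim(clusters[p[0]], clusters[p[1]], rule))

for rule in ("single", "complete", "group average"):
    i, j = best_merge(clusters, rule)
    print(rule, "->", clusters[i], "+", clusters[j])
```

Even on this tiny example the rules disagree (complete link picks a different pair than single and group average), which is exactly the behavior the trade-off question below asks you to explain.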
- Trade off the use of the single link, complete link, Ward's method and group average link techniques in creating a hierarchical cluster set. Which one gives the best and which one gives the worst clusters, and why?
- Will the clustering process always come to the same final set of clusters no matter what the starting clusters? Explain your answer.
- Can statistical thesaurus generation be used to develop a hierarchical cluster representation of a set of items? Discuss the value of creating the hierarchy and how you would use it in a system.
- What is the effect of clustering techniques on reducing the user overhead of finding relevant items?
- What are the technical issues with providing clustering presentation with every search? Is there some preprocessing approach that could make such a presentation more realistic?
- In order to do timeline presentation what information is needed? How would that information be determined?
- Use the AutoSummarize capability in MS Word against a textual item you have. How well did it work? Search the Internet for other sites that allow you to submit text to be summarized and look at those results. What seems to work, and what are the inherent limits in text summarization?
- Expand the discussion on text summarization to when you are summarizing across multiple items. What functions and capabilities are essential in the display of the summarized information to assist the user in validating the results and feeling confident about its completeness?
- What are the basic limitations and difficulties in a user generating a search and getting results back from an image or video image search? What unique functions need to be provided to allow the user to validate the results of their search (map their search to each result returned) and to enhance the search to make it more precise?
- What are the problems associated with generalizing the results from controlled tests on information systems to their applicability to operational systems? Does this invalidate the utility of the controlled tests?
- What are the main issues associated with the definition of relevance? How would you overcome these issues in a controlled test environment?
- Consider the following table of items in ranked order from four algorithms, along with the actual relevance of each item. Assume each algorithm ranks from highest to lowest relevance, left to right (Document 1 to the last item); a value of zero implies the document was non-relevant.
- Calculate and graph precision/recall for all the algorithms on one graph.
- Calculate and graph fallout/recall for all the algorithms on one graph.
- Calculate the MAP value for each algorithm
- Calculate the Bpref at 20 items.
- Calculate the DCG at 10 items.
- What is the F-measure at item 20?
- What is the relationship between precision and TURR?
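The ranked-list metrics above can be sketched for one algorithm. The exercise's table is not reproduced here, so the relevance vector and collection size below are made up; the definitions (precision and recall at a cutoff, average precision whose per-query mean is MAP, DCG with a log2 discount, and the harmonic-mean F-measure) are standard.

```python
# Metric sketch on a made-up ranked relevance vector (1 = relevant, 0 = not).
import math

rels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]   # rank 1 is leftmost
TOTAL_RELEVANT = 5                       # relevant items in the collection

def precision_at(k):
    return sum(rels[:k]) / k

def recall_at(k):
    return sum(rels[:k]) / TOTAL_RELEVANT

def average_precision():
    # Mean of precision at each rank holding a relevant item;
    # averaging this over queries gives MAP.
    hits = [precision_at(k) for k in range(1, len(rels) + 1) if rels[k - 1]]
    return sum(hits) / TOTAL_RELEVANT

def dcg_at(k):
    # Discounted cumulative gain: gain at rank i is discounted by log2(i + 1).
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels[:k], start=1))

def f_measure_at(k):
    p, r = precision_at(k), recall_at(k)
    return 2 * p * r / (p + r) if p + r else 0.0

print(round(average_precision(), 3))  # 0.698
print(round(dcg_at(10), 3))           # 2.553
print(round(f_measure_at(10), 3))     # 0.667
```

Fallout and Bpref additionally need the number of non-relevant items (in the collection, and judged, respectively), so the graph questions require the full judged collection, not just the ranked list.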
- Discuss the sources of potential errors in the final set of search terms from when a user first identifies a need for information to the creation of the final query.
- Why does the numerator remain basically the same in all of the similarity measures? Discuss other possible approaches and their impact on the formulas.
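The point behind the numerator question can be illustrated directly: the common vector-based similarity measures (cosine, Dice, Jaccard) all share the same inner-product numerator and differ only in how they normalize it. The vectors below are made up for illustration.

```python
# Cosine, Dice, and Jaccard share the dot-product numerator;
# only the denominator (normalization) differs.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (dot(a, a) ** 0.5 * dot(b, b) ** 0.5)

def dice(a, b):
    return 2 * dot(a, b) / (dot(a, a) + dot(b, b))

def jaccard(a, b):
    return dot(a, b) / (dot(a, a) + dot(b, b) - dot(a, b))

q = [1, 1, 0]  # hypothetical query vector
d = [2, 1, 1]  # hypothetical document vector
print(round(cosine(q, d), 3), dice(q, d), jaccard(q, d))  # 0.866 0.75 0.6
```

The shared numerator measures term overlap weighted by term importance; changing it (e.g., to a count of co-occurring terms only) would discard the weighting, which is one direction the discussion question can take.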
RUBRIC

Excellent Quality (95-100%)
- Introduction (45-41 points): The background and significance of the problem and a clear statement of the research purpose are provided. The search history is mentioned.
- Literature Support (91-84 points): The background and significance of the problem and a clear statement of the research purpose are provided. The search history is mentioned.
- Methodology (58-53 points): Content is well organized, with headings for each slide and bulleted lists to group related material as needed. The use of font, color, graphics, effects, etc. to enhance readability and presentation content is excellent. The length requirement of 10 slides/pages or fewer is met.

Average Score (50-85%)
- Introduction (40-38 points): More depth/detail for the background and significance is needed, or the research detail is not clear. No search history information is provided.
- Literature Support (83-76 points): Review of relevant theoretical literature is evident, but there is little integration of studies into concepts related to the problem. The review is partially focused and organized. Supporting and opposing research are included. A summary of the information presented is included. The conclusion may not contain a biblical integration.
- Methodology (52-49 points): Content is somewhat organized, but no structure is apparent. The use of font, color, graphics, effects, etc. occasionally detracts from the presentation content. Length requirements may not be met.

Poor Quality (0-45%)
- Introduction (37-1 points): The background and/or significance are missing. No search history information is provided.
- Literature Support (75-1 points): Review of relevant theoretical literature is evident, but there is no integration of studies into concepts related to the problem. The review is partially focused and organized. Supporting and opposing research are not included in the summary of information presented. The conclusion does not contain a biblical integration.
- Methodology (48-1 points): There is no clear or logical organizational structure. No logical sequence is apparent. The use of font, color, graphics, effects, etc. often detracts from the presentation content. Length requirements may not be met.