computer science project and need a sample draft to help me learn.
Data Analysis (50 points)
Goals: This exercise is designed to walk you through the process of building a data analysis model in RapidMiner. In this exercise, we will be performing Textual Associations. This type of analysis looks for words or groups of words that appear together in the data to uncover hidden or important relationships.
About the data: I downloaded an employee reviews dataset from Kaggle a while ago (About the data) and no longer have the link to the source. I have broken the file down into smaller files. You must choose one of the data files below before moving further.
The learning outcome of this assignment is to pipe that data through an analysis diagram and uncover textual associations common to the reviews. Your final diagram should look something like this:
The basis for this analysis is the Market Basket data mining approach. The idea behind a Market Basket approach is to imagine a shopper with a shopping basket that they are putting products in. After they check out at the register, the store might want to know which groups of things are bought together so that it can learn new consumer behaviors and market highly associated products together. Text mining is able to leverage this approach to show important co-occurrences of terms and associate words into groups, or “rules”.
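To make the Market Basket idea concrete, here is a minimal Python sketch (outside RapidMiner) that counts how often pairs of items end up in the same basket. The baskets and the helper name count_pairs are invented for illustration and are not part of the assignment data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def count_pairs(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair a canonical order, so (a, b) and (b, a) match.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


pair_counts = count_pairs(baskets)
for pair, n in pair_counts.most_common(3):
    print(pair, n)
```

In text mining, each document plays the role of a basket and each remaining word plays the role of a product.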
RapidMiner does this by first processing text and then using an operator to find a common grouping of terms (co-occurrences) and an operator to create the association rules of the term groupings.
Task I – Install the Text Processing extension (0 points)
Open RapidMiner and start with a New Process/Blank process
Go to the top menu, click the Extensions dropdown and select Marketplace. Then click the tab for Top Downloads, locate Text Processing and install it. You will most likely have to close and reopen RapidMiner after the install.
Task II – Data file preparation (15 points)
Download any one data file.
Open RapidMiner and start with a New Process/Blank process
On the Repository panel select Add Data and choose the data file from the location on your computer. If your interface panels are messed up, you can reset them by clicking on View->Restore Default View.
Select the defaults in the import wizard and choose a location to store the imported data. Give it the name EmployeeReviews or Assign4Data and then click Finish.
In the Repository panel, right-click the dataset and select Edit.
In the Data Editor panel that appears, right-click the ‘cons’ column (or any other text column) and select Modify Attribute
Then change the Attribute Type to text and click Apply
In the Data Editor panel, make sure to save your changes and then close the Data Editor panel
Task III – Textual Processing (20 points)
Drag the dataset from the Repository panel to the Process canvas.
In the Operators panel, type in “Select” and drag the Select Attributes operator onto the Process canvas
In the Operators panel, type in “Process Documents” and drag the Process Documents from Data operator onto the Process canvas
Click on the Select Attributes operator. Then in the Parameters panel, select attribute filter type: single and choose “cons” for the attribute
Click on the Process Documents from Data operator. Then in the Parameters panel, select the checkbox for creating word vector, choose Binary Term Occurrences for vector creation, select the checkbox for keep text, choose absolute for prune method and then 2 for prune below and 9999 for prune above.
Next, double-click the Process Documents from Data operator. Inside this operator, you are going to add a number of textual processing operators. First, in the Operators panel, type in “Tokenize” and drag the Tokenize operator onto the Process canvas. Highlight this operator and in the Parameters panel, select non-letters for the mode
Next, in the Operators panel, type in “Transform” and drag the Transform Cases operator onto the Process canvas. Highlight this operator and in the Parameters panel, select lower case for transform to
In the Operators panel, type in “Filter Stop” and drag the Filter Stopwords (English) operator onto the Process canvas.
Finally, in the Operators panel, type in “Filter Tokens” and drag the Filter Tokens (by Length) operator onto the Process canvas. Highlight this operator and in the Parameters panel, select 4 for min chars and 99 for max chars
Go back to the main Process canvas by clicking the breadcrumb link under the title of the Process panel.
Next, in the Operators panel, type in “Numerical” and drag the Numerical to Binomial operator onto the Process canvas. Connect the exa port from the Process Documents from Data to the exa port on this operator
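The operator chain built in this task can be approximated in Python as follows. This is a sketch for intuition, not RapidMiner's implementation: the stopword list is a tiny stand-in for the built-in English list, the function names and sample reviews are hypothetical, and pruning is assumed to mean "keep terms that appear in at least prune below and at most prune above documents".

```python
import re

# Tiny stand-in stopword list (assumption); RapidMiner's English list is much longer.
STOPWORDS = {"the", "and", "with", "that", "this", "very", "have", "were"}


def preprocess(text, min_chars=4, max_chars=99):
    """Tokenize on non-letters, lowercase, drop stopwords, filter by length."""
    tokens = re.split(r"[^A-Za-z]+", text)               # Tokenize (non-letters)
    tokens = [t.lower() for t in tokens if t]            # Transform Cases (lower case)
    tokens = [t for t in tokens if t not in STOPWORDS]   # Filter Stopwords (English)
    return [t for t in tokens if min_chars <= len(t) <= max_chars]  # Filter Tokens (by Length)


def binary_term_vectors(docs, prune_below=2, prune_above=9999):
    """Return (vocabulary, rows); each row maps a term to True/False."""
    token_sets = [set(preprocess(d)) for d in docs]
    doc_freq = {}
    for terms in token_sets:
        for term in terms:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    # Absolute pruning on document frequency (assumed semantics, see lead-in).
    vocab = sorted(t for t, n in doc_freq.items() if prune_below <= n <= prune_above)
    # True/False per term mirrors Binary Term Occurrences + Numerical to Binomial.
    rows = [{term: term in terms for term in vocab} for terms in token_sets]
    return vocab, rows


# Made-up review snippets for illustration; not taken from the assignment data.
docs = [
    "Poor work-life balance and long hours.",
    "Work life balance is hard; management is slow.",
    "Slow management, long hours.",
]
vocab, rows = binary_term_vectors(docs)
print(vocab)
print(rows[0])
```

Each row is one document and each column one surviving term, which is the true/false table that the association steps consume.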
Task IV – Textual Association (15 points)
Next, in the Operators panel, type in “FP” and drag the FP-Growth operator onto the Process canvas. In the Parameters panel, select 0.05 for the min support attribute. Max items can be set to 0 for no limit.
In the Operators panel, type in “Create Association” and drag the Create Association Rules operator onto the Process canvas.
Connect the fre port from the FP-Growth operator to the ite port on Create Association Rules operator. Connect the rul port on the Create Association Rules operator to the res port on the Process canvas.
Click on the Create Association Rules operator. In the Parameters panel, select 0.2 for the min confidence attribute
Now you are finally ready to run the analysis. Click the play button on the top bar above the Process panel to run it. Your main view will flip from the Design view to the Results view. This toggle is available on the top bar when you want to switch back and forth. In the Results view, you can see the Association Rules created from the review text. You may need to drag the columns to make them narrower so that you can see the important statistical measurements: Support, Confidence and Lift.
Support: The proportion of documents that contain the item set. From the results, you can see that nearly 10% of all documents contain an association of the terms Information and Users.
Confidence: The proportion of documents containing the premise term that also include the conclusion term.
Lift: How many times larger the Confidence is than the expected (baseline) Confidence, where the baseline is the proportion of all documents that contain the conclusion. A lift value greater than 1 is desirable.
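As a worked example of these three measurements, the following Python sketch computes them for a single rule using the standard definitions. The documents and the function name rule_metrics are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def rule_metrics(documents, premise, conclusion):
    """Return (support, confidence, lift) for the rule premise -> conclusion."""
    premise, conclusion = set(premise), set(conclusion)
    n = len(documents)
    n_premise = sum(premise <= doc for doc in documents)
    n_conclusion = sum(conclusion <= doc for doc in documents)
    n_both = sum((premise | conclusion) <= doc for doc in documents)
    if n_premise == 0 or n_conclusion == 0:
        raise ValueError("premise and conclusion must each occur at least once")
    support = n_both / n                    # share of documents with the whole item set
    confidence = n_both / n_premise         # share of premise documents with the conclusion
    lift = confidence / (n_conclusion / n)  # confidence relative to the baseline
    return support, confidence, lift


# Made-up term sets, one per document (illustration only).
documents = [
    {"work", "life", "balance"},
    {"work", "life", "balance", "hours"},
    {"work", "hours"},
    {"management", "slow"},
    {"life", "balance", "work"},
]

support, confidence, lift = rule_metrics(documents, ["life", "balance"], ["work"])
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.60 confidence=1.00 lift=1.25
```

Here 3 of the 5 documents contain all three words (support 0.6), every document with life and balance also contains work (confidence 1.0), and work appears in 4 of 5 documents overall, so the lift is 1.0 / 0.8 = 1.25.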
On the left-hand side of the Results View there are different ways to view the results. For the association rules, click Description.
You will note that [life, balance] –> [work] (confidence: 0.992). This means that in the selected ‘cons’ column, the words life and balance are a premise for the word work with a confidence of 99.2%; that is, the words life and balance were strongly associated with the word work. From this, we can conclude that employees were not happy with their work-life balance while working at Microsoft. Make sure to also examine the graphing results. By increasing the minimum criterion slider, you can raise the confidence (or Support/Lift) threshold to examine the association rules visually.
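To see where rules in the Description view come from, here is a rough Python sketch of the two-step idea: find frequent term sets, then split them into premise and conclusion. It uses a brute-force search rather than the FP-Growth algorithm, the data and function names are invented, and the thresholds are higher than the 0.05 and 0.2 used above because the toy data has only four documents.

```python
from itertools import combinations


def frequent_itemsets(documents, min_support, max_size=3):
    """Brute-force search (not FP-Growth) for item sets with enough support."""
    n = len(documents)
    items = sorted(set().union(*documents))
    found = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            count = sum(set(candidate) <= doc for doc in documents)
            if count / n >= min_support:
                found[frozenset(candidate)] = count / n
    return found


def association_rules(itemsets, min_confidence):
    """Split each frequent item set into premise -> conclusion rules."""
    rules = []
    for itemset, support in itemsets.items():
        for k in range(1, len(itemset)):
            for premise in combinations(sorted(itemset), k):
                premise = frozenset(premise)
                # Every subset of a frequent item set is itself frequent,
                # so its support is already in the dictionary.
                confidence = support / itemsets[premise]
                if confidence >= min_confidence:
                    rules.append((sorted(premise), sorted(itemset - premise), confidence))
    return rules


# Made-up term sets, one per document (illustration only).
documents = [
    {"work", "life", "balance"},
    {"work", "life", "balance", "hours"},
    {"work", "hours"},
    {"management", "slow"},
]

itemsets = frequent_itemsets(documents, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.8)
for premise, conclusion, confidence in rules:
    print(f"[{', '.join(premise)}] --> [{', '.join(conclusion)}] (confidence: {confidence:.3f})")
```

Raising min_support or min_confidence shrinks the list of rules, which is the same effect the minimum criterion slider has in the graph view.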
Report (50 points)
You need to present all your findings in the report. Please look at the below instructions carefully to prepare your data analysis report.
Instructions for the report:
Title of the report: It should be a short version of what your analysis is about.
Use APA 7th Edition for citations and references and any points not addressed in the instructions.
For all the text, use 12-point Times New Roman
Use the following sub-headings in the report and use 14-point Times New Roman
About the dataset
Make sure to explain the following operators in the tools and techniques section of the report (definitions and how/why each was used in your data analysis task):
Process Documents from Data
Filter Stopwords
Create Association Rules
Results of the analysis
Make sure to explain the association rule graph (Description or the graph) in the analysis section of the report.
The text should be single-spaced within paragraphs and double-spaced between paragraphs
Number all the images and tables
Spell-check the document
At the top right corner of page 1, type three lines: your full name, course number, and semester and year, as in the example below:
Create a one-line header at the top right, beginning on page 2, with your last name and page number like this:
The final resulting document should be a single file in PDF or MS Word.
Attach your complete Excel Workbook.
Use the following format to save or name your files
[lastname][assignmentname] like this: RhodesAssignment1.docx/pdf/xlsx
SUBMIT THE REPORT AND THE RAPIDMINER FILE.
Requirements: 2 hour