# Linear Regression – Statistics

To get the maximum number of points, you must solve both problems, provide screen captures with detailed information, and include any other material that demonstrates how you solved each problem.

## Problem 1

### 1. Description of the dataset

SAT and GPA data for 1000 students at an unnamed college.

| Variable name | Variable meaning |
|---|---|
| `sex` | Gender of the student |
| `sat_v` | Verbal SAT percentile |
| `sat_m` | Math SAT percentile |
| `sat_sum` | Total of the verbal and math percentiles |
| `hs_gpa` | High school grade point average |
| `fy_gpa` | First-year (college) grade point average |

Source: The data were originally collected by the Educational Testing Service.

References: https://chance.dartmouth.edu/course/Syllabi/Princeton96/ETSValidation.html

### Requirements

1. Given the dataset above, which variables (other than `fy_gpa`) explain `fy_gpa`? In other words, can you fit a regression model that explains `fy_gpa`?
2. If you fitted a regression model, write its linear equation and interpret each coefficient.
3. Are the assumptions of linear regression violated or not? Show evidence to support your answer.
4. Given your answers to the previous three requirements, do you think the model can be used to predict the first-year GPA of new students? If yes, under what conditions? If not, why not?
5. How do you interpret the R-squared value? How do you interpret the adjusted R-squared value?
6. Is the model as a whole significant? If yes, why? What is the p-value? Explain what it means for the entire model to be significant.
7. Provide the output of your regression model and of all the other tests you performed on the linear regression assumptions.
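The quantities behind requirements 2, 3, 5, and 6 can be computed directly from a fitted model. The following is a minimal NumPy/SciPy sketch, not a prescribed solution: it assumes the predictors are already in a numeric array `X` (for example with `sex` encoded as a 0/1 indicator) and `fy_gpa` is in `y`, and the function names `fit_ols`, `r_squared`, `fit_summary`, and `assumption_checks` are hypothetical. With $n$ observations and $p$ predictors it uses $R^2 = 1 - SSE/SST$, adjusted $R^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$, and the overall statistic $F = \frac{R^2/p}{(1-R^2)/(n-p-1)}$, whose p-value comes from an $F(p,\ n-p-1)$ distribution. For constant variance it uses Koenker's studentized form of the Breusch-Pagan test, a choice made here for simplicity.

```python
import numpy as np
from scipy import stats


def fit_ols(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bp*xp.

    X is an (n, p) array of predictors WITHOUT an intercept column; one is
    prepended here. Returns the coefficients [b0, b1, ..., bp] and residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, y - design @ coef


def r_squared(resid, target):
    """1 - SSE/SST for a fit that includes an intercept."""
    return 1.0 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)


def fit_summary(X, y):
    """Coefficients, R^2, adjusted R^2 and the overall F-test."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    coef, resid = fit_ols(X, y)
    r2 = r_squared(resid, y)  # share of the variance of y explained by the model
    # Adjusted R^2 charges for each predictor: it can fall when a predictor adds
    # too little explanatory power to justify the lost degree of freedom.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    # Overall F-test, H0: every slope b1..bp is zero.
    f_stat = (r2 / p) / ((1.0 - r2) / (n - p - 1))
    f_pvalue = stats.f.sf(f_stat, p, n - p - 1)
    return {"coef": coef, "r2": r2, "adj_r2": adj_r2,
            "f_stat": f_stat, "f_pvalue": f_pvalue}


def assumption_checks(X, y):
    """Common numeric diagnostics for the linear regression assumptions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    _, resid = fit_ols(X, y)

    # Normality of errors: Shapiro-Wilk on the residuals
    # (a small p-value is evidence against normality).
    _, shapiro_p = stats.shapiro(resid)

    # Constant variance: Koenker's studentized Breusch-Pagan test.
    # LM = n * R^2 from regressing squared residuals on X; chi-square(p) under H0.
    u2 = resid ** 2
    _, aux_resid = fit_ols(X, u2)
    bp_p = stats.chi2.sf(n * r_squared(aux_resid, u2), p)

    # Independence of errors: Durbin-Watson statistic in [0, 4]; values near 2
    # suggest no first-order autocorrelation. Only meaningful if row order matters.
    dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

    # Multicollinearity: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
    # regressing predictor j on the other predictors.
    vif = []
    for j in range(p):
        _, r = fit_ols(np.delete(X, j, axis=1), X[:, j])
        vif.append(1.0 / (1.0 - r_squared(r, X[:, j])))

    return {"shapiro_p": shapiro_p, "breusch_pagan_p": bp_p,
            "durbin_watson": dw, "vif": vif}
```

If `sat_sum` is exactly `sat_v + sat_m`, as its description suggests, avoid putting all three in `X` at once: the design matrix becomes rank-deficient, `np.linalg.lstsq` silently returns one of infinitely many solutions, and the VIFs explode. Linearity is usually judged from a residuals-versus-fitted plot, which this sketch does not draw, and statistical packages such as R's `lm` or Python's `statsmodels` also report coefficient standard errors and t-tests, which are omitted here.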