Bivariate relationship linearity, strength and direction (video) | Khan Academy
In a linear equation, “x” and “y” are two variables that are related by a straight line. The numerical measure that assesses the strength of a linear relationship is called the correlation coefficient. A scatter plot can show two variables that have a strongly positive linear relationship between them.
So, this data right over here, it looks like I could put a line through it that gets pretty close to the data. It's very unlikely you're gonna be able to go through all of the data points, but you can try to get a line, and I'm just doing this.
There are more numerical, more precise ways of doing this, but I'm just eyeballing it right over here. And it looks like I could plot a line that looks something like that, that goes roughly through the data. So this looks pretty linear. And so I would call this a linear relationship. And since, as we increase one variable, it looks like the other variable decreases, this is a downward-sloping line. I would say this is a negative linear relationship.
But this one looks pretty strong, because the dots aren't that far from my line. This one gets a little bit further, but there's not some dots way out there. And so, most of 'em are pretty close to the line.
So I would call this a negative, reasonably strong linear relationship. Negative, strong, I'll call it reasonably, I'll just say strong, but reasonably strong, linear, linear relationship between these two variables. Now, let's look at this one. And pause this video and think about what this one would be for you. I'll get my ruler tool out again. And it looks like I can try to put a line, it looks like, generally speaking, as one variable increases, the other variable increases as well, so something like this goes through the data and approximates the direction.
And this looks positive. As one variable increases, the other variable increases, roughly. So this is a positive relationship. But this is weak.
A lot of the data is off, well off of the line. But I'd say this is still linear. It seems that, as we increase one, the other one increases at roughly the same rate, although these data points are all over the place.
So, I would still call this linear. Now, there's also this notion of outliers. If I said, hey, this line is trying to describe the data, well, we have some data that is fairly off the line. So, for example, even though we're saying it's a positive, weak, linear relationship, this one over here is reasonably high on the vertical variable, but it's low on the horizontal variable.
And so, this one right over here is an outlier. It's quite far away from the line. You could view that as an outlier. And this is a little bit subjective.
Outliers, well, what looks pretty far from the rest of the data? This could also be an outlier. Let me label these. Now, pause the video and see if you can think about this one. Is this positive or negative, is it linear, non-linear, is it strong or weak?
I'll get my ruler tool out here.
So, this goes here. It seems like I can fit a line pretty well to this.
So, I could fit, maybe I'll do the line in purple. I could fit a line that looks like that. And so, this one looks like it's positive. As one variable increases, the other one does, for these data points.
So it's a positive. I'd say this was pretty strong. The dots are pretty close to the line there. It really does look like a little bit of a fat line, if you just look at the dots.
So, positive, strong, linear, linear relationship. And none of these data points are really strong outliers. This one's a little bit further out. But they're all pretty close to the line, and seem to describe that trend roughly.
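The eyeballed calls above (positive or negative, strong or weak) have a numerical counterpart in the correlation coefficient. What follows is a minimal Python sketch, not something shown in the video: the function names `pearson_r` and `describe`, the 0.7 cutoff for "strong", and the sample data are all assumptions made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def describe(r, strong_cutoff=0.7):
    """Turn r into a verbal label; strong_cutoff is an illustrative assumption."""
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    return f"{direction}, {strength} linear relationship"


# Made-up data in which y falls as x rises
xs = [1, 2, 3, 4, 5]
ys_down = [10, 8, 6.5, 4, 2]
r = pearson_r(xs, ys_down)
print(round(r, 3), describe(r))
```

The sign of `r` gives the direction and its absolute value gives the strength, but where "weak" ends and "strong" begins stays a judgment call, just as it is when eyeballing the scatter plot.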
One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model. Before attempting to fit a linear model to observed data, a modeler should first determine whether or not there is a relationship between the variables of interest. This does not necessarily imply that one variable causes the other (for example, higher SAT scores do not cause higher college grades), but that there is some significant association between the two variables.
A scatterplot can be a helpful tool in determining the strength of the relationship between two variables. If there appears to be no association between the proposed explanatory and dependent variables i.
A valuable numerical measure of association between two variables is the correlation coefficient, which is a value between -1 and 1 indicating the strength of the association of the observed data for the two variables.
Least-Squares Regression
The most common method for fitting a regression line is the method of least-squares. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line (if a point lies on the fitted line exactly, then its vertical deviation is 0).
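As a rough illustration of the least-squares criterion, the sketch below computes the slope and intercept from the standard closed-form formulas and then evaluates the sum of squared vertical deviations. The function names and the small data set are hypothetical, chosen only for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_deviations(xs, ys, slope, intercept):
    """Sum of squared vertical deviations from each point to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)
best = sum_squared_deviations(xs, ys, slope, intercept)
# Any other line, for example one with a slightly steeper slope, does no better
other = sum_squared_deviations(xs, ys, slope + 0.1, intercept)
print(slope, intercept, best, other)
```

Comparing `best` with `other` shows the defining property: nudging the fitted line in any direction cannot reduce the sum of squared deviations.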
Because the deviations are first squared, then summed, there are no cancellations between positive and negative values.
Example
The dataset "Televisions, Physicians, and Life Expectancy" contains, among other variables, the number of people per television set and the number of people per physician for 40 countries. Since both variables probably reflect the level of wealth in each country, it is reasonable to assume that there is some positive association between them.
After removing 8 countries with missing values from the dataset, the remaining 32 countries have a correlation coefficient of 0. Suppose we choose to consider number of people per television set as the explanatory variable, and number of people per physician as the dependent variable. The regression equation is People. To view the fit of the model to the observed data, one may plot the computed regression line over the actual data points to evaluate the results.
For this example, the plot appears to the right, with number of individuals per television set (the explanatory variable) on the x-axis and number of individuals per physician (the dependent variable) on the y-axis. While most of the data points are clustered towards the lower left corner of the plot (indicating relatively few individuals per television set and per physician), there are a few points which lie far away from the main cluster of the data. These points are known as outliers, and depending on their location may have a major impact on the regression line (see below).