Make a Line of Best Fit
One of the most basic but potent combos in data visualization is the scatter plot and trend line.
Whether you’re trying to measure the acceleration due to gravity or to see which professions most often lead people to earn more than their parents did, Plotly is the tool for you!
This 4-step tutorial will show you how to make the graph below from a simple data table or spreadsheet.
Step 1: Enter your data
Upload a spreadsheet to the ‘Grid’ either by copy-pasting the cells you want from your spreadsheet, or by uploading that sheet using the ‘Add Data’ button. Plotly supports: CSV, Excel, Google Drive and Dropbox. For comparison, or if you would like to skip this step, you can access my data already loaded to the grid, here.
In this data from a simulated free-fall experiment, we controlled distance and measured time, but we’re actually interested in acceleration: the change in velocity over time. So we’re plotting time as ‘x’ and velocity as ‘y’. The slope of this trend line will give us the acceleration due to gravity.
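To see why the slope is the number we’re after, here is the relationship the plot relies on. This is a sketch that assumes the object starts from rest and falls with constant acceleration g, ignoring air resistance; the symbols v, t and g are my own notation, not labels from the data set.

```latex
v(t) = g\,t
\qquad\Longrightarrow\qquad
\frac{\Delta v}{\Delta t} = g
```

Under that assumption, a straight line fitted to velocity against time should have a slope close to g and a y-intercept close to 0.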
For a scatter plot we need an independent and dependent variable, i.e. a column of x-coordinates and a column of associated y-coordinates.
To select a column as your independent variable, click “choose as x” beneath the column header. To select your dependent variable, click “choose as y” beneath the column header.
The buttons marking your selected variables should both now be blue. If you accidentally selected multiple columns for either of these, they may be orange or green. This is OK! Keep clicking the ‘choose’ buttons until the ones you do not want to plot are again white, and the ones you do are blue (see right).
Step 2: Plot
When you have two columns selected to plot in the ‘Grid’ view, select ‘Scatter plot’ from the blue dropdown at the top of the dialogue pane to the left. Now click the big blue button at the bottom. A new tab will open with our plot in it. Take a look!
The plot should look something like this:
Step 3: Generate a fit to your data
Okay — here’s the moment you’ve been waiting for. In the ‘Plot’ view, find the “Fit Data” button in the toolbar:
This will open up a dialogue pane. Click Add fit to this trace:
Plotly is a versatile tool, so there are a number of advanced options. But our task is simple! To generate the straight line that best fits our data, we’ll:
- stay in the Basic tab,
- select the Choose a predefined fit function radio button,
- select the Linear function family from the drop down,
- and then click the Run this fit button at the bottom.
Voila — we have a line of best fit! The a value is the y-intercept of our line, and the b value is its slope.
R² and Standard error are measures of how closely the line fits the data, and both follow from how the line was calculated. If you’d like to learn more about these metrics, Wikipedia is a good resource.
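If you’d like to check numbers like these by hand, here is a minimal sketch of an ordinary least-squares fit for a line y = a + bx, along with R². This is the textbook calculation, which I’m assuming is what a “Linear” fit boils down to; it is not Plotly’s own code. The function name linear_fit and the sample readings are made up for illustration and are not the tutorial’s data set.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x. Returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # y-intercept
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Made-up illustrative readings (time in s, velocity in m/s)
times = [0.5, 1.0, 1.5, 2.0]
velocities = [5.1, 9.6, 14.9, 19.5]

a, b, r_squared = linear_fit(times, velocities)
print(f"y = {a:.3f} + {b:.3f}x, R^2 = {r_squared:.4f}")
```

An R² close to 1 means the line accounts for nearly all of the variation in the y-values.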
Check the “Add results as plot annotation” box to annotate the graph with the line’s equation: y = 0.222 + 9.635x.
Your plot should now look something like this:
Step 4: Style
You’re done! OR, you can experiment with the range of Plotly’s styling and format options: Traces, Layout, Axes, Notes and Legend.
I’ve decided to make a few tweaks. I want the plot to include the zero values so that the axes are visible, so I change the range of the axes to “With Zero” in the Axes tool panel.
I’d also like to change the colors, and the shape of the marker for each point. I make these changes in the “Style” tab of the Traces tool panel.
I’ve also clicked on the fit line annotation on my plot, and dragged it to a nicer position (so that the equation doesn’t overlap my data points). Here’s what the plot looks like now:
Bonus: Fixing your coefficients
If you’re just trying to fit a line to your data set, you’re done! But we actually know something about our free-fall data that isn’t captured by the data set, and has therefore not been reflected in our trend line: an object that has been falling for 0 seconds has a velocity of 0, so our line should pass through the origin, and our a value should be 0. Fixing a at 0 is what gives us the plot at the top of this tutorial!
If you know one or more of your coefficients, you can fix them when generating your fit line to find the line that fits the data best within those constraints. To get started, click the Fit Data button and View/edit the fit you just generated. Then select Edit fit.
This gets us back to the dialogue we saw in Step 3. Time to use some of those extra options! Under the “Enter fit parameter guesses” section, overwrite the value in the box labeled ‘a’ with your y-intercept (in our case 0), then click the label to fix it at that value. It’ll turn blue.
We’ll leave b as it is. Plotly overwrites unfixed coefficients when generating the new line.
Click the Run this fit button to generate the new fit (you may need to check and uncheck the annotation box to refresh the annotation).
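For the curious, fixing the intercept at 0 leaves only the slope to solve for, and least squares has a simple closed form for it: b = Σxy / Σx². The sketch below shows that calculation. It is the standard formula, not a description of what Plotly runs internally; the name fit_through_origin and the sample readings are made up for illustration.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope b for y = b*x, with the intercept fixed at 0."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / sum_xx


# Made-up illustrative readings (time in s, velocity in m/s)
times = [0.5, 1.0, 1.5, 2.0]
velocities = [5.1, 9.6, 14.9, 19.5]

slope = fit_through_origin(times, velocities)
print(f"y = {slope:.3f}x")
```

Because the line is forced through the origin, its slope will usually differ a little from the slope of the unconstrained fit.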
Our new line has the equation y = 9.802x, so its slope is only 0.005 off the accepted value for the acceleration due to gravity (9.807 m/s²)! Not bad!
To show the y-intercept on the graph, we’ll go into the Advanced tab of the fit panel. Check Plot curve over a specified x-range and set the minimum to “0”. I’m setting the maximum to the greatest time-value in our data set (2.28 s), but if you want to extrapolate your line farther, go ahead and enter a higher number.
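As a rough illustration of what drawing the line over a chosen x-range amounts to, the sketch below evaluates a fitted line at evenly spaced x-values between a minimum and a maximum. The slope 9.802 and the range 0 to 2.28 come from the fit above; the function name line_points and its steps parameter are hypothetical, and I’m not claiming this is how Plotly samples the curve.

```python
def line_points(intercept, slope, x_min, x_max, steps=4):
    """Return (x, y) pairs on y = intercept + slope*x, evenly spaced over [x_min, x_max]."""
    width = (x_max - x_min) / steps
    xs = [x_min + i * width for i in range(steps + 1)]
    return [(x, intercept + slope * x) for x in xs]


# Fitted line through the origin, drawn from t = 0 to the largest time value
points = line_points(0.0, 9.802, 0.0, 2.28)
for x, y in points:
    print(f"t = {x:.2f} s -> v = {y:.2f}")
```

Raising x_max past 2.28 extrapolates the line beyond the measured data.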