Fitting data with a polynomial
Plotly offers free, online tools for analyzing data and making graphs. In this tutorial we’ll show you how to fit polynomial curves to your data and explain what that means. Make sure to check out our other tutorials to learn how to fit your data with Gaussians, exponentials and logarithms.
Developing an intuition for the best function to fit to your data takes some practice, and Plotly is a great tool to test your guesses. Keep in mind that with $n$ data points that have distinct $x$-values, there is a unique polynomial of degree at most $n - 1$ that passes through every point exactly, but this polynomial usually tells us little about the underlying trend. When looking for trends, we need to be careful not to overfit the data.
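To see why an exact fit is not the goal, here is a minimal sketch (Python, standard library only) that evaluates the interpolating polynomial through five points using Lagrange's formula. The points are made up for illustration and are not from the turkey study, and `lagrange_eval` is a hypothetical helper, not part of Plotly.

```python
from fractions import Fraction

def lagrange_eval(xs, ys, x):
    """Evaluate, at x, the unique polynomial of degree <= len(xs) - 1 that
    passes through every point (xs[i], ys[i]).

    Inputs should be ints or Fractions (with distinct xs) so the arithmetic is exact.
    """
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Lagrange basis polynomial L_i(x): equals 1 at xi and 0 at every other xj
        basis = Fraction(1)
        for j, xj in enumerate(xs):
            if j != i:
                basis *= Fraction(x - xj, xi - xj)
        total += yi * basis
    return total

# Five made-up points (hypothetical, not the turkey data)
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 2, 5, 4]

print(all(lagrange_eval(xs, ys, x) == y for x, y in zip(xs, ys)))  # True
print(lagrange_eval(xs, ys, 5))  # -24
```

The degree-4 polynomial reproduces all five points with zero error, yet one step past the data, at $x = 5$, it predicts $-24$, far outside the range 1 to 5 that the data occupy. That kind of swing is what overfitting looks like.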
A polynomial fit is a method of modeling data with a polynomial function. Sometimes choosing the best function to fit your data requires trial and error. Scientists often investigate several mathematical models to fit their data. In this tutorial, we're going to use data from "Mathematical modeling of the native Mexican turkey's growth" by Pérez-Lara et al. In their work, they determine that a fourth-degree polynomial model is best for estimating the growth of the native Mexican turkey.
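To fit the data with a polynomial curve $p(x)$, we choose the coefficients that minimize the mean squared error $\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - p(x_i)\bigr)^2$, that is, the average of the squared vertical distances between the $y$-coordinate of each data point and the value of the fitting polynomial at the same $x$. Plotly does this minimization for us by running a gradient descent algorithm. The closer the $R^2$ value is to 1, the better the polynomial accounts for the variation in the data. Note, however, that raising the degree never lowers $R^2$ on the same data, so a high $R^2$ alone does not rule out overfitting.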
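Because a polynomial is linear in its coefficients, the same mean-squared-error objective can also be minimized in closed form by solving the least-squares normal equations. The sketch below takes that route instead of gradient descent, so treat it as an illustration of the objective rather than a description of what Plotly runs internally. It uses only the Python standard library; `polyfit` and `polyval` are hypothetical helper names (they echo, but are not, NumPy's functions), and coefficients are listed lowest power first.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit.

    Returns coefficients [c0, c1, ..., c_degree] (lowest power first) that
    minimize the mean squared error between ys and c0 + c1*x + ... .
    """
    m = degree + 1
    # Normal equations (A^T A) c = A^T y, where A[i][k] = xs[i] ** k,
    # stored as an augmented matrix [A^T A | A^T y].
    aug = [[sum(x ** (j + k) for x in xs) for k in range(m)]
           + [sum(y * x ** j for x, y in zip(xs, ys))]
           for j in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # Back substitution.
    coeffs = [0.0] * m
    for r in reversed(range(m)):
        rhs = aug[r][m] - sum(aug[r][k] * coeffs[k] for k in range(r + 1, m))
        coeffs[r] = rhs / aug[r][r]
    return coeffs

def polyval(coeffs, x):
    """Evaluate the polynomial at x with Horner's method (lowest power first)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result
```

A fourth-degree fit like the one in this tutorial would be `polyfit(xs, ys, 4)`, with your two data columns as `xs` and `ys`. The normal equations become poorly conditioned as the degree or the magnitude of $x$ grows; centering or scaling $x$ first, or using a QR-based solver, is the usual remedy.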
Step 1: Make a plot
We have lots of great tutorials to help you make scatter plots, line graphs, histograms, bar charts, and more. If you need help, head to our tutorials page.
You can import files from Google Drive, Dropbox, or Excel to create a data set. You'll find more details in our "How to Enter Data in the grid" tutorial.
For this tutorial, we'll use a data set that you can find at:
To use the data, look for the Fork and edit button, just above the data set, on the right side of your screen. Click it and a copy of the data will open in your workspace.
To make a scatter plot, choose Scatter plots from the MAKE A PLOT menu.
Plotly will automatically select the first column of data to be $x$ values, and the second column to be $y$ values. In our case, this is exactly what we want.
Click on the blue Scatter plot box in the sidebar to make your scatter plot. Your plot will open in a new tab.
Step 2: Polynomial regression
To find the curve of best fit, we'll use the FIT DATA button, which you can find in the toolbar just above your plot. In their paper, Pérez-Lara et al. analyzed several models and determined that a fourth-degree polynomial was the best choice, so we'll use a fourth-degree polynomial here.
When you click FIT DATA, you’ll see the Fitting to trace popover open. Select Add fit to this trace.
Select Polynomial from the drop-down menu.
You'll be given the option to choose the degree of the polynomial. Following the work of Pérez-Lara et al., we select 4th order.
Our Advanced tab gives you even more flexibility: you can incorporate error data into your fit, restrict the fit to a subset of your data, and change the number of points in the output fit.
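The Advanced options map onto standard variations of least squares. The sketch below shows one common way to implement them, not necessarily the way Plotly does: each point is weighted by $1/\sigma_i^2$, where $\sigma_i$ is its error bar, points outside an $x$ range are dropped before fitting, and the fitted curve is resampled at a chosen number of evenly spaced points. `weighted_polyfit`, `sample_fit`, and their parameters are hypothetical names used only for illustration.

```python
def weighted_polyfit(xs, ys, sigmas, degree, x_min=None, x_max=None):
    """Weighted least-squares polynomial fit (coefficients lowest power first).

    Each point gets weight 1 / sigma**2, so points with larger error bars pull
    on the curve less. Points with x outside [x_min, x_max] are skipped.
    """
    pts = [(x, y, 1.0 / s ** 2) for x, y, s in zip(xs, ys, sigmas)
           if (x_min is None or x >= x_min) and (x_max is None or x <= x_max)]
    m = degree + 1
    # Weighted normal equations (A^T W A) c = A^T W y as an augmented matrix.
    aug = [[sum(w * x ** (j + k) for x, _, w in pts) for k in range(m)]
           + [sum(w * y * x ** j for x, y, w in pts)]
           for j in range(m)]
    for col in range(m):  # forward elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    coeffs = [0.0] * m
    for r in reversed(range(m)):  # back substitution
        rhs = aug[r][m] - sum(aug[r][k] * coeffs[k] for k in range(r + 1, m))
        coeffs[r] = rhs / aug[r][r]
    return coeffs

def sample_fit(coeffs, x_start, x_end, n_points):
    """Evaluate the fit at n_points (>= 2) evenly spaced x values."""
    step = (x_end - x_start) / (n_points - 1)
    grid = [x_start + i * step for i in range(n_points)]
    return grid, [sum(c * x ** k for k, c in enumerate(coeffs)) for x in grid]
```

Passing `x_max` mirrors restricting the fit to a subset of the data, while giving a suspect point a very large sigma effectively removes its influence. With every sigma equal, the weights cancel and the result is the ordinary unweighted fit.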
Now click on the blue Run this fit button.
If you select Add results as plot annotation, your graph will display the equation of the best-fit curve and its $R^2$ value.
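The annotation pairs the fitted equation with $R^2 = 1 - SS_{\text{res}}/SS_{\text{tot}}$, where $SS_{\text{res}}$ is the sum of squared residuals and $SS_{\text{tot}}$ is the sum of squared deviations of $y$ from its mean. The sketch below computes both for a given set of coefficients (lowest power first). The `y = ...` string layout and the helper names `format_equation` and `r_squared` are assumptions for illustration and may differ from how Plotly renders its annotation.

```python
def format_equation(coeffs, digits=4):
    """Render coefficients (lowest power first) as 'y = ...', highest power first."""
    terms = []
    for power in range(len(coeffs) - 1, -1, -1):
        c = coeffs[power]
        if c == 0:
            continue  # omit zero terms
        mag = f"{abs(c):.{digits}g}"
        body = mag if power == 0 else f"{mag}x" if power == 1 else f"{mag}x^{power}"
        terms.append(("-" if c < 0 else "+", body))
    if not terms:
        return "y = 0"
    sign, body = terms[0]
    text = ("-" if sign == "-" else "") + body
    for sign, body in terms[1:]:
        text += f" {sign} {body}"
    return "y = " + text

def r_squared(xs, ys, coeffs):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    def p(x):
        return sum(c * x ** k for k, c in enumerate(coeffs))
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

print(format_equation([1.5, -2, 0.25]))       # y = 0.25x^2 - 2x + 1.5
print(r_squared([0, 1, 2], [1, 3, 5], [1, 2]))  # 1.0 (perfect fit)
print(r_squared([0, 1, 2], [1, 3, 5], [3]))     # 0.0 (just the mean)
```

The last two lines show the ends of the scale: a curve that passes through every point scores 1, while a flat line at the mean of $y$ explains none of the variation and scores 0.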
To close the Fitting to trace popover, click the X in the upper-right corner. We can drag the annotation and even style our graph with Plotly's online tools. You might want to check out the TRACES button.
You can find the graph used in this tutorial, and the underlying data at: