A scatter plot, or XY plot, uses points to represent the values of two numeric variables. The position of each dot along the horizontal and vertical axes shows the two values for a single data point. Scatter plots are most often used to analyze relationships between variables.
The placement and grouping of the dots reveal the type of correlation between the two variables and highlight outliers in the data. Outliers may prompt further investigation, depending on the variables and the analysis. Many software packages offer an integrated XY Plot Template along with more advanced functionality.
When to Use Scatter Plots
The primary use of XY plots is to analyze and visualize the relationship between two numeric variables. The dots in the resulting plot show patterns in the data as a whole as well as individual data points. Identifying correlations between values is a common use of scatter plots, since the XY Plot Template produces a clean, easily readable result. Relationships between the numeric variables can be described as positive or negative, linear or nonlinear, and strong or weak. Some common relationships are as follows:
- Strong, positive, linear – the points form a tight, upward-sloping band.
- Moderate, negative, linear – the points form a looser, downward-sloping band.
- Null – the points show no positive or negative trend.
- Strong, nonlinear – the points closely follow a curve, such as a bell curve.
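The direction and strength of a linear relationship can also be summarized numerically. The following is a minimal sketch, assuming the Pearson correlation coefficient is the measure of interest. The function names and the 0.7 and 0.3 cutoffs are illustrative assumptions, not standard definitions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


def describe(r, strong=0.7, weak=0.3):
    """Label a coefficient; the cutoffs are illustrative assumptions."""
    if abs(r) < weak:
        return "null"
    strength = "strong" if abs(r) >= strong else "moderate"
    direction = "positive" if r > 0 else "negative"
    return f"{strength}, {direction}"


xs = [1, 2, 3, 4, 5]
upward = [2, 4, 6, 8, 10]   # points on a straight, upward-sloping line
bell = [1, 4, 9, 4, 1]      # symmetric rise and fall around the middle

print(describe(pearson_r(xs, upward)))  # strong, positive
print(describe(pearson_r(xs, bell)))    # null
```

The second result shows why the plot itself still matters: the coefficient only measures linear association, so a strong bell-shaped pattern can score close to zero even though the relationship is obvious on the chart.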
Beyond correlation, a scatter plot can reveal other patterns in the data. Data points can be divided into groups based on how closely they cluster together. XY plots also show unexpected gaps in the data as well as outlier points. This is especially useful when segmenting data into different parts.
Creating Scatter Plots
Creating a scatter plot is simple: it requires two columns in a data table, one for the X dimension and one for the Y dimension. Each row in the table becomes a single dot on the plot, positioned according to its values in those two columns.
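As a rough illustration of the row-to-dot mapping, the sketch below reads a small table and returns one (x, y) pair per row. The sample data and the column names height_cm and weight_kg are hypothetical, made up for this example.

```python
import csv
import io

# Hypothetical sample table; the column names are assumptions for illustration.
SAMPLE_CSV = """height_cm,weight_kg
160,55
172,68
181,80
"""


def rows_to_points(csv_text, x_column, y_column):
    """Return one (x, y) tuple per table row, using the two chosen columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(row[x_column]), float(row[y_column])) for row in reader]


points = rows_to_points(SAMPLE_CSV, "height_cm", "weight_kg")
print(points)  # [(160.0, 55.0), (172.0, 68.0), (181.0, 80.0)]
```

Each tuple corresponds to one dot. A plotting tool would place it at the first value on the horizontal axis and the second on the vertical axis.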
Scatter Plot Options
Visualizing data with a scatter plot does not have to stop at dots on a graph. There are other options you can add to help analyze the data, including:
Categorical Third Variable
In a basic scatter plot, you can add a third variable that modifies how points are plotted. When the third variable is categorical, the usual encoding is point color. Giving each group a distinct hue shows which group every point belongs to.
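One way to sketch this encoding is to assign each distinct category a color in the order it first appears. The function name, the sample categories, and the palette are assumptions for illustration. The color strings happen to be names that matplotlib accepts, but any plotting tool's color names could be substituted.

```python
def colors_by_category(categories,
                       palette=("tab:blue", "tab:orange", "tab:green", "tab:red")):
    """Assign one palette color per distinct category, in first-seen order.

    Returns (per-point colors, category-to-color mapping for a legend).
    """
    assigned = {}
    for category in categories:
        if category not in assigned:
            if len(assigned) >= len(palette):
                raise ValueError("more categories than palette colors")
            assigned[category] = palette[len(assigned)]
    return [assigned[c] for c in categories], assigned


# Hypothetical category column, one entry per data point.
species = ["oak", "pine", "oak", "birch"]
point_colors, legend = colors_by_category(species)
print(point_colors)  # ['tab:blue', 'tab:orange', 'tab:blue', 'tab:green']
```

Returning the mapping alongside the per-point colors makes it straightforward to build a legend, so readers can tell which hue stands for which group.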
Highlight with Color and Annotations
If you want to use a scatter plot to present insights, highlighting specific points with color and annotations makes the graph easier to read. Desaturating the unimportant points lets the remaining points stand out, while the muted points still provide a reference for comparison.
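A minimal sketch of this idea follows, assuming each point carries a text label. The function names, the sample labels, and the two color choices are hypothetical.

```python
def highlight_colors(labels, highlighted, accent="crimson", muted="lightgray"):
    """Return an accent color for highlighted labels and a muted one for the rest."""
    highlighted = set(highlighted)
    return [accent if label in highlighted else muted for label in labels]


def annotations(points, labels, highlighted):
    """Return (label, x, y) for each highlighted point, for drawing text next to it."""
    highlighted = set(highlighted)
    return [(label, x, y)
            for (x, y), label in zip(points, labels)
            if label in highlighted]


# Hypothetical data: three labeled points, one of which should stand out.
points = [(1.0, 2.0), (2.0, 3.5), (3.0, 9.0)]
labels = ["A", "B", "C"]

print(highlight_colors(labels, ["C"]))     # ['lightgray', 'lightgray', 'crimson']
print(annotations(points, labels, ["C"]))  # [('C', 3.0, 9.0)]
```

The muted points stay on the chart, so the highlighted point can still be read against the rest of the data.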
Numeric Third Variable
A numeric third variable is commonly encoded by changing the point size. An XY plot with point sizes based on an additional variable is known as a bubble chart. Larger points correspond to higher values, which are often the more important ones. Alternatively, color can portray numeric values. Instead of a distinct color for each group, use a sequential color scale, such as darker shades for higher values.
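Both encodings come down to rescaling the third variable. The sketch below assumes a simple linear mapping. The function names, the sample values, and the area range of 20 to 400 are illustrative choices rather than recommended defaults.

```python
def scale_sizes(values, min_area=20.0, max_area=400.0):
    """Map values linearly onto marker areas so higher values get larger points."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: give every point the same mid-range size.
        return [(min_area + max_area) / 2] * len(values)
    return [min_area + (v - lo) / (hi - lo) * (max_area - min_area) for v in values]


def scale_intensity(values):
    """Map values linearly onto [0, 1]; 1.0 is meant for the darkest shade."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical third variable, one value per data point.
population = [10, 55, 100]
print(scale_sizes(population))      # [20.0, 210.0, 400.0]
print(scale_intensity(population))  # [0.0, 0.5, 1.0]
```

The sizes here are treated as marker areas rather than radii. In matplotlib, for example, the `s` argument of `scatter` is an area, and scaling by area keeps large values from looking disproportionately big. The intensities could be fed to any sequential color scale that runs from light to dark.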
Trend Line
Adding a trend line is one of the most common additions, as it shows how the two variables are related across the data. It gives an additional signal of how strong the relationship between the numeric values is and whether outlier points are affecting the trend line and its computation.
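For a linear trend line, one common choice is an ordinary least-squares fit. The sketch below assumes that method. The function name and the sample data are made up to show how a single outlier can pull the line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
clean = [2, 4, 6, 8, 10]         # every point lies on y = 2x
with_outlier = [2, 4, 6, 8, 30]  # same data, last point replaced by an outlier

print(fit_line(xs, clean))         # (2.0, 0.0)
print(fit_line(xs, with_outlier))  # (6.0, -8.0)
```

Changing one point triples the slope in this example, which is why it can be worth comparing the trend line with and without suspected outliers.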
The XY plot is a basic chart type that can be created with most visualization tools. Computing a basic linear trend line is a common option, along with coloring points by the levels of a third variable. The scatter plot is a valuable chart type for investigating relationships between numeric variables in your data.