The Tracking the Spread dashboard is a joint project of Spotlight PA and The Philadelphia Inquirer to monitor and visualize the outbreak of COVID-19 in Pennsylvania and the surrounding region. If you’re a researcher, journalist, or just a curious reader, you might have questions about the data we’re using or how we’re presenting it. This article is for you.
Below, you’ll find information about our sourcing and methodology. If you have a question that you don’t feel has been answered or you have other feedback on how we could improve the dashboard, don’t hesitate to email either Dan Simmons-Ritchie at email@example.com or Garland Potts at firstname.lastname@example.org.
Why provide a dashboard?
Spotlight PA and The Inquirer are committed to providing Pennsylvania readers the most accurate and up-to-date information on the COVID-19 outbreak. We believe our dashboard provides a simple interface for tracking the spread in Pennsylvania and surrounding states.
The state Department of Health offers its own interactive dashboard on its website that displays Pennsylvania data. However, the department’s dashboard remains cumbersome and difficult to use, particularly on mobile devices. It doesn’t provide data on states outside Pennsylvania and doesn’t prioritize certain data points that we believe are most useful for readers. So, we wanted to create a way to make it easier to understand the spread of COVID-19.
While a number of national media outlets, including the New York Times, provide impressive visualizations of the spread in Pennsylvania and nationally, we believe our dashboard better prioritizes data that is most relevant to Pennsylvanians and the decisions they face on a day-to-day basis. Our dashboard also includes certain data points that are not available on these other websites. In addition, because we primarily compile Pennsylvania data from the state Department of Health ourselves, our data is generally updated sooner than other media sources.
How do you compile your Pennsylvania data?
The dashboard provides a way to view both Pennsylvania-specific COVID-19 data and data for neighboring states.
For Pennsylvania, Spotlight PA and The Inquirer compile data each day from the state Department of Health’s interactive COVID-19 dashboard. This collection involves a combination of automated and manual processes. The data we collect from the department is not always perfect: The department does not always update its data at regular intervals and, on some occasions, it will provide inaccurate data that it later corrects.
Sometimes the data on the department’s dashboard doesn’t match the data it has released in other places, like in its press releases or on other pages of its website. We do our best to check the department’s data before publishing it to the dashboard. On some occasions, we will retroactively correct or fill in missing data based on archived information from the Department of Health website.
For hospitalization numbers for Pennsylvania, the dashboard relies on data compiled by the COVID Tracking Project. The COVID Tracking Project is a volunteer organization launched by The Atlantic that collects COVID-19 data nationally. While we could compile this data ourselves, we decided that it was easier to rely on the COVID Tracking Project because it has been collecting this data since April.
Where do you get your Philadelphia data from?
The Inquirer and Spotlight PA have slightly different versions of the same core dashboard. Readers who view The Inquirer version of the dashboard can switch to Philadelphia-specific data that is unavailable on Spotlight’s version.
The data for Philadelphia is compiled in a similar way to the Pennsylvania data, through a combination of manual and automated collection each day. Similar to the state Department of Health data, Philadelphia’s data is not always perfect. The City of Philadelphia, for instance, does not update its data on weekends.
Where do you get data for the other states from?
We rely on data compiled by the New York Times and the COVID Tracking Project for the other states/regions that are selectable on our dashboard. The New York Times and the COVID Tracking Project are among the few organizations that are compiling state-by-state data on COVID-19 in the U.S. Like the data that Spotlight PA and The Inquirer are compiling for Pennsylvania, this data is not always perfect. Both organizations occasionally have to make retroactive corrections or adjustments to their data to ensure its accuracy and integrity.
Why don’t you just use federal data?
As of July 17, 2020, the federal government has not provided reliable or accurate daily data on COVID-19 cases and deaths in the U.S. Since the beginning of the outbreak in the U.S., news outlets and public health researchers have largely filled the gap: recording and compiling data themselves on a state-by-state basis. As described above, even data provided by state and county officials, as in Pennsylvania, is not always perfect.
Some websites are reporting different numbers than yours. Why is that?
Because there is no central repository for national COVID-19 data for the U.S., there may be differences in the numbers presented on our dashboard compared to data and visualizations produced by other state or national news websites.
There are a number of reasons why this might occur. This may be because different outlets are collecting their data from different sources or on different schedules. It may be because of errors that public health officials have made in the data that were later corrected but were nevertheless recorded. It may be because certain outlets are focusing on slightly different metrics, for instance, in what is considered a “positive” coronavirus case.
And, as described above for Pennsylvania, there are occasions where the numbers in the Department of Health’s own interactive dashboard may not be the same as those reported in the department’s press materials or on other pages of its website.
We do our best to ensure we are providing readers with the best and most accurate COVID-19 data available.
What do you consider a COVID-19 ‘case’?
For all states in the dashboard, “cases” represents the sum of lab-confirmed positive COVID-19 cases and “probable” COVID-19 cases. A probable case is one in which health officials or medical professionals have deemed that a person likely had COVID-19 despite not having a laboratory test and positive result. This determination is made based on criteria defined by the Centers for Disease Control and Prevention. For instance, a person may be considered a “probable” case if they had recent exposure to someone with the coronavirus, like a family member, and then shortly later they developed COVID-19 symptoms. As of July 1 in Pennsylvania, the number of probable cases was much lower than the number of confirmed cases. At that time, the state had about 85,000 confirmed cases and about 2,500 probable cases.
Are positive “antibody” test results included in your case tally?
There are two dominant types of coronavirus tests: polymerase chain reaction (PCR) tests and serological tests. PCR tests are intended to detect whether a person is currently infected with the coronavirus. Serological tests, sometimes called “antibody” tests, are intended to detect whether a person was previously infected with the coronavirus and has since developed antibodies.
For this reason, results from PCR tests are typically used to determine the current number of people in a community who are infected with COVID-19. In Pennsylvania, as of July 1, the Department of Health said that its tally of 85,000 people who have tested positive for the coronavirus is based solely on PCR tests. The number does not include serological tests.
Of the department’s “probable” cases, the department says it does count some people who have had positive serological results. But these are people, the department says, who also have either COVID-19 symptoms or have recently been exposed to someone infected with COVID-19. For that reason, the department considers these people as “probable” COVID-19 cases. As of July 1, among the department’s tally of 2,500 probable cases, 633 people fell into this category.
How reliable are the official tallies of COVID-19 cases?
Spotlight PA and The Inquirer are gathering the best publicly available data that exists on COVID-19. At the same time, it’s well understood by public health researchers that the number of positive cases being reported by state, city, and county health departments is likely to be a significant undercount of the total number of COVID-19 infections.
There are a number of reasons for this undercount. COVID-19 is unusual in both its level of contagiousness and the fact that many people who are infected will not show symptoms. This means that the virus is able to spread rapidly through a community without detection. People may not seek a coronavirus test because they don’t know they are infected. In other cases, a person may be infected and have symptoms but choose not to get tested or, due to a lack of testing availability in their area, may not be able to get tested.
More generally, as the state Department of Health notes on its own website, there are a number of factors that can affect the number of cases reported each day. Beyond the prevalence of the virus itself, the department notes those factors include: “testing patterns (who gets tested and why), testing availability, lab analysis backlogs, lab reporting delays, new labs joining our electronic laboratory reporting system, mass screenings, etc.”
Why do you use “7-day” moving averages in some of your charts?
As described above, there are a number of factors that can influence how many new cases are reported each day by health officials. If you look closely at the charts for new daily cases and deaths, you may notice a particular pattern: Numbers tend to be higher during the middle of the week and lower during the weekends. Statistical analysts call this “seasonality.” It describes data that conforms to a pattern over regular intervals.
The reason for the seasonality in COVID-19 data is widely attributed to irregular data reporting: Officials often don’t report data over the weekend and then catch up during the working week. To provide a clearer understanding of the overall trend, we overlay some charts with a line representing the 7-day moving average of the data. A number of other news outlets, including the New York Times, have taken a similar approach to visualizing the data.
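As a rough illustration of the smoothing described above, here is a minimal Python sketch of a 7-day moving average. It assumes a trailing window (each day is averaged with the six days before it), which the dashboard may or may not use, and the daily counts are hypothetical numbers invented to show a weekend dip.

```python
def moving_average(values, window=7):
    """Return the trailing moving average of a list of daily values.

    The first `window - 1` entries are None because a full window
    of days is not yet available for them.
    """
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            window_values = values[i + 1 - window : i + 1]
            averages.append(sum(window_values) / window)
    return averages


# Hypothetical new daily cases for two weeks (Monday to Sunday, twice).
# The last two days of each week are low, mimicking weekend reporting gaps.
daily_cases = [700, 760, 800, 780, 720, 400, 320,
               720, 790, 830, 800, 750, 420, 330]

smoothed = moving_average(daily_cases)
print(smoothed[6])   # average of the first full week
print(smoothed[13])  # average of the second full week
```

Because every 7-day window contains exactly one of each weekday, the weekend dips no longer show up as a drop in the smoothed line.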
What is the ‘positivity rate’ and why is it important?
A state’s ‘positivity rate’ represents the percentage of recent COVID-19 tests that have returned positive results. In essence, as Johns Hopkins University puts it, the positivity rate tells us: “how much of the disease are we finding through tests?”
The positivity rate is important for two reasons. The first is that it can tell us if a state is doing enough testing. The World Health Organization recommended in May that countries have a positivity rate less than five percent. If a state has a positivity rate above five percent that may indicate it’s only testing the sickest people and not doing enough to test people with milder cases.
Second, the positivity rate can help us understand if an increase in positive COVID-19 cases is due to expanded testing or rising infections. Specifically, the university notes:
- If a rise in cases is due to increased testing, the positivity rate could look flat or like it is falling over the same time period.
- If a rise in cases is due to increased viral spread, the positivity rate could appear to be increasing over that same time period.
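The two scenarios above can be illustrated with a few lines of Python. All of the numbers are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def positivity(positive_tests, total_tests):
    """Share of tests that came back positive, as a percentage."""
    return 100 * positive_tests / total_tests


# Hypothetical starting point: 100 positives out of 2,000 tests.
baseline = positivity(100, 2_000)

# Cases double because testing doubled: the rate stays flat.
more_testing = positivity(200, 4_000)

# Cases double while testing stays the same: the rate rises.
more_spread = positivity(200, 2_000)

print(baseline, more_testing, more_spread)
```

In both scenarios the number of positive cases doubles, but only the second one moves the positivity rate.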
How do you calculate the positivity rate?
According to its website, Johns Hopkins says that it believes the best way to calculate the positivity rate would be as follows:
Positivity rate = 7 day moving average of people who have tested positive / 7 day moving average of all people who have been tested
However, because many states don’t directly publish data on the number of people who have tested positive, that isn’t possible in many cases. The university uses a slightly different formula in its own visualizations in order to make consistent comparisons between states. At time of writing, Sept. 2, its formula was:
Positivity rate = 7 day moving average of positive cases / (7 day moving average of positive cases + 7 day moving average of negative tests)
The key difference is that Johns Hopkins uses ‘positive cases’ in both its numerator and denominator rather than ‘people who have tested positive’. Although those data points sound similar, they’re slightly different: most states tally the number of ‘positive cases’ by adding together the people who have had a positive PCR test with the number of people who are considered ‘probable’ cases because of other factors. Scroll up for more information about ‘probable cases’.
In our dashboard, The Inquirer and Spotlight PA have used a very similar formula to Johns Hopkins. The only difference is that we have tried to exclude ‘probable cases’ where possible. For Pennsylvania, the positivity rate is calculated as follows:
Positivity rate = 7 day moving average of people who have a positive PCR test / (7 day moving average of people who have a positive PCR test + 7 day moving average of people who have a negative PCR test)
Due to varying data available, our calculation is slightly different for the other states in our dashboard. For these states, our denominator is essentially the same as Johns Hopkins’, meaning it may include probable cases:
Positivity rate = 7 day moving average of people who have a positive PCR test / (7 day moving average of positive cases + 7 day moving average of people who have a negative PCR test)
The result is that if you compare our positivity rates to the rates calculated by Johns Hopkins, you may see slight differences. We are constantly reviewing our calculation and may update our formula based on expert advice and as new data becomes available.
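The two dashboard formulas above can be sketched in Python as follows. This is an illustration rather than the dashboard’s actual code: the function names are our own for this example, the input numbers are hypothetical, and we assume each 7-day average covers the seven most recent days.

```python
def trailing_mean(values, window=7):
    """Average of the most recent `window` daily values."""
    if len(values) < window:
        raise ValueError("need at least one full window of data")
    return sum(values[-window:]) / window


def positivity_rate_pa(positive_pcr, negative_pcr):
    """Pennsylvania formula: PCR positives over PCR positives plus PCR negatives."""
    pos = trailing_mean(positive_pcr)
    neg = trailing_mean(negative_pcr)
    if pos + neg == 0:
        return None  # no tests reported, so the rate is undefined
    return pos / (pos + neg)


def positivity_rate_other(positive_pcr, positive_cases, negative_pcr):
    """Other states: the denominator uses 'positive cases', which may include probables."""
    pos_pcr = trailing_mean(positive_pcr)
    pos_cases = trailing_mean(positive_cases)
    neg = trailing_mean(negative_pcr)
    if pos_cases + neg == 0:
        return None
    return pos_pcr / (pos_cases + neg)


# Hypothetical week: 50 PCR positives, 60 total cases (10 of them probable)
# and 950 PCR negatives every day.
positive_pcr = [50] * 7
positive_cases = [60] * 7
negative_pcr = [950] * 7

pa_style = positivity_rate_pa(positive_pcr, negative_pcr)
other_style = positivity_rate_other(positive_pcr, positive_cases, negative_pcr)
print(round(pa_style * 100, 2), round(other_style * 100, 2))
```

With these made-up inputs, including probable cases in the denominator produces a slightly lower rate, which is the kind of small difference described above.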
Why is your positivity rate for Pennsylvania different from the Pa. Department of Health?
The Pa. Department of Health provides its own weekly calculation of Pennsylvania’s positivity rate on its website. While there are slight differences between our positivity rate and Johns Hopkins’ positivity rate, savvy readers might notice a much larger difference between our rate for Pennsylvania and the rate calculated by the department.
According to the department’s documentation, its calculation includes people who have been retested. As of Sept. 2, it noted: “Since many people who are routinely retested as part of universal testing programs repeatedly test negative, the percent positivity is lower than what would be calculated if one used the number of new cases and the number of people tested for the first time in the most recent 7-day period.”
Please read this detailed explainer from the Inquirer to learn more about the department’s methodology.
How do you determine the “14-day trend” for counties?
For each state/region, the dashboard provides a table of county-level data including a column labelled “14-day trend.” This provides a description of the trend of new daily cases over the past 14 days as either “rising,” “falling,” or “unclear.”
To make this assessment, Spotlight PA and The Inquirer first calculate the 7-day moving average of new cases for each county for each day over the past 14 days. As described above, using 7-day moving averages is typically viewed as a more reliable way of understanding the trend given the seasonality in the data.
We then analyze this data using a statistical model called “linear regression.” Although the term may sound intimidating, the concept is relatively easy to understand when visualized: Imagine plotting a series of points across an XY-axis and then drawing a straight line to best “fit” all those points. That line is called a “linear regression line.” We calculate regression lines for each county based on their 7-day moving averages of new daily cases over the past 14 days.
We then convert the slope of those regression lines into a special number for each county that, in essence, represents the average percentage change in new daily cases for that county, per day, over the past 14 days.
Here, we made editorial judgements about what constituted a “rising” or “falling” description. If a county’s percentage change is greater than 2.5%, our dashboard evaluates that county as having a “rising” trend. If the county’s percentage change is lower than -2.5%, the dashboard evaluates that county as having a “falling” trend. We chose these ranges so that our evaluations of “rising” and “falling” were more likely to err on the conservative side.
We evaluate counties that have an average percentage change between -2.5 and 2.5 percent as having an “unclear” trend.
Separately, our model will evaluate a county’s trend as “unclear” if there appears to be no meaningful trend. In statistics, this determination is typically made by interpreting a special number calculated from the regression analysis called a “P value.” If a county’s regression line has a P value above 0.05, the dashboard interprets its trend as “unclear.” In statistics, it’s common practice to interpret a linear regression with a P value above 0.05 as having no meaningful trend.
We consulted Krys Johnson, an epidemiologist at Temple University, and launched the “trend description” feature of the dashboard in June.
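For readers who want to see the steps above in code, here is a minimal Python sketch. It is not the dashboard’s actual implementation. In particular, the article does not spell out how the slope is converted into a percentage, so this sketch assumes the slope is divided by the mean of the 14 moving averages. The function names and sample data are hypothetical, and the P value is computed from a standard closed-form series for Student’s t distribution.

```python
import math


def two_sided_t_pvalue(t_stat, df):
    """Two-sided P value for Student's t, using a finite trigonometric series."""
    theta = math.atan(abs(t_stat) / math.sqrt(df))
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):  # adds the cos^2 ... cos^(df-2) terms
            term *= (2 * k - 1) / (2 * k) * cos_t ** 2
            total += term
        inside = sin_t * total
    else:
        total = 0.0
        if df > 1:
            term = total = cos_t
            for k in range(1, (df - 1) // 2):  # adds the cos^3 ... cos^(df-2) terms
                term *= (2 * k) / (2 * k + 1) * cos_t ** 2
                total += term
        inside = (2 / math.pi) * (theta + sin_t * total)
    return 1.0 - inside  # probability of a |t| at least this large


def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; returns (slope, P value)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(ys))
    if sse == 0:  # perfect fit: a flat line has no trend, a sloped one is certain
        return slope, (1.0 if slope == 0 else 0.0)
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, two_sided_t_pvalue(slope / std_err, n - 2)


def classify_trend(daily_new_cases, threshold_pct=2.5, alpha=0.05):
    """Label the last 14 days as 'rising', 'falling' or 'unclear'."""
    n = len(daily_new_cases)
    if n < 20:
        raise ValueError("need 20 days to build 14 seven-day averages")
    averages = [sum(daily_new_cases[i - 6 : i + 1]) / 7 for i in range(n - 14, n)]
    slope, p_value = fit_line(averages)
    mean_level = sum(averages) / len(averages)
    if mean_level == 0 or p_value > alpha:
        return "unclear"
    # ASSUMPTION: percent change per day = slope relative to the average level.
    pct_per_day = 100 * slope / mean_level
    if pct_per_day > threshold_pct:
        return "rising"
    if pct_per_day < -threshold_pct:
        return "falling"
    return "unclear"


# Hypothetical counties: steadily rising, steadily falling, and flat.
rising = classify_trend([100 + 10 * day for day in range(20)])
falling = classify_trend([300 - 10 * day for day in range(20)])
flat = classify_trend([200] * 20)
print(rising, falling, flat)
```

Note that 20 days of daily counts are needed: each of the 14 moving averages looks back seven days, so the earliest one reaches six days before the start of the 14-day window.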
How reliable is your “14-day trend” analysis?
While the math behind our trend analysis is complicated, it’s worth noting that, in statistical terms, it’s a relatively simplistic model. Linear regression is a common method of analysis used in statistics but, in order to chart the spread of disease, epidemiologists create far more sophisticated models that rely on multiple variables.
When reading the dashboard’s trend descriptions, care should be taken in particular when interpreting the results for sparsely populated counties. Many of these counties have relatively few new daily cases each day. In these situations, the dashboard may readily evaluate a county as having a “rising” or “falling” trend based on small movements in the data.
For these reasons, in order to understand the overall trend of their county, we urge readers to look closely at the number of new daily cases over a longer period of time, and to also consider hospitalizations and other indicators. We also strongly advise readers to follow the advice and guidance of public health officials in their communities. Our trend descriptions are intended as a helpful way of understanding the data at a glance in your county, but they are not intended to supersede or replace the judgements of local officials or public health experts.
Why don’t you include other types of data?
New types of COVID-19 data are constantly being published by state officials and public health researchers. Readers sometimes contact us about other types of COVID-19 data they’d like to see on the dashboard. We appreciate and welcome all suggestions from readers. Because of the work involved in adding and maintaining new data sources, however, we think carefully before adding new data to the dashboard. Our preference is to include data that we know comes from a reliable source, will be reliably updated each day, and is provided in a machine-readable format with a structure that is unlikely to change over weeks or months. For these reasons, we may be unable to immediately incorporate new types of data that are made available. However, we are constantly evaluating whether new sources should be included in the dashboard and we continue to appreciate reader suggestions.
As of July 13, in order to be as transparent as possible about changes we make to the dashboard, all changes will be documented here.
The date of each change marks the date that the change was made to the Spotlight PA version of the dashboard. The Philadelphia Inquirer version of the dashboard may not necessarily be updated on the same day.
Sept 24, 2020: A new feature was added to the box of total cases, deaths, and tests at the top of the dashboard. If one of these types of data is more recent than another data type, a footnote will appear at the bottom of the box noting when each data type was last updated. For instance, if deaths data has been updated before cases and tests data, a footnote will appear with the following text: “NOTE: Deaths as of Sept 24; cases, tests as of Sept 23”. In most cases, all three types of data are updated at the same time, but this change is designed to make it clearer to readers if one data type has been more recently updated than another.
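The footnote logic described above could be sketched in Python roughly as follows. This is an illustration under our own assumptions, not the dashboard’s code: the function name, the input format, and the month abbreviations other than “July,” “Aug” and “Sept” are hypothetical.

```python
from datetime import date

# Month labels; only "July", "Aug" and "Sept" appear in this article,
# the rest are assumed for the sake of the example.
MONTHS = ["Jan", "Feb", "March", "April", "May", "June",
          "July", "Aug", "Sept", "Oct", "Nov", "Dec"]


def format_day(day):
    """Format a date like 'Sept 24'."""
    return f"{MONTHS[day.month - 1]} {day.day}"


def last_updated_footnote(last_updated):
    """Build the footnote, or return None if every data type shares one date.

    `last_updated` maps a data type name (e.g. "cases") to the date
    it was last updated.
    """
    if len(set(last_updated.values())) <= 1:
        return None
    # Group the data type names by the date they were last updated.
    groups = {}
    for name, day in last_updated.items():
        groups.setdefault(day, []).append(name)
    # List the most recently updated group first.
    parts = [
        f"{', '.join(groups[day])} as of {format_day(day)}"
        for day in sorted(groups, reverse=True)
    ]
    text = "; ".join(parts)
    return "NOTE: " + text[0].upper() + text[1:]


footnote = last_updated_footnote({
    "cases": date(2020, 9, 23),
    "deaths": date(2020, 9, 24),
    "tests": date(2020, 9, 23),
})
print(footnote)

no_footnote = last_updated_footnote({
    "cases": date(2020, 9, 24),
    "deaths": date(2020, 9, 24),
    "tests": date(2020, 9, 24),
})
```

When all three data types share the same date, the function returns nothing and no footnote would be shown.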
Aug 11, 2020: A bug was fixed that caused areas of the map and county-by-county table that display totals for the past 14 days to show totals for the past 13 days instead.
July 20, 2020: Two new charts were added to the ‘testing’ section of the Spotlight version of the dashboard. The dashboard now provides a chart of the daily ‘positivity rate’ for a selected state in addition to a chart that shows the average number of tests it has conducted over the past seven days, adjusted for population, compared to neighboring states.
July 17, 2020: The map at the top of the dashboard is now limited to only displaying case and death numbers over the past two weeks. Prior to this, the map displayed a running total of cases and deaths for the selected state/region since the beginning of the outbreak. We made this decision because we believed it would make it easier for readers to understand current ‘hot spots’ of cases and deaths.
These changes do not apply to the ‘Philadelphia metro’ map, which is displayed on The Inquirer version of the dashboard when a user has Pennsylvania selected.
July 15, 2020: In the Testing section for all states, positive test numbers now represent only lab-confirmed positive test results. Prior to this, “positive” test numbers were derived from each state’s total tally of cases, which could include both lab-confirmed positive test results and “probable” cases. This change, coupled with changes made on July 13, ensures that the dashboard is presenting the most accurate information on each state’s number of lab-confirmed cases and its percentage of positive tests.
For all states except Pennsylvania on the dashboard, these data changes are retroactive. Because we use data compiled and archived by the COVID Tracking Project, the “positive” tally for each day in the “Total Tests” chart for these states represents only lab-confirmed positive tests.
For Pennsylvania’s testing data, however, these data changes are retroactive only to July 13. That means that in the “positive” test tallies for each day prior to July 13 in Pennsylvania’s “Total Tests” chart, some probable cases are included. Please note, as described in the change log’s July 13 note, “total test” tallies prior to July 13 also include probable cases.
On and after July 13, however, the “positive” test numbers for Pennsylvania represent only lab-confirmed positive results. And, as described in the July 13 change log note, “total tests” tallies for Pennsylvania on and after that date count lab-confirmed positive results, not probable cases, alongside negative results.
July 13, 2020: Total test numbers for Pennsylvania are now calculated based on the sum of lab-confirmed positive test results and negative test results.
Prior to this, the number was derived from the sum of “cases” and negative test results. The problem with this approach is that the department’s “cases” tally includes both lab-confirmed positive results and “probable” cases. Although the number of probable cases included in the department’s “cases” tally is relatively small, this change ensures that the dashboard accurately reflects the total number of people tested in Pennsylvania.
These data changes for Pennsylvania are not retroactive. That means that in the total test tallies for each day prior to July 13 in the “Total Tests” and “Tests per Day” charts some probable cases are included. On and after July 13, the total test tallies exclude “probable” cases.