The Tracking the Spread dashboard is a joint project of Spotlight PA and The Philadelphia Inquirer to monitor and visualize the outbreak of COVID-19 in Pennsylvania and the surrounding region. If you’re a researcher, journalist, or just a curious reader, you might have questions about the data we’re using or how we’re presenting it. This article is for you.
Below, you’ll find information about our sourcing and methodology. Please refer to the bottom of this article for information about how the dashboard has changed over time. If you have a question that you don’t feel has been answered or you have other feedback on how we could improve the dashboard, don’t hesitate to email Carl Johnson at firstname.lastname@example.org.
Why provide a dashboard?
Spotlight PA and The Inquirer are committed to providing Pennsylvania readers the most accurate and up-to-date information on the COVID-19 outbreak. We believe our dashboard provides a simple interface for tracking the spread in Pennsylvania and surrounding states.
The state Department of Health offers its own interactive dashboard on its website that displays Pennsylvania data. However, for much of the pandemic, the department’s dashboard has been cumbersome and difficult to use, particularly on mobile devices. As of March 2, 2021, it also didn’t provide data on Pennsylvania’s neighboring states.
While a number of national media outlets, including the New York Times, provide impressive visualizations of the spread in Pennsylvania and nationally, we believe our dashboard better prioritizes data that is most relevant to Pennsylvanians and the decisions they face on a day-to-day basis. Our dashboard also includes certain data points that are not available on these other websites. In addition, because we primarily compile Pennsylvania data from the state Department of Health ourselves, our data is generally updated sooner than other media sources.
How do you compile your Pennsylvania data?
The dashboard provides a way to view both Pennsylvania-specific COVID-19 data and data for neighboring states.
For Pennsylvania, Spotlight PA and The Inquirer compile data each day from the state Department of Health’s interactive COVID-19 dashboard. This collection involves a combination of automated and manual processes. The data we collect from the department is not always perfect: The department does not always update its data at regular intervals and, on some occasions, it will provide inaccurate data that it later corrects. It’s also worth remembering that these numbers generally only indicate when an event is reported to state officials rather than when it necessarily occurred. For instance, a person may die of COVID-19 days or weeks before that death is reported to state officials and therefore included in its overall tally.
In addition, since the department first began publishing COVID-19 data on its interactive dashboard, its data has not always matched the data it has released in other places, like in its press releases or on other pages of its website. We do our best to check the department’s data before publishing it to the dashboard. On some occasions, we will retroactively correct or fill in missing data based on archived information from the Department of Health website. The Covid Tracking Project has maintained detailed notes on data issues it has encountered with Pennsylvania’s daily reported data in addition to other states.
For hospitalization numbers for Pennsylvania, the dashboard relies on data compiled by the U.S. Department of Health and Human Services. Specifically, the data is sourced from a mirror of the department’s COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries dataset that is maintained and updated regularly by Carnegie Mellon University’s Delphi Research Group.
Where do you get your Philadelphia data from?
The Inquirer and Spotlight PA have slightly different versions of the same core dashboard. Readers who view The Inquirer version of the dashboard can switch to Philadelphia-specific data that is unavailable on Spotlight’s version.
The data for Philadelphia is compiled in a similar way to the Pennsylvania data, through a combination of manual and automated collection each day. Similar to the state Department of Health data, Philadelphia’s data is not always perfect. The City of Philadelphia, for instance, does not update its data on weekends.
Where do you get data for the other states from?
For case and death data for all states in the dashboard except Pennsylvania, we rely on county-by-county data compiled by the New York Times.
For testing data, we rely on numbers compiled by the U.S. Department of Health and Human Services. Specifically, the data is sourced from the department’s COVID-19 Diagnostic Laboratory Testing (PCR Testing) Time Series dataset.
For hospitalization data, the dashboard also relies on data compiled by the U.S. Department of Health and Human Services. Specifically, the dashboard uses a mirror of the department’s COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries dataset that is maintained by Carnegie Mellon University’s Delphi Research Group.
How have your data sources changed?
For much of 2020, federal data sources for COVID-19 data were considered incomplete and inconsistent. To provide testing and hospitalization data for all pages of the dashboard except Philadelphia and Pennsylvania, Spotlight PA and the Inquirer relied on state COVID-19 data compiled by the Covid Tracking Project, a volunteer organization founded by The Atlantic.
As of March 2, 2021, however, the Spotlight version of the dashboard transitioned to using data from the U.S. Department of Health and Human Services. This was driven by the announced retirement of the Covid Tracking Project and by the fact that federal data on testing and hospitalization had improved considerably by this time. In the future, more of the dashboard’s data may be sourced directly from the federal government. Please refer to the change log at the bottom of this article for more information about these changes.
Why do some websites display different numbers than yours?
Historically, since there has been no central repository for reliable COVID-19 data for the U.S., news outlets and analysts have taken divergent approaches to how they compile COVID-19 statistics, including where they source their numbers from, how they handle errors and data anomalies, and how they calculate certain metrics (like the testing ‘positivity rate’).
This means that the numbers in our dashboard may be different than what you see on other websites. In order to be as transparent as possible, we have tried to document our sourcing and methodology as thoroughly as we can and to note when we change our data sources and make other changes to our charts and maps. Please refer to the bottom of this article for more information about how the dashboard has changed over time.
There may be other reasons you see different numbers on our dashboard compared to other websites. For instance, different outlets may be updating data on different schedules from our dashboard and that may temporarily result in different numbers than our own.
In addition, public health officials may make mistakes in their data that are later corrected but are nevertheless recorded in our data or in the data of other outlets. And, as described above regarding Pennsylvania’s COVID-19 data, there are occasions where the numbers in the Department of Health’s own interactive dashboard may not be the same as those reported in the department’s press materials or on other pages of its website.
We do our best to ensure we are providing readers with the best and most accurate COVID-19 data available.
What do you consider a COVID-19 “case”?
For all states in the dashboard, “cases” represents the sum of lab-confirmed positive COVID-19 cases and “probable” COVID-19 cases. A probable case is one in which health officials or medical professionals have deemed that a person likely had COVID-19 despite not having a laboratory test and positive result. This determination is made based on criteria defined by the Centers for Disease Control and Prevention. For instance, a person may be considered a “probable” case if they had recent exposure to someone with the coronavirus, like a family member, and then developed COVID-19 symptoms shortly afterward. As of July 1, 2020, in Pennsylvania, the number of probable cases was much lower than the number of confirmed cases. At that time, the state had about 85,000 confirmed cases and about 2,500 probable cases.
Are positive “antibody” test results included in your case tally?
There are two dominant types of coronavirus tests: polymerase chain reaction (PCR) tests and serological tests. PCR tests are intended to detect whether a person is currently infected with the coronavirus. Serological tests, sometimes called “antibody” tests, are intended to detect whether a person was previously infected with the coronavirus and has since developed antibodies.
For this reason, results from PCR tests are typically used to determine the current number of people in a community who are infected with COVID-19. In Pennsylvania, as of July 1, 2020, the Pa. Department of Health said that its tally of 85,000 people who have tested positive for the coronavirus is based solely on PCR tests. The number does not include serological tests.
Of the department’s “probable” cases, the department says it does count some people who have had positive serological results. But these are people, the department says, who also have either COVID-19 symptoms or have recently been exposed to someone infected with COVID-19. For that reason, the department considers these people as “probable” COVID-19 cases. As of July 1, 2020, among the department’s tally of 2,500 probable cases, 633 people fell into this category.
How reliable are the official tallies of COVID-19 cases?
Spotlight PA and The Inquirer are gathering the best publicly available data that exists on COVID-19. At the same time, for much of the pandemic, it’s been understood by public health researchers that the number of positive cases being reported by state, city, and county health departments is likely to be a significant undercount of the total number of COVID-19 infections.
There are a number of reasons for this undercount. COVID-19 is unusual in both its level of contagiousness and the fact that many people who are infected will not show symptoms. This means that the virus is able to spread rapidly through a community without detection. People may not seek a coronavirus test because they don’t know they are infected. In other cases, a person may be infected and have symptoms but choose not to get tested or, due to a lack of testing availability in their area, may not be able to get tested.
More generally, as the state Department of Health has noted on its own website, there are a number of factors that can affect the number of cases reported each day. Beyond the prevalence of the virus itself, the department noted in 2020 that those factors include: “testing patterns (who gets tested and why), testing availability, lab analysis backlogs, lab reporting delays, new labs joining our electronic laboratory reporting system, mass screenings, etc.”
Why do you use “7-day” moving averages in some of your charts?
As described above, there are a number of factors that can influence how many new cases are reported each day by health officials. If you look closely at the charts for new daily cases and deaths, you may notice a particular pattern: Numbers tend to be higher during the middle of the week and lower during the weekends. Statistical analysts call this “seasonality.” It describes data that conforms to a pattern over regular intervals.
The seasonality in COVID-19 data is widely attributed to irregular data reporting: Officials often don’t report data over the weekend and then catch up during the working week. To provide a clearer understanding of the overall trend, we overlay some charts with a line representing the 7-day moving average of the data. A number of other news outlets, including the New York Times, have taken a similar approach to visualizing the data.
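As a rough illustration of the smoothing described above, here is a minimal sketch of a 7-day moving average in Python. It assumes a trailing window (each day is averaged with the six days before it); the article does not say whether the dashboard uses a trailing or centered window, and the function name and sample numbers are hypothetical.

```python
from typing import List, Optional


def moving_average(values: List[float], window: int = 7) -> List[Optional[float]]:
    """Trailing moving average; None until a full window of days is available."""
    averages: List[Optional[float]] = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the day that just fell out of the window.
            running_total -= values[i - window]
        averages.append(running_total / window if i >= window - 1 else None)
    return averages


# Hypothetical daily case counts with a weekend dip (not real data).
daily_cases = [100, 120, 130, 125, 110, 40, 30, 105, 125, 135, 130, 115, 45, 35]
smoothed = moving_average(daily_cases)
```

Because every 7-day window contains exactly one of each weekday, the weekend dips are spread evenly across the averaged line rather than appearing as sharp drops.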
What is the “positivity rate” and why is it important?
A state’s ‘positivity rate’ represents the percentage of recent COVID-19 tests that have returned positive results. In essence, as Johns Hopkins University puts it: “Positivity rates are a measure of testing capacity, and can help gauge whether governments are casting a wide enough net with their testing programs to identify infections that may be occurring.”
Specifically, the university notes:
- If a rise in cases is due to increased testing, the positivity rate could look flat or like it is falling over the same time period.
- If a rise in cases is due to increased viral spread, the positivity rate could appear to be increasing over that same time period.
For this reason, we provide a calculation of the positivity rate for each region on our dashboard. Keep in mind, however, there are important caveats about these rates and direct comparisons of the positivity rates we calculate between different regions should be made with caution or, in some cases, avoided entirely.
We have provided these calculations to readers because we believe this metric still provides a useful way of understanding how COVID-19 is trending over time in conjunction with other data. At the same time, however, we urge readers and policy makers to interpret our positivity rates with care.
What are the caveats about the positivity rate?
As of March 2, 2021, Johns Hopkins notes there are four main ways to calculate the positivity rate. Each approach has advantages and disadvantages. These differing approaches are why the rates calculated by Spotlight PA and the Inquirer may differ from those of Johns Hopkins, the rates calculated by state officials, and those calculated by other news outlets.
As a general caveat, Johns Hopkins notes: “While this metric can provide important context about case totals and trends, it is NOT a measure of how prevalent the virus is in communities. Policy decisions, like openings and closings or interstate travel, should not be determined based on test positivity alone. Considering confirmed new cases, testing rates, and percent positivity together gives us a fuller picture of COVID-19 in a particular state or region.”
Compounding matters, as the Covid Tracking Project reported in February 2021, there are reasons to believe that it is difficult and, in some cases, perhaps impossible for outside analysts and news outlets to calculate accurate positivity rates due to issues with publicly available COVID-19 data. For this reason, we again urge readers, policy makers and analysts to interpret the positivity rates calculated on our dashboard with care. We will continue to review our use of positivity rates going forward.
How do you calculate the positivity rate?
As mentioned above, Johns Hopkins notes there are four main approaches to calculating an area’s positivity rate. The university notes that it takes the fourth approach in its own visualizations, which it describes as:
The number of people who test positive is divided by either unique people, encounters, or tests (depending on availability – each variable can help indicate the number of people tested).
In our dashboard, The Inquirer and Spotlight PA take a relatively similar approach for calculating Pennsylvania’s positivity rate. The formula can be understood as:
Positivity rate = 7 day moving average of people who have tested positive by a PCR test / (7 day moving average of people who have a positive PCR test + 7 day moving average of people who have a negative PCR test)
Note that due to slight differences in methodology, our positivity rate for Pennsylvania may differ slightly from the rate calculated by Johns Hopkins. In addition, due to larger methodological differences, our rate for Pennsylvania may differ significantly from the rate calculated by the Pa. Department of Health (we provide more detail below).
The Inquirer and Spotlight PA take a different approach when calculating the positivity rate for other regions in the dashboard. Again, this is why comparisons between Pennsylvania’s positivity rate and the other states should be avoided entirely or made with caution. As of March 2, 2021, when Spotlight PA switched to using PCR specimen testing data from the U.S. Department of Health and Human Services for every state in the dashboard except Pennsylvania, it began using the third approach described by Johns Hopkins to calculate positivity:
Tests over Tests. The number of positive molecular test results divided by total molecular tests given.
As a formula, it can be understood as:
Positivity rate = 7 day moving average of PCR laboratory specimens that returned a positive PCR test / 7 day moving average of all PCR laboratory specimens tested
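The two formulas above can be sketched side by side in Python. This is only an illustration under stated assumptions: the function names are hypothetical, each input is assumed to be a list of daily counts with the most recent day last, and returning None for an empty denominator is a choice made for this sketch rather than documented dashboard behavior.

```python
from typing import List, Optional


def trailing_mean(values: List[float], window: int = 7) -> float:
    """Mean of the most recent `window` daily values."""
    if len(values) < window:
        raise ValueError("need at least `window` days of data")
    return sum(values[-window:]) / window


def positivity_people(positive_people: List[float],
                      negative_people: List[float]) -> Optional[float]:
    """Pennsylvania formula: positive people / (positive + negative people)."""
    positive = trailing_mean(positive_people)
    negative = trailing_mean(negative_people)
    total = positive + negative
    return positive / total if total > 0 else None


def positivity_specimens(positive_specimens: List[float],
                         all_specimens: List[float]) -> Optional[float]:
    """'Tests over tests' formula: positive specimens / all specimens tested."""
    total = trailing_mean(all_specimens)
    return trailing_mean(positive_specimens) / total if total > 0 else None


# Hypothetical week of data (not real figures).
pa_rate = positivity_people([50] * 7, [950] * 7)
other_state_rate = positivity_specimens([30] * 7, [600] * 7)
```

The first function counts people while the second counts laboratory specimens, so one person tested several times affects the two denominators differently. That is one reason the resulting rates are not directly comparable.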
We are constantly reviewing our calculation and may update our formula based on expert advice and as new data becomes available. In addition, as described above, there are many reasons to be cautious when interpreting our positivity rates.
Why is your positivity rate for Pennsylvania different from the Pa. Department of Health?
The Pa. Department of Health provides its own weekly calculation of Pennsylvania’s positivity rate on its website. While there are slight differences between our positivity rate and Johns Hopkins’ positivity rate, some readers might notice a much larger difference between our rate for Pennsylvania and the rate calculated by the department.
According to the department’s documentation, its calculation includes people who have been retested. As of Sept 2, 2020, it noted: “Since many people who are routinely retested as part of universal testing programs repeatedly test negative, the percent positivity is lower than what would be calculated if one used the number of new cases and the number of people tested for the first time in the most recent 7-day period”.
Please read this detailed explainer from the Inquirer to learn more about the department’s methodology.
How do you determine the “14-day trend” for counties?
For each state/region, the dashboard provides a table of county-level data including a column labelled “14-day trend.” This provides a description of the trend of new daily cases over the past 14 days as either “rising,” “falling,” or “unclear.”
To make this assessment, Spotlight PA and The Inquirer first calculate the 7-day moving average of new cases for each county for each day over the past 14 days. As described above, using 7-day moving averages is typically viewed as a more reliable way of understanding the trend given the seasonality in the data.
We then analyze this data using a statistical model called “linear regression.” Although the term may sound intimidating, the concept is relatively easy to understand when visualized: Imagine plotting a series of points across an XY-axis and then drawing a straight line to best “fit” all those points. That line is called a “linear regression line.” We calculate regression lines for each county based on their 7-day moving averages of new daily cases over the past 14 days.
We then convert the slope of those regression lines into a special number for each county that, in essence, represents the average percentage change in new daily cases for that county, per day, over the past 14 days.
Here, we made editorial judgements about what constituted a “rising” or “falling” description. If a county’s percentage change is greater than 2.5%, our dashboard evaluates that county as having a “rising” trend. If the county’s percentage change is lower than -2.5%, the dashboard evaluates that county as having a “falling” trend. We chose these ranges so that our evaluations of “rising” and “falling” were more likely to err on the conservative side.
We evaluate counties that have an average percentage change between -2.5% and 2.5% as having an “unclear” trend.
Separately, our model will evaluate a county’s trend as “unclear” if there appears to be no meaningful trend. In statistics, this determination is typically made by interpreting a special number calculated from the regression analysis called a “P value.” If a county’s regression line has a P value above 0.05, the dashboard interprets its trend as “unclear.” In statistics, it’s common practice to interpret a linear regression with a P value above 0.05 as having no meaningful trend.
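The steps above can be sketched in Python as follows. Several details here are assumptions, not the dashboard’s actual code: the article does not say how the slope is converted to a percentage, so this sketch divides the slope by the mean of the 14 values; and rather than computing a P value directly, it compares the slope’s t statistic against 2.179, the two-sided 5% critical value of Student’s t distribution with 12 degrees of freedom, which is equivalent to the “P value above 0.05” test for exactly 14 points. The function name is hypothetical.

```python
import math
from typing import List

# Two-sided 5% critical value of Student's t with 12 degrees of freedom (14 points - 2).
T_CRITICAL_DF12 = 2.179


def classify_trend(avg_cases: List[float], threshold_pct: float = 2.5) -> str:
    """Label 14 days of 7-day-average cases as 'rising', 'falling' or 'unclear'."""
    n = len(avg_cases)
    if n != 14:
        raise ValueError("expected exactly 14 daily values")
    days = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(avg_cases) / n
    if y_mean == 0:
        return "unclear"  # no cases at all, so no percentage change to report

    # Ordinary least-squares fit of a straight line through the points.
    sxx = sum((x - x_mean) ** 2 for x in days)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(days, avg_cases))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Significance check: |t| below the critical value means P > 0.05.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(days, avg_cases))
    if sse > 0:
        std_err = math.sqrt(sse / (n - 2) / sxx)
        if abs(slope) / std_err < T_CRITICAL_DF12:
            return "unclear"

    # Assumed conversion: slope as a percentage of the average daily level.
    pct_change_per_day = 100 * slope / y_mean
    if pct_change_per_day > threshold_pct:
        return "rising"
    if pct_change_per_day < -threshold_pct:
        return "falling"
    return "unclear"
```

With this sketch, a steadily climbing series such as 100, 105, 110 and so on is labelled “rising,” while a series that jumps between 2 and 10 every other day is labelled “unclear” even though its fitted slope is steep relative to its mean, because the fit is too noisy to pass the significance check.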
We consulted Krys Johnson, an epidemiologist at Temple University, and launched the “trend description” feature of the dashboard in June.
How reliable is your “14-day trend” analysis?
While the math behind our trend analysis is complicated, it’s worth noting that, in statistical terms, it’s a relatively simplistic model. Linear regression is a common method of analysis used in statistics but, in order to chart the spread of disease, epidemiologists create far more sophisticated models that rely on multiple variables.
When reading the dashboard’s trend descriptions, care should be taken in particular when interpreting the results for sparsely populated counties. Many of these counties have relatively few new daily cases each day. In these situations, the dashboard may readily evaluate a county as having a “rising” or “falling” trend based on small movements in the data.
For these reasons, in order to understand the overall trend of their county, we urge readers to look closely at the number of new daily cases over a longer period of time, and to also consider hospitalizations and other indicators. We also strongly advise readers to follow the advice and guidance of public health officials in their communities. Our trend descriptions are intended as a helpful way of understanding the data at a glance in your county, but they are not intended to supersede or replace the judgements of local officials or public health experts.
Why don’t you include other types of data?
New types of COVID-19 data are constantly being published by state officials and public health researchers. Readers sometimes contact us about other types of COVID-19 data they’d like to see on the dashboard. We appreciate and welcome all suggestions from readers. Because of the work involved in adding and maintaining new data sources, however, we think carefully before adding new data to the dashboard. Our preference is to include data that we know comes from a reliable source, will be reliably updated each day, and is provided in a machine-readable format with a structure that is unlikely to change over weeks or months. For these reasons, we may be unable to immediately incorporate new types of data that are made available. However, we are constantly evaluating whether new sources should be included in the dashboard and we continue to appreciate reader suggestions.
Where can I access raw COVID-19 data for Pennsylvania so I can conduct my own analysis?
As described above, there are many different sources of COVID-19 data. Availability and reliability of this data has evolved considerably since most U.S. news outlets began tracking COVID-19 in March 2020.
As of March 2, 2021, the Inquirer and Spotlight PA continue to rely heavily on Pennsylvania data reported daily on the interactive dashboard of the Pa. Department of Health because it provides the freshest data possible on newly reported cases, deaths and tests. However, the department now provides access to raw, archived data on cases, deaths and testing on its Open Data portal. The daily numbers in these datasets differ from the numbers we’ve recorded because the data is better reconciled with the date an event actually occurred, like a person’s death from COVID-19, rather than when it was first reported to state officials, and data is constantly being updated and backfilled. According to a department spokeswoman, the data is also regularly cleaned to remove duplicate case reports and correct other errors. The data can also be accessed using the Socrata Open Data API (SODA), which may be helpful to researchers and developers who seek programmatic access to the data. In the future, the dashboard may rely more heavily on these data sources.
At time of writing, here is where you can find the department’s four core datasets:
- COVID-19 Aggregate Cases Current Daily County Health
- COVID-19 Aggregate Death Data Current Daily County Health
- COVID-19 PCR Test Counts
- COVID-19 Aggregate Hospitalizations Current Daily County Health
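For readers who want programmatic access, here is a minimal sketch of how a SODA query URL for one of these datasets might be assembled in Python. The `/resource/<id>.json` path and the `$where`, `$order` and `$limit` parameters are standard SODA conventions, but everything else is an assumption: the portal domain `data.pa.gov`, the placeholder dataset ID `xxxx-xxxx`, and the field names `county` and `date` are illustrative only and should be replaced with the values shown on the dataset’s own page.

```python
from urllib.parse import urlencode


def build_soda_url(domain: str, dataset_id: str, where: str,
                   order: str, limit: int = 1000) -> str:
    """Assemble a SODA query URL; no network request is made here."""
    params = {"$where": where, "$order": order, "$limit": limit}
    # Keep the leading "$" of SODA parameter names readable in the URL.
    query = urlencode(params, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


# Hypothetical values: the domain, dataset ID and field names are placeholders.
url = build_soda_url(
    domain="data.pa.gov",
    dataset_id="xxxx-xxxx",
    where="county = 'Allegheny'",
    order="date DESC",
    limit=100,
)
```

The resulting URL can be opened in a browser or fetched with any HTTP client to return rows as JSON.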
As of July 13, 2020, in order to be as transparent as possible about changes we make to the dashboard, all changes will be documented here.
The date of each change marks the date that the change was made to the Spotlight PA version of the dashboard. The Philadelphia Inquirer version of the dashboard may not necessarily be updated on the same day.
May 21, 2021: The dashboard has been changed to use the New York Times COVID-19 data instead of data directly from the Pennsylvania Department of Health. Data from the City of Philadelphia is still used for some individual zip codes.
March 2, 2021: In preparation for the retirement of the Covid Tracking Project on March 7, data on COVID-19 testing and hospitalization sourced from the Covid Tracking Project was replaced with data sourced from the U.S. Department of Health and Human Services on the Spotlight version of the dashboard. In addition, a number of updates were made to footnotes and other text to make it easier for readers to understand how recently data in individual sections have been updated and to better explain the metrics and methodology used by the Philadelphia Inquirer and Spotlight PA. More generally, as the dashboard approaches one year of tracking the spread of COVID-19, the years ‘2020’ and ‘2021’ were added to many date references to avoid reader confusion over date ranges.
Here are more details about how the transition to federal data has changed the dashboard’s data:
- For all regions in the dashboard except Pennsylvania and Philadelphia, testing data is now sourced from the U.S. Department of Health and Human Services and represents viral COVID-19 laboratory test (PCR) results representative of diagnostic specimens being tested, not individual people. Specifically, the data is sourced from the department’s COVID-19 Diagnostic Laboratory Testing (PCR Testing) Time Series dataset. As per the department’s notes, serology tests are excluded where possible. In many ways, this improves the accuracy and standardization of the testing data displayed on the dashboard. As noted by the Covid Tracking Project on Feb. 17, 2021, the quality of federal data has improved considerably since the beginning of the pandemic.
- For all regions in the dashboard except Philadelphia, hospitalization data is now sourced from the U.S. Department of Health and Human Services and is updated weekly rather than daily. Specifically, the data is sourced from a mirror of the department’s COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries dataset that is maintained and updated regularly by Carnegie Mellon University’s Delphi Research Group. As part of this change, data in the dashboard about the number of people on ventilators has been replaced with data on ‘adults in ICU beds’.
Feb. 28, 2021: Due to issues with an internal data pipeline, Pennsylvania’s county vaccine map was removed from the Spotlight version of the dashboard. It may be restored at a later date.
Feb. 18, 2021: A map with Pa. Department of Health data on the number of partially and fully vaccinated residents for each Pennsylvania county was added to the Pennsylvania page of the Spotlight version of the dashboard.
Feb. 4, 2021: Two footnotes were added to the New Jersey section of the dashboard explaining that New Jersey began adding positive antigen test results to its cases tallies on Jan. 4, 2021. The change in methodology caused a significant spike in cases on that day.
Feb. 2, 2021: Vaccine-related charts, created with Datawrapper and maintained by the Philadelphia Inquirer, were embedded in the Pennsylvania section of Spotlight PA’s version of the dashboard. Similar charts were embedded in the Inquirer’s version of the dashboard for the Pennsylvania and Philadelphia sections.
Sept 24, 2020: A new feature was added to the box of total cases, deaths, and tests at the top of the dashboard. If one of these types of data is more recent than another data type, a footnote will appear at the bottom of the box noting when each data type was last updated. For instance, if deaths data has been updated before cases and tests data, a footnote will appear with the following text: “NOTE: Deaths as of Sept 24; cases, tests as of Sept 23”. In most cases, all three types of data are updated at the same time, but this change is designed to make it clearer to readers if one data type has been more recently updated than another.
Aug 11, 2020: A bug was fixed that caused areas of the map and county-by-county table meant to display totals for the past 14 days to instead display totals for the past 13 days.
July 20, 2020: Two new charts were added to the ‘testing’ section of the Spotlight version of the dashboard. The dashboard now provides a chart of the daily ‘positivity rate’ for a selected state in addition to a chart that shows the average number of tests it has conducted over the past seven days, adjusted for population, compared to neighboring states.
July 17, 2020: The map at the top of the dashboard is now limited to only displaying case and death numbers over the past two weeks. Prior to this, the map displayed a running total of cases and deaths for the selected state/region since the beginning of the outbreak. We made this decision because we believed it would make it easier for readers to understand current ‘hot spots’ of cases and deaths.
These changes do not apply to the ‘Philadelphia metro’ map, which is displayed on The Inquirer version of the dashboard when a user has Pennsylvania selected.
July 15, 2020: In the Testing section for all states, positive test numbers now represent only lab-confirmed positive test results. Prior to this, “positive” test numbers were derived from each state’s total tally of cases, which could include both lab-confirmed positive test results and “probable” cases. This change, coupled with changes made on July 13, ensures that the dashboard is presenting the most accurate information on each state’s number of lab-confirmed cases and its percentage of positive tests.
For all states except Pennsylvania on the dashboard, these data changes are retroactive. Because we use data compiled and archived by the COVID Tracking Project, the “positive” tally for each day in the “Total Tests” chart for these states represents only lab-confirmed positive tests.
For Pennsylvania’s testing data, however, these data changes are retroactive only to July 13. That means that in the “positive” test tallies for each day prior to July 13 in Pennsylvania’s “Total Tests” chart, some probable cases are included. Please note, as described in the change log’s July 13 note, “total test” tallies prior to July 13 also include probable cases.
On and after July 13, however, the “positive” test numbers for Pennsylvania represent only lab-confirmed positive results. And, as described in the July 13 change log note, “total tests” tallies for Pennsylvania include only lab-confirmed positive results and negative results.
July 13, 2020: Total test numbers for Pennsylvania are now calculated based on the sum of lab-confirmed positive test results and negative test results.
Prior to this, the number was derived from the sum of “cases” and negative test results. The problem with this approach is that the department’s “cases” tally includes both lab-confirmed positive results and “probable” cases. Although the number of probable cases included in the department’s “cases” tally is relatively small, this change ensures that the dashboard accurately reflects the total number of people tested in Pennsylvania.
These data changes for Pennsylvania are not retroactive. That means that in the total test tallies for each day prior to July 13 in the “Total Tests” and “Tests per Day” charts some probable cases are included. On and after July 13, the total test tallies exclude “probable” cases.