The aims of the Project Tycho® database are to advance the availability and use of public health data for science and policy… by acquiring new data, by building infrastructure for data standardization, integration, quality control, and data redistribution, by developing innovative analytics, and by advocacy.
[Project Tycho® database was named after] the Danish nobleman Tycho Brahe (1546-1601), who is known for his detailed astronomical and planetary observations. Tycho was not able to use all of his data for breakthrough discoveries, but his assistant Johannes Kepler (1571-1630) used Tycho’s data to derive the laws of planetary motion. Similarly, this project aims to advance the availability of large-scale public health data to the worldwide community to accelerate advancements in scientific discovery and technological progress.
Currently, we have completed digitization of the entire history of weekly National Notifiable Diseases Surveillance System (NNDSS) reports for the United States (1888-2013) into a database in computable format (Level 3 data). We have standardized a major part of these data for online access (Level 2 data). A subset of the U.S. data was cleaned further and used for a study on the impact of vaccination programs in the United States that was recently published in the NEJM (Level 1 data).
The Project Tycho® data are organized as counts. A count is defined as the number of cases or deaths due to a disease in a specific location and time period. A count is equivalent to a data point. During the 126-year period of weekly disease reporting, the types of reports changed regularly, leading to different types of data counts across time. This makes the integration and standardization of these data a complex task. Currently, available data are categorized in three levels based on the type of counts included. Level 1 data include different types of counts that have been standardized into a common format for a specific analysis published recently in the NEJM. Level 2 data include only counts that have been reported in a common format, e.g. diseases reported for a one-week period and without disease subcategories. These data can be used immediately for analysis and include a wide range of diseases and locations, but this level does not include data that have not yet been standardized. Level 3 data include all the different types of counts ever reported. Although these are the most complete data, the large number of different count types requires extensive standardization and various judgment calls before they can be used for analysis.
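The definition of a count given above can be made concrete with a small sketch. The record layout and field names here (`disease`, `location`, `start`, `end`, `event`, `number`, `subcategory`) are assumptions for illustration, not the Project Tycho® schema, and the sample values are made up. The filter only mimics the Level 2 criterion as described: a one-week reporting period and no disease subcategory.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Count:
    """One data point: cases or deaths for a disease, place, and period.

    Field names are hypothetical, not the Project Tycho schema.
    """
    disease: str
    location: str
    start: date   # first day of the reporting period (inclusive)
    end: date     # last day of the reporting period (inclusive)
    event: str    # "cases" or "deaths"
    number: int
    subcategory: Optional[str] = None


def is_common_format(count: Count) -> bool:
    """True for a one-week count that has no disease subcategory."""
    days = (count.end - count.start).days + 1
    return days == 7 and count.subcategory is None


# Made-up sample counts, for illustration only.
counts = [
    Count("measles", "PA", date(1930, 1, 5), date(1930, 1, 11), "cases", 120),
    Count("measles", "PA", date(1930, 1, 1), date(1930, 1, 31), "cases", 510),
    Count("polio", "NY", date(1950, 7, 2), date(1950, 7, 8), "cases", 14, "paralytic"),
]

# Keep only counts that match the one-week, no-subcategory format.
weekly = [c for c in counts if is_common_format(c)]
```

In this sketch the monthly measles count and the subcategorized polio count are filtered out. They stand in for the kind of counts that would need further standardization before analysis.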
Project Tycho® Level 1 data include data counts that have been standardized for specific analyses. This involves standardization of various types of data counts into a common format and exclusion of data counts that are not required for a specific purpose. In addition, external data such as population data may have been integrated with disease data to derive rates or for other applications. Data availability also depends on historical reporting priorities, and not all diseases were reported by all locations every year.
The current version of Level 1 data includes counts at the state level for smallpox, polio, measles, mumps, rubella, hepatitis A, and whooping cough and at the city level for diphtheria. The time period covered varies by disease and falls between 1916 and 2010. This version includes cases as well as incidence rates per 100,000 population based on historical population estimates. These data have been used by investigators at the University of Pittsburgh to estimate the impact of vaccination programs in the United States, recently published in the New England Journal of Medicine.”
Description source: Willem G. van Panhuis, John Grefenstette, Su Yon Jung, Nian Shong Chok, Anne Cross, Heather Eng, Bruce Y Lee, Vladimir Zadorozhny, Shawn Brown, Derek Cummings, Donald S. Burke. Contagious Diseases in the United States from 1888 to the present. NEJM 2013; 369(22): 2152-2158.
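The incidence rates per 100,000 population mentioned in the description follow the usual formula: cases divided by population, multiplied by 100,000. A minimal sketch, using a hypothetical function name and made-up numbers rather than Project Tycho® values:

```python
def incidence_per_100k(cases: int, population: int) -> float:
    """Cases per 100,000 population for one location and period."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so whole-number rates stay exact.
    return cases * 100_000 / population


# Illustrative numbers only, not taken from the dataset.
rate = incidence_per_100k(cases=250, population=5_000_000)
print(rate)  # 5.0
```

Rates computed this way are only as reliable as the population estimate used as the denominator, which is why the description notes that historical population estimates were used.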