Data Extraction Tools and Methodology
American FactFinder (U.S. Bureau of the Census)
American FactFinder provides access to data about the United States, Puerto Rico and the Island Areas. The data in American FactFinder come from several censuses and surveys. They include the American Community Survey, the American Housing Survey, Annual Economic Surveys, Annual Surveys of Governments, the Census of Governments, the Commodity Flow Survey, the Decennial Census, the Economic Census, the Economic Census of the Island Areas, the Survey of Business Owners, the Equal Employment Opportunity (EEO) Tabulation, the Population Estimates Program, and the Puerto Rico Community Survey.
American National Standards Institute (ANSI) codes (U.S. Bureau of the Census)
"American National Standards Institute codes (ANSI codes) are a standardized set of numeric or alphabetic codes issued by the American National Standards Institute (ANSI) to ensure uniform identification of geographic entities through all federal government agencies. These standards replace the Federal Information Processing Standards (FIPS) codes previously issued by the National Institute of Standards and Technology (NIST)." The Census Bureau, as a major user of FIPS codes, provides this page with links to ANSI codes publications.
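Because these codes are fixed-width identifiers, leading zeros matter when files from different agencies are joined. The following is a minimal Python sketch, assuming the conventional widths of two digits for a state code and three digits for a county code; the function name `county_geoid` is hypothetical and used only for illustration.

```python
def county_geoid(state_code, county_code):
    """Build a 5-character county identifier from state and county codes.

    Assumes the conventional widths: 2 digits for state, 3 for county.
    Codes read from spreadsheets often lose their leading zeros, so both
    parts are zero-padded before they are concatenated.
    """
    state = str(state_code).strip().zfill(2)
    county = str(county_code).strip().zfill(3)
    if len(state) != 2 or len(county) != 3 or not (state + county).isdigit():
        raise ValueError(f"unexpected codes: {state_code!r}, {county_code!r}")
    return state + county


# Example: Wisconsin is state 55 and Dane County is county 025.
print(county_geoid(55, 25))
```

Keeping the result as a string rather than an integer preserves the leading zero for states whose codes are below 10.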
ArcGIS Explorer: Free GIS Data Viewer (esri (formerly Environmental Systems Research Institute))
From the web site: "ArcGIS Explorer is a free, downloadable GIS viewer that gives you an easy way to explore, visualize, and share GIS information." In addition to the free ArcGIS Explorer software, this site also offers free newsletters and networking with other users, plus information on esri's commercial products and services.
Area Health Resource File (AHRF) (Quality Resource Systems, Inc.)
"The Area Health Resource File (AHRF) is a database containing over 6,000 variables for each county in the US. AHRF formerly known as Area Resource File (ARF) is used for health service research, health policy analysis, and other geographically based activities." The ARF data from 1940-1990, as well as the 1999, 2005 and 2009-2010 releases, are available at DISC for UW-Madison campus users. The ARF website provides a search engine to identify which variables are available in the most recent annual release.
Argentina National Institute of Statistics and Censuses (INDEC) (Argentina National Institute of Statistics and Censuses)
Argentina National Institute of Statistics and Censuses (INDEC, Spanish acronym) "is the technical government agency responsible for the coordination and supervision of all public statistical activities taking place in the Argentine territory." INDEC's statistics include Population, Living Conditions, Employment and Income, Education, Health, Tourism and Culture, Price Indexes, Agricultural Sector, Mining and Energy, Industry and Construction, Trade and Services, Businesses, National Accounts, International Accounts, Foreign Trade, and New Information and Communication Technologies (ICTs).
Basic Tables: 1990 Demographic Profile Generator (University of Missouri-St. Louis, Urban Information Center)
"This application generates a single 1990 'Basic Tables' (demographic profile) report for any of the supported geographic units, including census tract, block group, city (no size limit), 5-digit ZIP code, state, county or metro area for anywhere in the United States. Enter only codes relevant to the area for which you want data." Although this is a terrific resource, it is not necessarily easy to use -- primarily because the selection is geographic code-based rather than clicking on a selection list. The good news is that there is a Lots of Helpful Examples document to help get you started; this document provides links to places that can help you get the codes you need.
CDC WONDER (U.S. Dept. of Health and Human Services, Centers for Disease Control)
CDC WONDER provides a gateway to a wide variety of reports and numeric public health data. Many of the links are to menu-based extraction systems that produce downloadable summary data tables. The gateway covers the following categories: chronic diseases, communicable diseases, environmental health, health practice and prevention, injury prevention, and occupational health. The site also has an A-to-Z topic index.
Census and Survey Processing System: CSPro (U.S. Bureau of the Census)
CSPro is a public domain software package for entering, editing, tabulating and mapping census and survey data. The software is used in over 160 countries and multiple organizations, from NGOs to universities, worldwide. Free registration required for download.
CensusScope (Social Science Data Analysis Network)
The Social Science Data Analysis Network (SSDAN) at the University of Michigan offers this point-and-click interface to Census 2000 data. Pre-selected topics for charts, maps, and trends let the user choose a state or metro area using drop-down menus. The graphics are eye-catching and suitable for printing.
Center for Social Research Methods (William M.K. Trochim)
This site offers materials and links for people involved in social research. Site highlights include The Knowledge Base, an online hypertext textbook on applied social research methods such as defining a research question, sampling, measurement, research design and data analysis; a simulation book of manual (such as dice-rolling) and computer simulation exercises of common research designs; and a statistical advisor that points users toward appropriate statistical measures based on answers to a series of questions. William Trochim, author of the site, is a professor at Cornell University.
Chance Course & Database (Dartmouth College)
The Chance Project was funded by the National Science Foundation from 1992 to 1996 to develop instructional materials for teaching basic probability and statistical concepts using examples drawn from current news and the real world. The Chance website contains a teacher's guide, syllabi and lecture notes, activities and datasets, and a current newsletter (web or e-mail) that culls up-to-date examples from current news reporting.
Conducting Research Surveys via Email and the Web (RAND Corporation and Matthias Schonlau, Ronald D. Fricker, Jr., Marc N. Elliott)
This 118-page publication examines the burgeoning trend of online research surveys. The authors carry out a literature review, discuss the advantages and disadvantages of online surveys, and offer practical suggestions for design and implementation. A chapter of case studies rounds out the publication. The seven chapters and three appendices may be downloaded together or separately in PDF.
Counterterrorism Calendars (The National Counterterrorism Center)
Information on known terrorist groups, individual terrorists, and technical information on topics such as biological and chemical threats is provided by the National Counterterrorism Center's Counterterrorism Calendar.
DataCite is a global organization, based in London, that helps researchers find, access, and reuse data. It provides persistent identifiers, specifically Digital Object Identifiers (DOIs), for research data to make them visible and accessible.
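As an illustration of how a DOI acts as a persistent link, the sketch below turns a DOI string into a resolvable URL through the public https://doi.org/ resolver. The pattern check is a deliberately loose assumption rather than a full validation rule, and the DOI in the example is made up.

```python
import re
from urllib.parse import quote

# Loose shape check (an assumption, not an official rule): "10.", a
# registrant prefix of 4 to 9 digits, a slash, then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Return the resolver URL for a DOI string."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path,
    # but keep the slash that separates prefix and suffix.
    return "https://doi.org/" + quote(doi, safe="/")


# Hypothetical DOI, for illustration only.
print(doi_to_url("10.1234/example.dataset"))
```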
DataFerrett (U.S. Bureau of the Census)
The DataWeb site features the DataFerrett application, a Java-based online data application for browsing and searching through variables in the available datasets, downloading subsets, and generating tables. Datasets available include: Current Population Survey (CPS), Survey of Income and Program Participation (SIPP), American Community Survey (ACS), American Housing Survey (AHS), Small Area Income Poverty Estimates (SAIPE), Consumer Expenditure Survey (CES), County Business Patterns (CBP), Home Mortgage Disclosure Act (HMDA), National Health and Nutrition Examination Survey (NHANES), National Health Interview Survey (NHIS), Public Libraries Survey, Survey of Program Dynamics (SPD) and more.
Dissemination Standards Bulletin Board (DSBB) (International Monetary Fund)
The DSBB provides information about the Special Data Dissemination Standard (SDDS), established in 1996 to guide countries that have, or that might seek, access to international capital markets in the dissemination of economic and financial data to the public and the General Data Dissemination System (GDDS), established in 1997 to guide countries in providing comprehensive, timely, accessible, and reliable economic, financial, and socio-demographic data to the public.
Econometrics Laboratory Software Archive (ELSA) (Econometrics Laboratory, University of California, Berkeley)
The Econometrics Laboratory Software Archive (ELSA) of the Econometrics Laboratory at the University of California, Berkeley strives to facilitate the interchange of computational algorithms that have economic applications. ELSA offers a variety of algorithms, programs, software manuals, and econometrics-related datasets and textbooks for download. Datasets include, among others, Lorna Greening's Integrated Consumer Expenditure Survey data files for 1980-1994; David Card's collection of 1970 Census raw (state) files and extracts; and several of David Card's collections of Current Population Survey files.
FAIRMODEL Economic Model (Ray C. Fair)
The FAIRMODEL economic models from Yale provide macroeconomic analysis for free. The site allows users to "Work with a U.S. macroeconometric model (US model) or a multicountry econometric model (MC3 model) to forecast, do policy analysis, and examine historical episodes. Users can change government policy variables and examine the estimated effects of the changes, table and graph online and/or download all or part of the historical data, forecast data, and data you may have created, read online and/or download all the documentation, memos, and paper. Download for use on your own computer: the Fair-Parke (FP) program, the US model, the MC3 model, and the US model in EViews format. Users can also analyze a presidential vote equation, including examining Bush's chances in 2004, and perform stock market experiments."
Federal Justice Statistics Resource Center (FJSRC) (U.S. Bureau of Justice Statistics and Urban Institute)
The U.S. Bureau of Justice Statistics, through the Federal Justice Statistics Resource Center, "compiles comprehensive information describing suspects and defendants processed in the Federal criminal justice system. The goal of FJSRC is to provide uniform case processing statistics across all stages of the Federal criminal justice system." Its Federal Criminal Case Processing Statistics (FCCPS) tool is an interface used to analyze federal case processing data. Users can generate various statistics in the areas of federal law enforcement, prosecution/courts and incarcerations, and based on title and section of the U.S. Criminal Code.
FlowingData (Nathan Yau)
Since mid-2007 Nathan Yau, a PhD candidate in Statistics at UCLA, has been running the FlowingData blog. His interest is in data visualization, and his blog has attracted other like-minded data enthusiasts, who interact in a fascinating conversation, with lots of thought-provoking images and animations, on how data can be presented.
One useful category of post holds up data graphics from the media for critique by the FlowingData community. Another category presents visualizations created by the blog author himself, with animations on such topics as mapping the expansion of WalMart in the United States over time, and mapping the use of the word "inauguration" in Twitter messages worldwide in the hours surrounding the events of January 20, 2009 in Washington DC. FlowingData also holds contests for its readers to contribute visualizations based on a given dataset, while a forum page adds opportunities for reader input.
Gateway to Global Aging Data (University of Southern California, the Dornsife Center for Economic and Social Research (CESR), Program on Global Aging, Health & Policy)
Gateway to Global Aging Data is a platform designed for harmonizing cross-national studies of aging with the Health and Retirement Study (HRS). It includes the Health and Retirement Study (HRS), Mexican Health and Ageing Study (MHAS), English Longitudinal Study on Ageing (ELSA), Survey of Health, Ageing, and Retirement in Europe (SHARE), Korean Longitudinal Study on Aging (KLoSA), Japanese Study on Aging and Retirement (JSTAR), Indonesia Family Life Survey (IFLS), China Health, Aging, and Retirement Longitudinal Study (CHARLS), Irish Longitudinal Study on Ageing (TILDA), Study on Global Ageing and Adult Health (SAGE), and Longitudinal Aging Study in India (LASI). Its digital library contains survey questions, sets of harmonized variables, and tools to search, compare, and obtain information from various health and retirement surveys from 25 countries.
General Social Survey (GSS) (National Opinion Research Center (NORC))
"The General Social Survey (GSS) conducts basic scientific research on the structure and development of American society with a data-collection program designed to both monitor social change within the United States and to compare the United States to other nations." The GSS has been conducted regularly since 1972, and many of the core questions are unchanged to allow comparison across years. The GSS site at NORC allows users to search all GSS documents, browse GSS variables, download in SAS or SPSS, or analyze the data online in Nesstar. (The site also links to the SDA online analysis site at Berkeley.)
Geographic Codes Lookup (Missouri Census Data Center (MCDC))
Missouri Census Data Center (MCDC) creates this page for looking up standard codes for areas grouped into common census geographic summary levels. For United States, it includes regions, divisions, states, Micropolitan/Metropolitan Statistical Areas (CBSAs) and places. For each state it has counties, places, CBSAs, urban areas/clusters, school districts, and county subdivisions. Populations are shown in parentheses after each area on the list. Most codes are linked. The link leads to a page that lists multiple data sources for the selected area.
GIS Guide to Good Practice (Arts and Humanities Data Service (AHDS))
This UK-based site is for those who create, maintain, use and preserve GIS-based digital resources. Although the overall emphasis is upon archaeological data, the information presented has much wider disciplinary implications. As well as providing a source of useful generic information, the guide emphasises the processes of long-term preservation, archiving and effective data re-use.
Google Refine (Google)
According to the Google Refine blog, "Google Refine is a power tool for working with messy datasets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases." The software is open-source and builds on a project called Freebase Gridworks, which Google acquired in July 2010. The software is downloadable, so that users can work on their own desktop computers without having to load their data to a distant server.
Health Data for All Ages (HDAA) (U.S. National Center for Health Statistics)
According to the HDAA site, "this site presents tables that provide CDC health statistics for infants, children, adolescents, adults, and older adults. You can customize tables with any or all of the following characteristics: age, gender, race/ethnicity, and geographic location." Using the Beyond 20/20 system, users can browse and manipulate tables online, or download the tables and software for additional features. Table topics include: Pregnancy and Birth; Health Conditions/Risk Factors; Health Status and Disability; Health Care Access and Use; and Mortality.
ICD-10 Code Lookup Tool (Medical Billing and Coding Certification)
The ICD-10 (International Statistical Classification of Diseases - 10th Revision) is a medical classification list for the coding of diseases as maintained by the World Health Organization. The Medical Billing and Coding Certification site offers two options for looking up ICD-10 codes: a keyword search tool and a hierarchy browse. Both the searching and browsing tools offer a pop-up of US mortality data from the World Health Organization for each code. Note: The mortality-data pop-up works best with Firefox, Chrome, and the latest version of Internet Explorer.
Industry Concordances (Jon Haveman)
A wonderful site produced by Jon Haveman that provides access to a multitude of concordances (ISIC, SITC, usSIC, cSIC, HS and others), plus a list of acronyms with what they stand for and verbal descriptions of the various classification systems.
International Household Survey Network (IHSN) (International Household Survey Network (IHSN))
The IHSN is a partnership of international organizations that aims to improve the availability and quality of household survey data in developing countries. For researchers looking for microdata, the site provides a central catalog of household surveys from developing countries, with contact information for the agencies and archives responsible for the data. When links are provided, they lead to the home page of the responsible archive; not all surveys listed are publicly available or available online. The site also provides a separate list of links to archives of survey data from developing countries.
For national statistical agencies, the IHSN site provides tools and guidelines in such areas as sampling, questionnaire design, anonymization, data archiving & dissemination. A database on planned censuses and surveys carries information about surveys that are planned or in process. A question-bank is under development, to help agencies harmonize their data collection efforts. The site also carries a Microdata Management Toolkit, developed by the World Bank Data Group, which includes a metadata editor and a CD-ROM builder tool. Some components of the toolkit are freely available, others require a license.
International Survey Center: Survey Design and Statistical Analysis in Many Nations (International Survey Center, Australia)
The International Survey Center conducts research on social, economic and political issues using survey data from large, representative national samples from many nations. Some of their data is freely available and is in the form of SPSS "portable" files. '2AF' Secondary Analysis Files are available from several cross-national projects. These include data that have been carefully worked over to make them comparable between nations and to make them user-friendly.
Internet for Social Statistics (Robin Rice, Edinburgh University Data Library and Intute)
The Internet for Social Statistics guide, written by Robin Rice of the Edinburgh University Data Library, offers a free tutorial on how to use social statistics. Users can tour sites for statistics, learn how to improve their data searching techniques, learn how to apply critical thinking skills to citing sources on the web, and reflect on how to use the web as a better tool for researching and teaching. This guide is part of the Virtual Training Suite.
Inter-University Consortium for Political and Social Research (ICPSR) (Inter-university Consortium for Political and Social Research (ICPSR))
The Inter-University Consortium for Political and Social Research (ICPSR), established in 1962, maintains and provides access to a vast archive of social science data for research and instruction, and offers training in quantitative methods to facilitate effective data use. To ensure that data resources are available to future generations of scholars, ICPSR preserves data, migrating them to new storage media as changes in technology warrant. In addition, ICPSR provides user support to assist researchers in identifying relevant data for analysis and in conducting their research projects. Codebooks are freely available, and data is available for download to ICPSR member institutions (DISC holds the UW-Madison ICPSR membership). Non-UW-Madison users may access the ICPSR site at http://www.icpsr.umich.edu/.
Introduction to Metadata: Pathways to Digital Information (Getty Standards Program)
Version 3.0 of Introduction to Metadata: Pathways to Digital Information has been placed online in its entirety by the Getty Standards Program. The publication now offers a "suite" of metadata crosswalks that map different sets of metadata. Also included are a glossary and list of hyperlinked acronyms. All sections of the book are available in both HTML and .pdf format.
IPUMS Higher Ed (The Minnesota Population Center)
IPUMS Higher Ed offers harmonized versions of the surveys incorporated into the NSF Scientists and Engineers Statistical Database (SESTAT), which is composed of three National Science Foundation surveys: the National Survey of College Graduates, the Survey of Doctorate Recipients, and the National Survey of Recent College Graduates. Its data includes education history, labor force status, employer and academic institution characteristics, income, and work activities. SESTAT data have been used previously to study a wide variety of topics, including gender differences in the labor force and the presence of immigrants in the U.S. science and engineering workforce.
Malta National Statistics (National Statistical Office)
The National Statistical Office (NSO) was established in 1981 by the Statistical Service Act (Chapter 386) and became the central agency in Papua New Guinea for providing statistical information to meet the needs of the Government for the formulation of policy and planning. Under Section 106 of the 1995 Reformed Organic Law on Provincial and Local Level Government, the NSO was also given the mandate to assist in creating statistical databases at the Provincial and Local Government levels for policy formulation and planning at these levels.
Master Area Geographic Glossary Of Terms (MAGGOT) (Missouri Census Data Center)
This document supplies useful definitions of geographic units ("geocodes") commonly used in geographic databases such as MABLE/Geocorr. Included are definitions for State, County, MCD-CCD (County Subdivisions), Place, Census Tract, Block Group and Census Block.
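The nested units in this list (State, County, Census Tract, Block Group, Census Block) are commonly combined into one 15-digit block identifier. The sketch below splits such an identifier into its parts, assuming the usual widths of 2 digits for state, 3 for county, 6 for tract and 4 for block, with the block group taken as the first digit of the block number. The function name and the example identifier are made up for illustration; MCD-CCD and Place codes are not part of this nesting and are not handled.

```python
def parse_block_geoid(geoid):
    """Split a 15-digit census block identifier into its components.

    Assumed layout: SS CCC TTTTTT BBBB
    (state, county, tract, block); the block group is the
    first digit of the 4-digit block number.
    """
    geoid = str(geoid).strip()
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"expected 15 digits, got {geoid!r}")
    return {
        "state": geoid[0:2],
        "county": geoid[2:5],
        "tract": geoid[5:11],
        "block_group": geoid[11],
        "block": geoid[11:15],
    }


# Made-up identifier, for illustration only.
parts = parse_block_geoid("550250001001000")
print(parts["state"], parts["county"], parts["tract"],
      parts["block_group"], parts["block"])
```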
Medical Expenditure Panel Survey (MEPS) (Agency for Healthcare Research and Quality)
"The Medical Expenditure Panel Survey, or MEPS as it is commonly called, is the third (and most recent) in a series of national probability surveys conducted by AHRQ (Agency for Healthcare Research and Quality) on the financing and utilization of medical care in the United States." A number of public use files are available for download, and some data is also available in tabular format. Online statistical tools are available for analyzing household data and employer-based insurance data.
National Collaborative on Childhood Obesity Research (NCCOR) Catalog of Surveillance Systems (National Collaborative on Childhood Obesity Research (NCCOR))
The National Collaborative on Childhood Obesity Research (NCCOR) Catalogue of Surveillance Systems has 100 publicly available datasets with information on health behaviors, outcomes, determinants, policies and environmental factors. This free online resource was created for researchers and practitioners to investigate childhood obesity in America.
National Crosswalk Service Center (NCSC) (Iowa Center for Career and Occupational Resources (ICCOR))
The National Crosswalk Service Center (NCSC) specializes in occupational and training program classifications, their relationships to each other, and to related data. A "crosswalk" allows users to interpret one classification system in terms of another. NCSC makes crosswalks available for FTP download, and also serves as a depository of other computerized occupational and educational information resources.
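To make the idea of a crosswalk concrete, the sketch below treats it as a many-to-many lookup table from one classification's codes to another's. All codes shown are invented placeholders, not real occupational or training program codes, and the function names are hypothetical.

```python
from collections import defaultdict


def build_crosswalk(pairs):
    """Build a lookup from source codes to sets of target codes.

    A set is used because one source code may map to several
    target codes, and the same pair may appear more than once.
    """
    crosswalk = defaultdict(set)
    for source, target in pairs:
        crosswalk[source].add(target)
    return crosswalk


def translate(codes, crosswalk):
    """Return (sorted target codes, source codes with no match)."""
    targets = set()
    unmatched = []
    for code in codes:
        if code in crosswalk:
            targets |= crosswalk[code]
        else:
            unmatched.append(code)
    return sorted(targets), unmatched


# Invented placeholder codes, for illustration only.
pairs = [("A-01", "X100"), ("A-01", "X200"), ("A-02", "X200")]
crosswalk = build_crosswalk(pairs)
print(translate(["A-01", "A-02", "A-99"], crosswalk))
```

Reporting unmatched codes separately, rather than dropping them silently, makes gaps between the two classification systems visible.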
NCES Handbook of Survey Methods (National Center for Education Statistics (NCES), U.S. Department of Education)
The NCES Handbook of Survey Methods explains how the National Center for Education Statistics (NCES) obtains and prepares the data it publishes for each of its survey programs.
Networked Social Science Tools And Resources (NESSTAR) (Networked Social Science Tools And Resources)
NESSTAR is a system for data discovery, location, browsing and dissemination via the Internet. A web-based browser interface called NESSTAR Light lets users do simple analyses and download data that has been mounted on a NESSTAR server. Data Archives currently offering data through NESSTAR include the UK Data Archive, Norwegian Social Science Data Services (NSD), Danish Data Archive (DDA) and the Finnish Social Science Data Archive (FSD).
NLS Investigator (Center for Human Resource Research (CHRR), Ohio State University)
The NLS Investigator is a web-based interface to documentation and data from all the cohorts of the National Longitudinal Surveys (NLS). Like its predecessor Web-Investigator, NLS Investigator allows users to search the database by variable name, question text, survey year and question number. Users can view the codebook information associated with variables, select and extract variables, and create a codebook unique to the variables chosen. Investigator provides value labels in the statistical results files. A weighting program option lets users create a custom set of survey weights, making it easier to accurately calculate summary statistics from multiple years of data. Registered users can perform variable extractions without downloading any software or full data files, and can update and save their tag sets on the server for up to 90 days. Result files can be saved to a local computer or left in a personal NLS Investigator account for up to 4 days. Note: the old Web-Investigator version will be disabled after October 29, 2010. Users who are using the new NLS Investigator for the first time will have to complete a one-time free re-registration.
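As a generic illustration of why survey weights matter for summary statistics, the sketch below computes a weighted mean from an extracted variable and a weight column. It is not the NLS weighting program itself; the function name and the sample values are hypothetical, and missing responses are assumed to be coded as `None`.

```python
def weighted_mean(values, weights):
    """Weighted mean of survey responses, skipping missing values.

    Each respondent's value counts in proportion to the survey
    weight, i.e. the number of population members represented.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight <= 0:
        raise ValueError("no valid cases with positive total weight")
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical responses and weights; the third case is missing.
print(weighted_mean([10, 20, None, 40], [1, 1, 5, 2]))
```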
Open Calls for Comment on Federal Data Collections (Association of Public Data Users (APDU))
The Association of Public Data Users provides an up-to-date spreadsheet, linked from this page, of current statistical issues for public comment as announced in the notices of the Federal Register. Calls for comment are organized first by agency, then by closing date for comments. Each call for comment is linked to the Federal Register page, along with a contact person. A second page within the spreadsheet lists already-expired calls for comment.
Penn World Tables (PWT) (University of Groningen)
The Penn World Table (PWT) is a set of national-accounts data developed to measure real GDP across countries and over time. PWT allows for comparisons of relative GDP per capita, as a measure of standard of living, the productive capacity of economies and their productivity level.
Population Research Institute (PRI) (Pennsylvania State University)
The Population Research Institute (PRI) at Penn State focuses on research and training in the population sciences. Their initiatives include a library with a data archive (available from the links on the right in the box labelled "For PRI Affiliates"). The data archive makes available an online data extraction engine called SodaPop, at http://sodapop.pop.psu.edu/, that allows users to create data extracts and view documentation for the datasets included in the system. A variable-search function is available within datasets, but not across the entire collection. PRI affiliates may create extracts or download datasets for any of the SodaPop holdings, while non-affiliated users may fill out an online application form to request access for files that are not restricted. Any user, however, may search the data and view the documentation. The PRI website highlights a number of studies carried out under Penn State auspices, such as the TREMIN Research Program on Women's Health; the Puerto Rican Maternal and Infant Health Study; and the Marital Instability over the Life Course Study.
Sample Size Calculator (Creative Research Systems)
This site contains two calculators: one for determining the necessary sample size for a given population and desired confidence interval, and one for calculating the confidence interval for a given population and sample size.
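The two calculations can be sketched with standard textbook formulas for a proportion, using a finite population correction. This is an illustrative approximation and not necessarily the exact method the site implements. Here "confidence interval" is read as the margin of error (plus or minus); the defaults of z = 1.96 (95% confidence) and p = 0.5 (the most conservative proportion) are assumptions.

```python
import math


def sample_size(population, margin, z=1.96, p=0.5):
    """Sample size needed for a given margin of error.

    n0 is the size for an effectively infinite population; the
    second step shrinks it for a finite population.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


def margin_of_error(population, sample, z=1.96, p=0.5):
    """Margin of error for a given population and sample size."""
    # Finite population correction: sampling a large share of a
    # small population narrows the interval.
    fpc = math.sqrt((population - sample) / (population - 1))
    return z * math.sqrt(p * (1 - p) / sample) * fpc


n = sample_size(10_000, 0.05)
print(n)                                      # 370
print(round(margin_of_error(10_000, n), 3))   # 0.05
```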
Scholars' Lab (University of Virginia)
Scholars’ Lab at the University of Virginia Library is set up to assist advanced students and researchers on their digital projects. Their faculty and staff focus on the digital humanities, geospatial information, and scholarly making and building at the intersection of our digital and physical worlds.
SDMX - Statistical Data and Metadata Exchange (SDMX)
Statistical Data and Metadata eXchange (SDMX) is an international initiative that aims at standardising and modernising the mechanisms and processes for the exchange of statistical data and metadata among international organisations and their member countries. SDMX is sponsored by seven international organisations including the Bank for International Settlements (BIS), the European Central Bank (ECB), Eurostat (Statistical Office of the European Union), the International Monetary Fund (IMF), the Organisation for Economic Cooperation and Development (OECD), the United Nations Statistical Division (UNSD), and the World Bank.
Social Research Update (University of Surrey, United Kingdom)
This general reference periodical for beginning social science researchers is issued quarterly by the Department of Sociology, University of Surrey, Guildford, England. Previous issues have included such topics as Ethnographic writing, Archiving qualitative research data, and Secondary analysis of qualitative data.
Social Science & Government Data Library (University of California, Berkeley)
The Social Science and Government Data Library (SSGDL) is a collaboration between the UC-Berkeley Library and UC DATA on the University of California, Berkeley campus. The SSGDL web site carries both an extraction system and FTP links for U.S. Census Data. The extraction system contains 1990 census data from SSTF1, SSTF2 (Ancestry of the Population of the US), SSTF3 (Persons of Hispanic Origin in the United States), and SSTF5 (Characteristics of Asian and Pacific Islander Population of the US). Users can pick both geographies and variables.
The FTP files available from the site include:
- Census 2000: Summary File 1 (SF1), Redistricting Data (P.L. 94-171), Race and Hispanic or Latino Summary
- 1990 Census: Congressional Districts in the U.S., Equal Employment Opportunity File, Public Law 94-171 data, Public Use Microdata Samples (PUMS) - 1% and 5% data, Summary Tape File 1B (includes PR files), Summary Tape File 3 (includes 3A, 3B, and 3C), Subject Summary Tape Files
- 1970 Census Fifth Count Special Tabulation
- County & City Databook 1988 and 1994
- Current Population Survey files between 1988 and 1993
- Economic Census Data (1987, 1992, 1997)
- TIGER/Line 1997 files
Downloaded FTP files use the "Go" extraction system.
Social Science Japan Data Archive (SSJDA) (Center for Social Research and Data Archives, Institute of Social Science, University of Tokyo)
The Social Science Japan Data Archive (SSJDA) is affiliated with the Center for Social Research and Data Archives in the Institute of Social Science at the University of Tokyo. SSJDA collects, maintains, and provides access to social science data to researchers who are interested in Japanese quantitative data for secondary analyses. Users are required to fill out online applications and get approval before they can access datasets housed in SSJDA. Most of the datasets are in Japanese.
Society for Political Methodology (American Political Science Association)
The Political Methodology section of the American Political Science Association offers a hyperlink to the Political Methodology Working Papers, 1995 to the present. Authors submit papers to be downloaded, abstracts are readable on the web, and the full text of the paper may be read or downloaded in PDF.
STATS (The Statistical Assessment Service)
The Statistical Assessment Service "examines the way that scientific, quantitative, and social research are presented by the media, and works with journalists to help them convey this material more accurately and effectively." STATS sets the record straight in an engaging, direct manner.
SuperSTAR (Space-Time Research (Melbourne, Australia) and Alta Plana Corporation)
The SuperSTAR statistical tabulation suite is used for analysis and dissemination of demographic, social, survey, trade, and marketing data. The suite includes the SuperCROSS Windows module and the SuperWEB browser interface and provides integrated statistical calculations, confidentiality algorithms, charting, mapping, and data extraction. It runs against micro-data and summarized data cubes with a multi-lingual interface and support for multi-lingual metainformation. Note: this is a fee-based product.
Survey Documentation and Analysis (SDA) Archive (University of California-Berkeley)
SDA is a set of programs for the documentation and analysis of survey data. The programs can produce codebooks either for printing or for browsing on the World Wide Web. Data analysis programs in the package can be run in various ways, including online from a Web browser. Data available here include: GSS (General Social Survey) 1972-2004 Cumulative Datafile; NES (US National Election Study) cumulative back to 1948 plus individual years since 1996; some census microdata from the US and California; several Labor and Health surveys; and several surveys on racist attitudes and prejudice. The site also links to other data archives that use SDA. Also included at this site is information on the Data Documentation Initiative (DDI) and Instrument Documentation (IDOC), a project to develop network-browsable documentation for CAI instruments, including the SIPP.
Teaching with Data (Inter-University Consortium for Political and Social Research (ICPSR) and Social Science Data Analysis Network (SSDAN))
The Teaching With Data web site offers annotated links to data-driven teaching materials primarily aimed at the undergraduate level, though the site-wide search tool includes a K-12 option. Classroom resources include lessons and lectures, exercises and modules, syllabi and reading lists. Data resources include both tabular and downloadable data, data-based maps, and links to various data archives. Tools for analysis, visualization and course development are highlighted as well. Users can browse the site by discipline: anthropology, economics, environmental sciences, geography, history, political science, public policy, social work, and sociology. A "Data in the News" feature links the site to current events. Teaching With Data is a partnership between ICPSR and the Social Science Data Analysis Network (SSDAN), both at the University of Michigan. The project is funded by the National Science Foundation.
The Center for Spatially Integrated Social Science (National Science Foundation)
The CSISS site focuses on the importance of space, location, and place in social science research. The site features learning tools and bibliographies regarding GIS and social sciences, as well as a search engine and annotated links to spatial tools elsewhere on the web. In development is a data search engine intended for searching across social science data archives.
The Center for International Earth Science Information Network (CIESIN) (Columbia University)
The Center for International Earth Science Information Network (CIESIN) is a center within the Earth Institute at Columbia University. CIESIN works at the intersection of the social, natural, and information sciences, and specializes in on-line data and information management, spatial data integration and training, and interdisciplinary research related to human interactions in the environment. The web site features two metadata catalogs and downloadable data such as the China Dimensions data collection and the U.S. PUMA boundary files for 1990.
The Higher Education Research Institute (University of California-Los Angeles)
The Higher Education Research Institute is an "interdisciplinary center for research, evaluation, information, policy studies, and research training in postsecondary education." Based in the University of California-Los Angeles Graduate School of Education and Information Studies, HERI sponsors the Cooperative Institutional Research Program (CIRP) survey of college freshmen as well as the HERI Faculty Survey and the College Senior Survey (CSS). Sample formats of the CSS and CIRP surveys are available on the site in PDF. The site also includes links to recent research at the Institute as well as Institute publications.
TranStats (U.S. Bureau of Transportation Statistics)
TranStats comprises a broad collection of over 100 transportation datasets from various federal sources such as the Department of Transportation and the Census Bureau. TranStats is searchable by keyword or category. Some of the data descriptions link to data stored on other sites; for the many datasets stored at TranStats, however, users have interactive control over which variables to download, along with interactive analysis tools for producing simple statistical summaries, creating time series or cross tabulations, generating graphics online, and cutting and pasting results into reports. A "mapping center" is also available through TranStats, carrying the National Transportation Atlas Databases (NTAD) and other transportation mapping tools. Note: TranStats was formerly known as the Intermodal Transportation Database.
Trends in Health and Aging (U.S. National Center for Health Statistics)
This site presents a collection of tables on trends in the health of older Americans showing data by age, sex, race and Hispanic origin. Using the Beyond 20/20 system, users can browse and manipulate tables online, or download the tables and software for additional features, including mapping and statistical tests. Tables are categorized into 19 topics: Chronic Conditions, Functional Status and Disability, Health Care Expenditures, Health Care Utilization, Health Insurance, Immunization, Incontinence, Injury, Life Expectancy, Living Arrangements, Mental Health, Mortality, Oral Health, Perceived Health Status, Population (Nation and State), Risk Factors, Socio-Economic Status, Special Equipment Use, and Use and Cost of Prescription Medication.
U.S. Demography (CIESIN)
Included here are informative explanations of the following datasets: Public Use Microdata Samples, Current Population Survey, Economic Census Data, County Business Patterns, County City Data Book, Statistical Abstract Supplement, National Economic Social and Environmental Databank, Regional Economic Information System, Enhanced County to County Migration 1985-1990, TIGER 1992 Boundaries, and STF3A Standard Extracts.
UN Classifications Registry (United Nations Statistics Division)
The Classifications Registry keeps updated information on statistical classifications maintained by the United Nations Statistics Division (UNSD). Downloadable classifications include International Standard Industrial Classification of All Economic Activities (ISIC), Central Product Classification (CPC), Standard International Trade Classification (SITC), Classification by Broad Economic Categories (BEC), Classification of the Functions of Government (COFOG), Classification of Individual Consumption According to Purpose (COICOP), Classification of the Purposes of Non-Profit Institutions Serving Households (COPNI), Classification of the Outlays of Producers According to Purpose (COPP), and International Classification of Activities for Time-Use Statistics (ICATUS). Rulings, corrections, interpretations and proposals for future revisions are recorded and can be viewed from the Registry entries link at this site. The National Classifications section includes information on national practices in the area of classifications, covering activity and product classifications used in countries around the world.
Understanding the 1990 Public Use Microdata Sample (PUMS) (UCLA)
This document presents a general overview of PUMS, while specifically discussing the distribution of census questionnaires, privacy protection, selection of PUMS 5% data, and structure of the 1990 PUMS 5% data which includes geographic, household, and person information.
Variable and Question Bank (UK Data Archive (UKDA) at the University of Essex)
The Variable and Question Bank is a reference source for question formats and wordings used on major social surveys in the UK. It provides supporting material on concepts and methodology, and aims to disseminate knowledge about survey data collection methods to achieve comparability of results.
WebCASPAR (National Science Foundation)
WebCASPAR bills itself as an "integrated science and engineering resources data system." The database system is a collection of statistical data from several surveys in higher education from NSF and NCES via a web-based extraction form, allowing users to create tables (or view pre-defined tables). Includes institution-level data. Free registration is required to be able to customize the search fully.
Western Libraries Map and Data Centre (University of Western Ontario, Canada)
The former Data Resources Library at the University of Western Ontario, Canada has joined forces with the Serge A. Sauer Map Library to form the Western Libraries Map and Data Centre. The holdings and services of the two former entities will be combined, in the mission "to deliver map, GIS and data services to the Western community of students, staff, and faculty."
World Health Organization Statistical Information System (WHOSIS) (World Health Organization)
Provides searching and browsing options for finding international health-related statistics on the WHO web site and beyond. Online databases accessible from the WHOSIS page include Core Health Indicators; Life Tables; Mortality; Tuberculosis; HIV/AIDS; Alcohol; and Global Health Atlas. The WHO data offerings are more extensive than is immediately apparent, so users may want to use the site-search function on the WHOSIS page. The site also offers what it calls a WHOSIS query service, consisting of a Frequently Asked Questions document and a contact form for sending a question to WHO staff.
Worldmapper (University of Sheffield (UK) and University of Michigan)
The Worldmapper site takes its catchphrase, "The World as You've Never Seen It Before," and puts it into data-driven action, featuring cartograms that display global regions "re-sized according to the subject of interest." A total-population world map, for example, displays India and Japan swollen to outsized proportions, while the United States looms large on the map of private spending on health-care and Southeastern Africa dominates the map of HIV prevalence. Some of the broad topics include health, education, transportation, communication, work, and housing, but the list continues to expand. Each map comes with a downloadable PDF poster and downloadable data files in Excel and OpenDoc format. The Worldmapper project is a collaboration between the University of Sheffield (UK), which hosts the site, and the University of Michigan.
ZIP Code Resources Page (MABLE/Geocorr)
This page describes a series of tools for helping users deal with 5-digit U.S. postal ZIP code areas. It focuses primarily on tools for linking ZIP codes to other geographies (such as counties, cities, metro areas, ZCTAs) and to demographic information from the 2000 decennial census. Excellent explanation of the "messiness" of using the ZIP code as a geographic unit.