Legal datasets
- Data.gov/Law – Data and Tools available on Data.gov
- The Supreme Court Database
- NACJD National Archive of Criminal Justice Data
- PACER Public Access to Court Electronic Records
- ReCAP Research Collections and Preservation Consortium @ Columbia University
- Public.Resource.Org
- Justia
- Law Library of Congress
- Legal Information Institute @ Cornell
- Ian Ayres @ Yale
- Andrei Shleifer @ Harvard
- World Legal Information Institute
- Law Dataverse @ Georgetown
- Congreso Visible (Colombian Congress)
- Consejo de Estado (Colombian Council of State)
- Rama Judicial (Judiciary of Colombia)
Network datasets
- The Human Nature Lab @ Yale
- Visual Complexity Visualization of complex networks
- CSCS Center for the Study of Complex Systems @ University of Michigan
- The Siena webpage @ University of Oxford
- Statistical Analysis of Network Data @ Boston University
- The Sparse Matrix Collection @ University of Florida
- SNAP Stanford Large Network Dataset Collection
- APS American Physical Society Data Sets for Research
- Pajek Data Set
- Cx-nets @ Indiana University
- TAGora – Semiotic Dynamics in Online Social Communities – @ La Sapienza University in Rome
- SocioPatterns.org
- Roldan Pozo @ NIST
- Weizmann Institute
- Carolina Center for Interdisciplinary Applied Mathematics @ University of North Carolina
- Gephi – Gephi sample dataset
- IQSS Dataverse Network @ Harvard
- The Koblenz Network Collection @ University of Koblenz–Landau
- Alex Arenas @ Universitat Rovira i Virgili
- UCINET
- ACM Special Interest Group on KDD
- UCI Machine Learning Repository
- Truthy @ Indiana University
- Flowing Data (network-visualization)
- Matthew O. Jackson @ Stanford
- Linton C. Freeman @ UC Irvine
- Network Data Repository @ Purdue University
- UCI Network Data Repository @ UC Irvine
- QuantUrb
- Networking Group @ UC Irvine
- Index of Complex Networks @ University of Colorado Boulder
Corruption indicators
Reports
Input-based perception indicators
- MSI – International Research & Exchanges Board
- POLITY IV – George Mason University and Colorado State University
Input-based objective indicators
- CPIA – The World Bank: Country Policy and Institutional Assessment
Output-based perception indicators
- BTI – Bertelsmann Transformation Index
- BEEPS – The World Bank: Business Environment and Enterprise Performance Surveys (Enterprise Surveys Portal Library)
- CPI – Transparency International: Corruption Perceptions Index (measures corruption in the public sector)
- GCB – Transparency International: Global Corruption Barometer
- IEF – The Heritage Foundation. Uses CPI as source data.
- ICRG – Political Risk Services Group: International Country Risk Guide
- Afrobarometer, Eurobarometer, Latinobarómetro, Asiabarometer – Global Barometer Consortium
Output-based objective indicators
- IAG – Index of African Governance
- GCI – The World Economic Forum
- FW – Freedom House
- EIU – Economist Intelligence Unit
- DB – The World Bank: Doing Business project
- PEFA – World Bank: Public Expenditure & Financial Accountability
Input & output-based indicators
- OBI – International Budget Partnership
- GII – Global Integrity (focuses on anti-corruption mechanisms)
- IAG – The Mo Ibrahim Foundation
- WGI – The World Bank. Measures corruption in the public and private sector.
To avoid the labeling trap, dig underneath the indicators to understand what each is trying to measure. Input-based indicators, also known as rule-based indicators, try to capture information about the existence and strength of laws, regulations, and institutions (anti-corruption efforts, or inputs), providing key benchmarks that lead to good governance practices. Output-based indicators, also known as outcome-based indicators, do not try to measure what is causing the problem or point to potential solutions; they are useful for assessing progress towards a desired objective of anti-corruption or governance reform policies.
Governance indicators
- CDA – Catálogo de Datos Abiertos – Colombia
- GovData360 – The World Bank GovData360
- WB – The World Bank development indicators
- The Fund for Peace – Failed States Index
- AWS – Public Data Sets on Amazon Web Services
- USASpending
- ACRN – Anti-corruption Research Network
- Observatorio – Anticorrupción y de Integridad – Colombia
- Walk Free – fights to end human trafficking. For more information on child trafficking go to ZOE
- OpenSecrets – Center for Responsive Politics
- HRI – Humanitarian Response Index (2007-2011)
Social science datasets
- Data in Gapminder World
- Global Competitiveness Index
- Social Sciences Web Resources @ McMaster University
- Gallup Polls
- NBER – National Bureau of Economic Research
- Google – Public Data Explorer
- APRM – African Peer Review Mechanism
- MSI – IREX’s Media Sustainability Index
- OVC – Office for Victims of Crime
- JRI – The ABA Rule of Law Initiative: The Judicial Reform Index
- J-PAL – Poverty Action Lab: Evaluations
- ICGG – The Internet Center for Corruption Research
- World Bank – Worldwide Governance Indicators
- IQSS – Institute for Quantitative Social Science
- ICPSR – Inter-university Consortium for Political and Social Research
- UNdata – The United Nations Statistics Division
- UCI SSDA – Social Science Data Archive (UC Irvine)
- Wolfram|Alpha – Wolfram Research
- Gapminder
- WHO – World Health Organization
- SEDAC – Socioeconomic Data and Applications Center at Columbia University
- UNESCO – UN Educational, Scientific and Cultural Organization: Institute for Statistics
- IMF – International Monetary Fund: Data and Statistics
- CRINIS – Transparency International
- UNICRI – United Nations Interregional Crime and Justice Research Institute
- CESifo – The Center for Economic Studies (CES) and the Institute for Economic Research (ifo)
- IADB – Inter-American Development Bank
- DANE – Departamento Administrativo Nacional de Estadística (Colombia); Catálogo de microdatos
- DNP – Departamento Nacional de Planeación (Colombia)
- CINEP – Centro de Investigación y Educación Popular (Colombia)
- IDESC – Infraestructura de Datos Espaciales de Santiago de Cali
- MOE – Misión de Observación Electoral (Colombia)
- ITU – International Telecommunication Union
- U.S. Census Bureau
- USASpending.gov
- FPDS – Federal Procurement Data System
- Data.gov – The U.S. government’s open data initiative
- Data.gov.uk – The U.K. government’s open data initiative
- Data.go.jp – The Japanese open data initiative
- Data.gv.at – The Austrian government’s open data initiative
- Dollars for Docs – ProPublica
- Prescriber Checkup – ProPublica
- Open Data Index
- Open Aid Data – Data for development aid
- Sociómetro BID – Microdatos de las encuestas de hogares en Latinoamérica
- City-Data – Data from numerous sources, from crime rates to weather patterns
Time series datasets
- EPI – Earth Policy Institute
- Time Series Data
- Google Trends
- Think with Google
Geographic information datasets
Public data for various cities in the US and Canada
A quick overview of what data cities are sharing can be found here.
- Ottawa, ON
- Toronto, ON
- Austin, TX
- Houston, TX
- Oakland, CA (crime incidents) – civic data for Alameda County
- Santa Cruz, CA
- Chicago, IL
- Boston, MA (crime incidents)
- New Orleans, LA
- San Francisco, CA
- New York City, NY (aggregated data on crime)
- Baltimore, MD – civic data for the state of Maryland
- Portland, OR
- Seattle, WA
- San Jose, CA
- Palo Alto, CA
- Philadelphia, PA
- Washington, DC (crime incidents)
- Montreal, QC
- Asheville, NC (crime incidents)
Public data for various cities in Europe
Public data for various cities in Latin America and the Caribbean
Public data for various countries in Africa
Miscellaneous
- Kaggle – collaborative platform to find and publish data sets
- Awesome Public Datasets – List of open datasets in public domains
- DataPortals.org
- The Open Data Handbook – An Open Knowledge Foundation project (provides information on legal, social and technical aspects of open data)
- PublicLegal – Internet Legal Research Group (ILRG)
- Big Data Startups
- Flowing Data
- OpenData – by Socrata
- OpenDataKit
- OpenSpending.org – Allows you to upload any kind of financial data
- NoSQL databases – A list of “not only SQL” databases
- Neo4j – A popular graph database
- CKAN – A popular open-source data portal platform
- DKAN – A Drupal open-data platform
- graph-tool – A Python module for manipulation and statistical analysis of graphs
- Internet Archive
- Stack Exchange Data Explorer
- Academic Torrents