Our Approach to Data

Our approach to data is grounded in applying an equity-based lens as we tell the story of the Black wealth experience. How we collect, analyze, and share data and insights can have serious implications for how the Black wealth experience is understood. Every part of the BWDC platform - from the standards we use to assess the quality of a dataset, to how we present topical insights and exploratory dashboards - is designed to most truthfully and responsibly represent the people behind data points.

Here are a few things we keep in mind when building the BWDC platform with an equity-based lens:

  • Telling a comprehensive story. There isn’t any one indicator that can fully explain Black wealth. We utilize datasets across a variety of topics (homeownership, debt, education, COVID-19, and many more), so that users have access to a comprehensive story of the Black wealth experience.

  • Designing to empower users. Even the most accurate dataset can be misrepresented. We’re intentional about the content and design of each visualization so that users are best equipped to interpret Black wealth data. Many BWDC visualizations are inspired by the work of W.E.B. Du Bois.

  • Accountability through transparency. We try to make our choices as transparent as possible to users: this includes everything from the datasets we use, to how-to guides for interpreting visualizations, to this methodology page. Transparency is how we stay accountable to users visiting the platform and to the people represented in BWDC data.

Ultimately, our approach is an ongoing conversation among all users in the BWDC platform community, because together we can innovate a more equitable approach to telling the Black wealth story.

Data Methodology

Our data methodology is a hybrid of best practices from the fields of open data, civic tech, and data science, in addition to emerging frameworks around responsible and transparent use of big data. As we open the BWDC data repository to the public in the future, we aim to share the outputs from our data quality framework for each data source described in this section.

Assessing the Value of Datasets

Before adding data to the BWDC platform, we assess the value of new datasets by using the Datasheets for Datasets framework (Gebru et al. 2021)¹. This framework was developed with extensive feedback from researchers, practitioners, and policy makers to increase transparency and accountability around how datasets are created and maintained, and how they should be appropriately used.


| Dataset Characteristic | Description |
|---|---|
| Motivation | Document how dataset creators articulate their reasons for creating the dataset and funding interests |
| Composition | Provide dataset consumers with the information they need to make informed decisions about using the dataset for their chosen tasks |
| Collection Process | Flag potential issues with data collection |
| Preprocessing / Cleaning / Labeling | Provide dataset consumers with the information they need to determine whether the “raw” data has been processed in ways that are compatible with their chosen tasks |
| Uses | Reflect on the tasks for which the dataset should and should not be used |
| Distribution | Document any distribution, licensing, privacy, or regulatory restrictions |
| Maintenance | Capture how the dataset will be maintained by dataset consumers |


¹ Datasheets for Datasets - Microsoft Research

Monitoring Data Quality

We monitor the data that powers the BWDC platform through a robust set of QA/QC checks. When applicable, we utilize high-quality, publicly available datasets that are already widely used by researchers and practitioners, such as the Google Cloud Public Dataset Program.

Many data quality frameworks are oriented towards dataset creators - the people that are collecting data by surveying populations in the field, tracking user behavior, etc. Given that BWDC is not generating new datasets but rather aggregating and sharing datasets, we draw on a peer-reviewed framework that captures common data quality indicators from the perspective of dataset users rather than creators.

As we encounter new challenges with data quality and allow users to directly access the BWDC data repository in the future, we may add more indicators here to ensure we are always delivering a high-quality product to BWDC users.²

Adaptation of the Big Data Quality Framework³

| Dimension | Elements | Description |
|---|---|---|
| Availability | Timeliness | Timeliness is defined as the time delay from data generation and acquisition to utilization⁵. The most recent version of a dataset should be made available for analysis within a reasonable timeframe. |
| | Accessibility | Accessibility refers to the difficulty level for users to obtain and analyze data, and is closely linked with data openness. Open data standards⁴ are designed to ensure all users can access and maximize the use of open data. |
| Usability | Metadata | With the increase of data sources and data types, it’s easy for data consumers to misconstrue the meaning of common terminology and concepts of data. Data producers need to provide metadata describing different aspects of the datasets to reduce the problems caused by misunderstandings or inconsistencies. |
| | Definition & Documentation | Definition & documentation consists of normative documentation for dataset names, definitions, ranges of valid values, standard formats, business rules, etc. |
| Reliability | Accuracy | To ascertain the accuracy of a given data value, it is compared to a known reference value. In some cases, there is no known reference value, making it difficult to measure accuracy. Because accuracy is correlated with context to some extent, data accuracy is considered in context. |
| | Integrity | In information security, data integrity means maintaining and assuring the accuracy and consistency of data over its entire life-cycle. This means that data cannot be modified in an unauthorized or undetected manner. |
| | Consistency | Data consistency refers to whether the logical relationship between correlated data is correct and complete. In the field of databases⁶, it usually means that the same data that are located in different storage areas should be considered to be equivalent. Equivalency means that the data have equal value and the same meaning or are essentially the same. |
| | Completeness | Completeness means that the values of all components of a single datum are valid. For example, many population datasets do not adequately represent ethnic and racial minority groups or local and rural geographies due to sampling and nonresponse bias. If certain subgroups are missing, the data can produce inaccurate and inconsistent estimates, and ultimately false conclusions. |


² The Challenges of Data Quality and Data Quality Assessment in the Big Data Era (codata.org)

³ Not all dimensions and elements that were published are included here. Additionally, we have tailored the element descriptions to the BWDC use case, which may evolve over time.

⁴ Best Practices — U.S. Open Data Toolkit (usopendatatoolkit.org)

⁵ (McGilvray, 2010)

⁶ (Silberschatz, Korth, & Sudarshan, 2006)
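
To make a few of the indicators above concrete, here is a minimal sketch of how a dataset user might check timeliness and completeness. It is an illustration only, not the BWDC QA/QC pipeline: the function names, the field names ("group", "value"), and the sample rows are all hypothetical.

```python
from datetime import date


def days_since_release(released: date, as_of: date) -> int:
    """Timeliness: number of days between a dataset's release and its use."""
    return (as_of - released).days


def missing_value_rate(records, field):
    """Completeness: share of rows where `field` is absent or empty."""
    if not records:
        return 0.0
    missing = sum(1 for row in records if row.get(field) in (None, ""))
    return missing / len(records)


def missing_subgroups(records, field, expected):
    """Completeness: expected subgroups that never appear in the data."""
    present = {row.get(field) for row in records}
    return sorted(set(expected) - present)


# Invented rows for illustration only; not drawn from any BWDC source.
sample = [
    {"group": "A", "value": 10},
    {"group": "B", "value": None},
    {"group": "C", "value": 30},
    {"group": "A", "value": 12},
]

print(days_since_release(date(2024, 1, 1), date(2024, 3, 1)))
print(missing_value_rate(sample, "value"))
print(missing_subgroups(sample, "group", ["A", "B", "C", "D"]))
```

What counts as a "reasonable timeframe" or an acceptable missing-value rate depends on the dataset and the analysis, so any thresholds would need to be chosen per source.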

Data Sources

Datasets

Publication Year(s) used (up to most current available unless otherwise noted)

Description (as provided by the dataset publisher)

Explore Data page where this data is accessible

Survey of Consumer Finances (SCF)

2007, 2010, 2013, 2016, 2019, 2022

The SCF is a triennial cross-sectional survey of U.S. families. The survey data include information on families’ balance sheets, pensions, income, and demographic characteristics. Information is also included from related surveys of pension providers and the earlier such surveys conducted by the Federal Reserve Board.

Assets & Debt

Survey of Income and Program Participation (SIPP)

2021

SIPP is a nationally representative longitudinal survey that provides comprehensive information on the dynamics of income, employment, household composition, and government program participation.

Assets & Debt

Debt to Income Ratio, US Federal Reserve and Equifax Credit Panel

2007-2022

Household debt is calculated from FRBNY/Equifax credit panel and household income is reported from the Bureau of Labor Statistics.

Assets & Debt

Survey of Unbanked and Underbanked Households, FDIC

2017-2021

The Survey of Unbanked and Underbanked Households collects information on bank account ownership; use of prepaid cards and nonbank online payment services; use of nonbank money orders, check cashing, and money transfer services; and use of bank and nonbank credit.

Assets & Debt

American Community Survey (ACS)

2016 5-year estimates, 2020 5-year estimates, 2021 5-year estimates, 2022 5-year estimates

The ACS is the flagship product of the US Census and the premier resource for detailed population and housing data within the United States nationally and down to a hyper-local level. Unemployment rate figures are used in the Business Investment Need Index.

Assets & Debt, Education, Employment, Business Ownership, Homeownership, Population, Local Wealth Explorer

Annual Business Survey (ABS)

2021

The ABS Program combines data results from survey respondents and administrative records to produce data on business ownership. The survey is collected from employer businesses and the nonemployer data are compiled from administrative records.

Business Ownership

Small Business Administration (SBA) 7(a)

2011-2021

The SBA releases FOIA data for 7(a) loans, the SBA’s most popular program for financial assistance to businesses.

Business Ownership

Small Business Administration (SBA) 504 Loan Records

2011-2023

The SBA releases FOIA data for 504 loans, which focus on promoting business growth and job creation.

Business Ownership

Small Business Administration (SBA) Paycheck Protection Program (PPP) Loan Records

2020-2021

The SBA releases administrative data on the federal paycheck protection program, including: Loan characteristics (e.g. amount); Business characteristics (e.g. business type); Borrower characteristics (e.g. address).

Business Ownership

National Center for Education Statistics (NCES)

Select years between 2010 and 2023

The NCES provides a compilation of government, private data sources, and surveys on education statistics, including: School and college characteristics (e.g. number of enrollments); Educational attainment (e.g. graduation rate); Finances (e.g. federal funds).

Education, Employment

High Speed Internet Usage, Microsoft Airband Initiative

2020

Estimated broadband usage, derived by combining data from several Microsoft products.

Education

Bureau of Labor Statistics Current Population Survey (CPS)

2013 - 2021

The CPS is a monthly survey of households conducted by the Bureau of Labor Statistics that is the primary source for U.S. labor force statistics, including: Employment (e.g. work status, industry); Demographic characteristics (e.g. age); Supplemental topics (e.g. child support, health insurance coverage, COVID-19).

Employment

Unemployment Insurance Weekly Claims Data

2020 - partial 2024

The Unemployment Insurance Weekly Claims dataset from the U.S. Department of Labor provides weekly unemployment insurance claims, including the number of initial claims and continued weeks of claimed unemployment benefits.

Employment

U.S. Inflation and Unemployment Data

2022

The Geographic Profile of Employment and Unemployment presents annual data on the employed and unemployed for states and census regions and divisions. The estimates include demographic characteristics, occupation, industry, class of worker, hours of work, duration of unemployment, and reason for unemployment. The source of the data is the Current Population Survey.

Employment

National Level Poverty Rates

2020-2024

Center on Poverty and Social Policy, Columbia University

Assets & Debt

Candid data on non-profit demographics

2011-2023

The SBA releases FOIA data for 504 loans, which focus on promoting business growth and job creation.

Employment

Local Area Unemployment Statistics

2024

The Local Area Unemployment Statistics (LAUS) program from the Bureau of Labor Statistics produces monthly and annual employment, unemployment, and labor force data for Census regions and divisions, States, counties, metropolitan areas, and many cities, by place of residence.

Employment

Longitudinal Employer-Household Dynamics, Origin-Destination Employment Statistics (LODES)

2021

The Longitudinal Employer-Household Dynamics (LEHD) program is part of the Center for Economic Studies at the U.S. Census Bureau. The LEHD program produces cost-effective, public-use information combining federal, state and Census Bureau data on employers and employees under the Local Employment Dynamics (LED) Partnership. State and local authorities increasingly need detailed local information about their economies to make informed decisions.

Employment

Business Formation Statistics (BFS)

2011-2023

The BFS are a standard data product of the U.S. Census Bureau developed in research collaboration with economists affiliated with the Board of Governors of the Federal Reserve System, Federal Reserve Bank of Atlanta, University of Maryland, and University of Notre Dame. The BFS provides timely and high frequency information on new business applications and formations in the United States.

Business Ownership

Quarterly Census of Employment and Wages (QCEW)

2023

The Quarterly Census of Employment and Wages (QCEW) program publishes a quarterly count of employment and wages reported by employers covering more than 95 percent of U.S. jobs, available at the county, MSA, state and national levels by industry.

Employment

Survey of Household Economics and Decisionmaking (SHED)

2019, 2023

Since 2013, the Federal Reserve Board has conducted the Survey of Household Economics and Decisionmaking (SHED), which measures the economic well-being of U.S. households and identifies potential risks to their finances.

Assets & Debt

National Level Poverty Rates

2020, 2024

Since 2013, the Federal Reserve Board has conducted the Survey of Household Economics and Decisionmaking (SHED), which measures the economic well-being of U.S. households and identifies potential risks to their finances.

Assets & Debt

National Risk Index for Natural Hazards (NRI)

2023

The National Risk Index published by FEMA shows which communities are most at risk to natural hazards. It includes data about the expected annual losses to individual natural hazards, social vulnerability and community resilience.

Black Wealth Indicators

FDIC Active Institutions

2023

This dataset, published by the FDIC, lists FDIC-insured banks and branches.

Black Wealth Indicators

Home Mortgage Disclosure Act (HMDA)

2017-2022

The Consumer Financial Protection Bureau releases data based on the Home Mortgage Disclosure Act (HMDA), which requires many financial institutions to maintain, report, and publicly disclose loan-level information about mortgages.

Homeownership

Housing Vacancies and Homeownership (HVS)

2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023

The HVS dataset is published by the US Census on a quarterly basis and provides information on rental and homeowner vacancy rates.

Homeownership

CDC/ATSDR Social Vulnerability Index

2022

The Social Vulnerability Index (SVI) employs U.S. Census Bureau variables to help users identify communities that may need support in preparing for hazards or recovering from disasters.

Homeownership

The New York Times US Coronavirus Database

2020 - partial 2023

This is the US Coronavirus data repository from The New York Times. This data includes COVID-19 cases and deaths reported by state and county. The New York Times compiled this data based on reports from state and local health agencies.

Population

Yelp Business Data

2024

Yelp data are ingested via the platform's API. Businesses self identify as being Black owned by adding the Black-owned Business attribute to their Yelp business page.

Local Wealth Explorer

PurpleAir Air Quality Data

2024

PurpleAir is a vendor of low-cost air quality sensors designed to sample air for pollution. The Black Wealth Data Center uses PurpleAir data by taking ALT cf=3 sensor data for PM2.5 particle pollution and applying the air quality index calculation formula provided by the Environmental Protection Agency.

Local Wealth Explorer

Environmental Protection Agency AirNow

2024

EPA AirNow data show air quality readings recorded at various intervals of time. Air quality index is a calculation based on the concentration of PM2.5 particles in the sampled air.

Local Wealth Explorer

U.S. Parks Data

2024

The Trust for Public Land makes location, geography and access information available for roughly 145,000 U.S. parks.

Local Wealth Explorer

School locations

2022

The National Center for Education Statistics collects information including location details on public elementary and secondary schools, including charter schools, as part of its Common Core of Data dataset.

Local Wealth Explorer

Minority Depository Institutions

2023

The FDIC maintains a list and tracks the insured MDIs it supervises, i.e., state-chartered institutions that are not members of the Federal Reserve System (Federal Reserve), as well as MDIs that are supervised by the Office of the Comptroller of the Currency (OCC) and the Federal Reserve. The FDIC takes this broad approach given its role in considering applications for deposit insurance and in resolving institutions in the event an MDI were to fail, regardless of the institution’s charter. The FDIC’s published list of FDIC-insured minority depository institutions does not include women-owned or women-managed institutions because they are not included in the statutory definition.

Local Wealth Explorer

Low Income Energy Affordability Data

2022

The Department of Energy publishes energy burden data for low-income households (defined as 0-80% of Area Median Income).

Local Wealth Explorer

Hospital Locations

2022

The Geospatial Management Office, which is part of the Department of Homeland Security, collects information including location details on hospitals as part of its Homeland Infrastructure Foundation-Level Data dataset.

Local Wealth Explorer

Opportunity Zones

2018

The Treasury Department designated nearly 9,000 Census tracts as communities in economic distress. Entities that made economic investments in these Opportunity Zones were entitled to certain tax benefits in the period shortly after the designation. Opportunity Zone designations are integrated into the Business Investment Need Index.

Local Wealth Explorer

Redlining Scores

2020

The Inter-university Consortium for Political and Social Research, located at the University of Michigan, maintains historical 'redlining' scores for U.S. geographies based on maps from the now-defunct Home Owners’ Loan Corporation, which quantified mortgage investment risk across the U.S. Redlining refers to the systemic denial of credit to prospective home buyers based on racial and ethnic factors. 2020 U.S. Census tracts are scored by determining the area contained within a historical HOLC boundary and assigning weighted values based on the grade (1 for “A” grade, 2 for “B” grade, 3 for “C” grade, and 4 for “D” grade) and amount of space contained within the HOLC boundary. Redlining scores are used in the Business Investment Need Index.

Local Wealth Explorer

Walkability Index

2021

The Environmental Protection Agency (EPA) created a metric to score U.S. Census block groups based on how feasible travel is within the geography without a private vehicle. Block groups are ranked by a combination of the density of intersections, proximity to public transit stops and diversity of land uses. This last measure looks at the mix of types of zoned use, including residential, industrial and commercial, and the mix of types of employment, with a higher variety of uses leading to a higher score. Walkability index scores are integrated into the Business Investment Need Index.

Local Wealth Explorer

County Business Patterns

2020

The U.S. Census Bureau publishes a survey called County Business Patterns outlining the number of business entities, employment information and payroll figures by geography. Business counts from the County Business Patterns survey are integrated into the Business Investment Need Index.

Local Wealth Explorer
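
The Redlining Scores entry above describes scoring a Census tract from the HOLC grades and the amount of tract area inside each graded boundary. One way to read that description is as an area-weighted average of the grade values. The sketch below works through that reading under stated assumptions: the exact weighting used by the source is not spelled out here, and the function name and the sample areas are hypothetical.

```python
# Grade values as listed in the Redlining Scores description.
GRADE_VALUES = {"A": 1, "B": 2, "C": 3, "D": 4}


def redlining_score(overlaps):
    """Area-weighted average of HOLC grade values for one tract.

    `overlaps` is a list of (grade, area) pairs, one per graded HOLC
    polygon that intersects the tract. Returns None when no part of the
    tract falls inside a graded area. The area-weighted average is an
    assumption about how the weighting works, not a confirmed formula.
    """
    total_area = sum(area for _, area in overlaps)
    if total_area == 0:
        return None
    weighted_sum = sum(GRADE_VALUES[grade] * area for grade, area in overlaps)
    return weighted_sum / total_area


# Invented tract: a quarter of the graded area is "B", the rest is "D".
example_tract = [("B", 0.25), ("D", 0.75)]
print(redlining_score(example_tract))  # (2 * 0.25 + 4 * 0.75) / 1.0 = 3.5
```

Under this reading, a higher score means more of the tract's graded area fell in zones that HOLC rated as higher risk.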

Citing BWDC

To cite the Black Wealth Data Center, please refer to the CC-BY-NC-ND license and use the below format:

Publisher. Date accessed, Article/Page Title, Website Title, URL

Example:

The Black Wealth Data Center. (2022). Explore Data - Assets & Debt. “Average Assets for Individuals, by Race/Ethnicity (Statewide, 2018)”. Blackwealthdata.org. https://blackwealthdata.org/explore/assets

Glossary & Definitions

Data Dictionary

Our data dictionary includes any variables and datasets used on the BWDC platform. We pull variable definitions directly from the source dataset - for example, from the ACS data dictionary.



Data Terms

Below are concepts related to wealth accumulation that are found throughout the BWDC platform.


| TERM | DESCRIPTION |
|---|---|
| Average | The central value of a distribution, found by adding all values and dividing by the number of observations. |
| Median | The exact middle value in a distribution, often described as the "typical" value. This statistic is often used to describe income and wealth because averages can be skewed by extremely high or low values. |
| Parity | Equality or having the same value. For instance, there would be parity in Black-White wealth if median household wealth for Black and White households were equal. |
| Quartile | A portion of the distribution of data divided into fourths (1/4 or 25 percent). |
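
The terms above can be illustrated with a short calculation. The sketch below uses Python's standard `statistics` module on a handful of invented wealth values (not BWDC data) to show how one very large value pulls the average far above the median. The `parity_ratio` helper is a hypothetical name introduced here for illustration.

```python
from statistics import mean, median, quantiles

# Invented household wealth values in dollars, for illustration only.
wealth = [5_000, 12_000, 20_000, 35_000, 1_000_000]

average = mean(wealth)               # sum of all values / number of observations
middle = median(wealth)              # exact middle value once the data are sorted
q1, q2, q3 = quantiles(wealth, n=4)  # cut points that split the data into fourths


def parity_ratio(median_a, median_b):
    """Ratio of two group medians; a value of 1.0 indicates parity."""
    return median_a / median_b


print(average)  # 214400, pulled upward by the single very large value
print(middle)   # 20000, closer to the "typical" household in this list
print(q1, q2, q3)
print(parity_ratio(20_000, 20_000))  # 1.0
```

The second quartile cut point equals the median, which is why the median is also described as the 50th percentile.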

DIG DEEPER

Want to explore more local indicators of Black wealth accumulation? Check out our Explore Data pages.
