Our Approach to Data
Our approach to data is grounded in applying an equity-based lens as we tell the story of the Black wealth experience. How we collect, analyze, and share data and insights can have serious implications for how the Black wealth experience is understood. Every part of the BWDC platform - from the standards we use to assess the quality of a dataset, to how we present topical insights and exploratory dashboards - is designed to most truthfully and responsibly represent the people behind data points.
Here are a few things we keep in mind when building the BWDC platform with an equity-based lens:
- Telling a comprehensive story. There isn’t any one indicator that can fully explain Black wealth. We use datasets across a variety of topics (homeownership, debt, education, COVID-19, and many more), so that users have access to a comprehensive story of the Black wealth experience.
- Designing to empower users. Even the most accurate dataset can be misrepresented. We’re intentional about the content and design of each visualization so that users are best equipped to interpret Black wealth data. Many BWDC visualizations are inspired by the work of W.E.B. Du Bois.
- Accountability through transparency. We try to make our choices as transparent as possible to users: this includes everything from the datasets we use and the how-to guides for interpreting visualizations to this methodology page. Transparency is how we stay accountable to users visiting the platform and to the people represented in BWDC data.
Ultimately, our approach is an ongoing conversation among all users in the BWDC platform community, because together we can innovate a more equitable approach to telling the Black wealth story.
Data Methodology
Our data methodology is a hybrid of best practices from the fields of open data, civic tech, and data science, in addition to emerging frameworks around responsible and transparent use of big data. As we open the BWDC data repository to the public in the future, we aim to share the outputs from our data quality framework for each data source described in this section.
Assessing the Value of Datasets
Before adding data to the BWDC platform, we assess the value of new datasets by using the Datasheets for Datasets framework (Gebru et al., 2021)¹. This framework was developed with extensive feedback from researchers, practitioners, and policymakers to increase transparency and accountability around how datasets are created and maintained, and how they should be appropriately used.
| Dataset Characteristic | Description |
| --- | --- |
| Motivation | Document how dataset creators articulate their reasons for creating the dataset and funding interests |
| Composition | Provide dataset consumers with the information they need to make informed decisions about using the dataset for their chosen tasks |
| Collection Process | Flag potential issues with data collection |
| Preprocessing / Cleaning / Labeling | Provide dataset consumers with the information they need to determine whether the “raw” data has been processed in ways that are compatible with their chosen tasks |
| Uses | Reflect on the tasks for which the dataset should and should not be used |
| Distribution | Document any distribution, licensing, privacy, or regulatory restrictions |
| Maintenance | Capture how the dataset will be maintained, and communicate that plan to dataset consumers |
¹ Datasheets for Datasets - Microsoft Research
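One way to keep the seven characteristics above in a reviewable form is a small record with one free-text answer per characteristic. The sketch below is illustrative only: the `Datasheet` class, its field names, and the `unanswered` helper are hypothetical and are not part of the BWDC platform or of the Datasheets for Datasets framework itself.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Datasheet:
    """Hypothetical record: one free-text answer per dataset characteristic."""
    motivation: str = ""
    composition: str = ""
    collection_process: str = ""
    preprocessing: str = ""
    uses: str = ""
    distribution: str = ""
    maintenance: str = ""

    def unanswered(self) -> List[str]:
        """Return the characteristics that still have no answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Placeholder answers, for illustration only.
sheet = Datasheet(
    motivation="Placeholder: why the publisher created the dataset.",
    uses="Placeholder: tasks the dataset should and should not support.",
)
print(sheet.unanswered())
```

A reviewer could treat a non-empty `unanswered()` list as a signal that the assessment of a candidate dataset is not yet complete.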
Monitoring Data Quality
We monitor the data that powers the BWDC platform through a robust set of QA/QC checks. When applicable, we utilize high-quality, publicly available datasets that are already widely used by researchers and practitioners, such as those available through the Google Cloud Public Dataset Program.
Many data quality frameworks are oriented towards dataset creators - the people who collect data by surveying populations in the field, tracking user behavior, and so on. Given that BWDC is not generating new datasets but rather aggregating and sharing datasets, we draw on a peer-reviewed framework that captures common data quality indicators from the perspective of dataset users rather than creators.
As we encounter new challenges with data quality and allow users to directly access the BWDC data repository in the future, we may add more indicators here to ensure we are always delivering a high-quality product to BWDC users.²
Adaptation of the Big Data Quality Framework³
| Dimension | Element | Description |
| --- | --- | --- |
| Availability | Timeliness | Timeliness is defined as the time delay from data generation and acquisition to utilization⁵. The most recent version of a dataset should be made available for analysis within a reasonable timeframe. |
| Availability | Accessibility | Accessibility refers to the difficulty level for users to obtain and analyze data, and is closely linked with data openness. Open data standards⁴ are designed to ensure all users can access and maximize the use of open data. |
| Usability | Metadata | With the increase of data sources and data types, it’s easy for data consumers to misconstrue the meaning of common terminology and concepts of data. Data producers need to provide metadata describing different aspects of the datasets to reduce the problems caused by misunderstandings or inconsistencies. |
| Usability | Definition & Documentation | Definition & documentation consists of normative documentation for dataset names, definitions, ranges of valid values, standard formats, business rules, etc. |
| Reliability | Accuracy | To ascertain the accuracy of a given data value, it is compared to a known reference value. In some cases, there is no known reference value, making it difficult to measure accuracy. Because accuracy is correlated with context to some extent, data accuracy is considered in context. |
| Reliability | Integrity | In information security, data integrity means maintaining and assuring the accuracy and consistency of data over its entire life-cycle. This means that data cannot be modified in an unauthorized or undetected manner. |
| Reliability | Consistency | Data consistency refers to whether the logical relationship between correlated data is correct and complete. In the field of databases⁶, it usually means that the same data that are located in different storage areas should be considered to be equivalent. Equivalency means that the data have equal value and the same meaning or are essentially the same. |
| Reliability | Completeness | Completeness means that the values of all components of a single datum are valid. For example, many population datasets do not adequately represent ethnic and racial minority groups or local and rural geographies due to sampling and nonresponse bias. If certain subgroups are missing, the data can produce inaccurate and inconsistent estimates, and ultimately false conclusions. |
² The Challenges of Data Quality and Data Quality Assessment in the Big Data Era (codata.org)
³ Not all dimensions and elements that were published are included here. Additionally, we have tailored the element descriptions to the BWDC use case, which may evolve over time.
⁴ Best Practices — U.S. Open Data Toolkit (usopendatatoolkit.org)
⁵ (McGilvray, 2010)
⁶ (Silberschatz, Korth, & Sudarshan, 2006)
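Two of the elements above, timeliness and completeness, lend themselves to simple automated checks. The following is a minimal sketch under stated assumptions, not BWDC's actual QA/QC code: the function names, the row layout, and the sample values are all hypothetical and made up for illustration.

```python
from datetime import date
from typing import Dict, List


def timeliness_lag_years(latest_data_year: int, as_of: date) -> int:
    """Years elapsed between the newest data year and the date of use."""
    return as_of.year - latest_data_year


def missing_share_by_group(rows: List[dict], group_key: str, value_key: str) -> Dict[str, float]:
    """Share of rows in each subgroup whose value is missing (None)."""
    totals: Dict[str, int] = {}
    missing: Dict[str, int] = {}
    for row in rows:
        group = row[group_key]
        totals[group] = totals.get(group, 0) + 1
        if row.get(value_key) is None:
            missing[group] = missing.get(group, 0) + 1
    return {group: missing.get(group, 0) / count for group, count in totals.items()}


# Made-up rows for illustration only; not BWDC data.
rows = [
    {"group": "A", "net_worth": 10_000},
    {"group": "A", "net_worth": None},
    {"group": "B", "net_worth": 25_000},
    {"group": "B", "net_worth": 40_000},
]

lag = timeliness_lag_years(2022, date(2024, 6, 1))
shares = missing_share_by_group(rows, "group", "net_worth")
print(lag)     # 2
print(shares)  # {'A': 0.5, 'B': 0.0}
```

Reporting missingness per subgroup, rather than as one overall figure, reflects the completeness concern described above: a dataset can look complete in aggregate while under-representing a specific group.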
Data Sources
Datasets
Publication Year(s) used (up to most current available unless otherwise noted)
Description (as provided by the dataset publisher)
Explore Data page where this data is accessible
Survey of Consumer Finances (SCF)
2007, 2010, 2013, 2016, 2019, 2022
The SCF is a triennial cross-sectional survey of U.S. families. The survey data include information on families’ balance sheets, pensions, income, and demographic characteristics. Information is also included from related surveys of pension providers and the earlier such surveys conducted by the Federal Reserve Board.
Assets & Debt
Survey of Income and Program Participation (SIPP)
2021
SIPP is a nationally representative longitudinal survey that provides comprehensive information on the dynamics of income, employment, household composition, and government program participation.
Assets & Debt
Debt to Income Ratio, US Federal Reserve and Equifax Credit Panel
2007-2022
Household debt is calculated from FRBNY/Equifax credit panel and household income is reported from the Bureau of Labor Statistics.
Assets & Debt
Survey of Unbanked and Underbanked Households, FDIC
2017-2021
The Survey of Unbanked and Underbanked Households collects information on bank account ownership; use of prepaid cards and nonbank online payment services; use of nonbank money orders, check cashing, and money transfer services; and use of bank and nonbank credit.
Assets & Debt
American Community Survey (ACS)
2016 5-year estimates, 2020 5-year estimates, 2021 5-year estimates, 2022 5-year estimates
The ACS is the flagship product of the US Census and the premier resource for detailed population and housing data within the United States nationally and down to a hyper-local level. Unemployment rate figures are used in the Business Investment Need Index.
Assets & Debt, Education, Employment, Business Ownership, Homeownership, Population, Local Wealth Explorer
Annual Business Survey (ABS)
2021
The ABS Program combines data results from survey respondents and administrative records to produce data on business ownership. The survey is collected from employer businesses and the nonemployer data are compiled from administrative records.
Business Ownership
Small Business Administration (SBA) 7(a)
2011-2021
The SBA releases FOIA data for 7(a) loans, the SBA’s most popular program for financial assistance to businesses.
Business Ownership
Small Business Administration (SBA) 504 Loan Records
2011-2023
The SBA releases FOIA data on 504 loans, which focus on promoting business growth and job creation.
Business Ownership
Small Business Administration (SBA) Paycheck Protection Program (PPP) Loan Records
2020-2021
The SBA releases administrative data on the federal paycheck protection program, including: Loan characteristics (e.g. amount); Business characteristics (e.g. business type); Borrower characteristics (e.g. address).
Business Ownership
National Center for Education Statistics (NCES)
Select years between 2010 and 2023
The NCES provides a compilation of government, private data sources, and surveys on education statistics, including: School and college characteristics (e.g. number of enrollments); Educational attainment (e.g. graduation rate); Finances (e.g. federal funds).
Education, Employment
High Speed Internet Usage, Microsoft Airband Initiative
2020
Estimated broadband usage, derived by combining data from several Microsoft products.
Education
Bureau of Labor Statistics Current Population Survey (CPS)
2013 - 2021
The CPS is a monthly survey of households conducted by the Bureau of Labor Statistics that is the primary source for U.S. labor force statistics, including: Employment (e.g. work status, industry); Demographic characteristics (e.g. age); Supplemental topics (e.g. child support, health insurance coverage, COVID-19).
Employment
Unemployment Insurance Weekly Claims Data
2020 - partial 2024
The Unemployment Insurance Weekly Claims dataset from the U.S. Department of Labor provides weekly unemployment insurance claims, including the number of initial claims and continued weeks of claimed unemployment benefits.
Employment
U.S. Inflation and Unemployment Data
2022
The Geographic Profile of Employment and Unemployment presents annual data on the employed and unemployed for states and census regions and divisions. The estimates include demographic characteristics, occupation, industry, class of worker, hours of work, duration of unemployment, and reason for unemployment. The source of the data is the Current Population Survey.
Employment
National Level Poverty Rates
2020-2024
Center on Poverty and Social Policy, Columbia University
Assets & Debt
Candid data on non-profit demographics
2011-2023
The SBA releases FOIA data 504 loans, which focus on promoting business growth and job creation.
Employment
Local Area Unemployment Statistics
2024
The Local Area Unemployment Statistics (LAUS) program from the Bureau of Labor Statistics produces monthly and annual employment, unemployment, and labor force data for Census regions and divisions, states, counties, metropolitan areas, and many cities, by place of residence.
Employment
Longitudinal Employer-Household Dynamics, Origin-Destination Employment Statistics (LODES)
2021
The Longitudinal Employer-Household Dynamics (LEHD) program is part of the Center for Economic Studies at the U.S. Census Bureau. The LEHD program produces cost-effective, public-use information combining federal, state and Census Bureau data on employers and employees under the Local Employment Dynamics (LED) Partnership. State and local authorities increasingly need detailed local information about their economies to make informed decisions.
Employment
Business Formation Statistics (BFS)
2011-2023
The BFS are a standard data product of the U.S. Census Bureau developed in research collaboration with economists affiliated with the Board of Governors of the Federal Reserve System, Federal Reserve Bank of Atlanta, University of Maryland, and University of Notre Dame. The BFS provides timely and high frequency information on new business applications and formations in the United States.
Business Ownership
Quarterly Census of Employment and Wages (QCEW)
2023
The Quarterly Census of Employment and Wages (QCEW) program publishes a quarterly count of employment and wages reported by employers covering more than 95 percent of U.S. jobs, available at the county, MSA, state and national levels by industry.
Employment
Survey of Household Economics and Decisionmaking (SHED)
2019, 2023
Since 2013, the Federal Reserve Board has conducted the Survey of Household Economics and Decisionmaking (SHED), which measures the economic well-being of U.S. households and identifies potential risks to their finances.
Assets & Debt
National Level Poverty Rates
2020, 2024
Since 2013, the Federal Reserve Board has conducted the Survey of Household Economics and Decisionmaking (SHED), which measures the economic well-being of U.S. households and identifies potential risks to their finances.
Assets & Debt
National Risk Index for Natural Hazards (NRI)
2023
The National Risk Index published by FEMA shows which communities are most at risk to natural hazards. It includes data about the expected annual losses to individual natural hazards, social vulnerability and community resilience.
Black Wealth Indicators
FDIC Active Institutions
2023
This dataset, published by the FDIC, lists FDIC-insured banks and branches.
Black Wealth Indicators
Home Mortgage Disclosure Act (HMDA)
2017-2022
The Consumer Financial Protection Bureau releases data based on the Home Mortgage Disclosure Act (HMDA), which requires many financial institutions to maintain, report, and publicly disclose loan-level information about mortgages.
Homeownership
Housing Vacancies and Homeownership (HVS)
2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023
The HVS dataset is published quarterly by the U.S. Census Bureau and provides information on rental and homeowner vacancy rates.
Homeownership
CDC/ATSDR Social Vulnerability Index
2022
The Social Vulnerability Index (SVI) employs U.S. Census Bureau variables to help users identify communities that may need support in preparing for hazards or recovering from disasters.
Homeownership
The New York Times US Coronavirus Database
2020 - partial 2023
This is the US Coronavirus data repository from The New York Times. This data includes COVID-19 cases and deaths reported by state and county. The New York Times compiled this data based on reports from state and local health agencies.
Population
Yelp Business Data
2024
Yelp data are ingested via the platform's API. Businesses self-identify as Black-owned by adding the Black-owned Business attribute to their Yelp business page.
Local Wealth Explorer
PurpleAir Air Quality Data
2024
PurpleAir is a vendor of low-cost air quality sensors designed to sample air for pollution. The Black Wealth Data Center uses PurpleAir data by taking ALT cf=3 sensor data for PM2.5 particle pollution and applying the air quality index calculation formula provided by the Environmental Protection Agency.
Local Wealth Explorer
Environmental Protection Agency AirNow
2024
EPA AirNow data show air quality readings recorded at various intervals of time. Air quality index is a calculation based on the concentration of PM2.5 particles in the sampled air.
Local Wealth Explorer
U.S. Parks Data
2024
The Trust for Public Land makes location, geography and access information available for roughly 145,000 U.S. parks.
Local Wealth Explorer
School locations
2022
The National Center for Education Statistics collects information including location details on public elementary and secondary schools, including charter schools, as part of its Common Core of Data dataset.
Local Wealth Explorer
Minority Depository Institutions
2023
The FDIC maintains a list and tracks the insured MDIs it supervises, i.e., state-chartered institutions that are not members of the Federal Reserve System (Federal Reserve), as well as MDIs that are supervised by the Office of the Comptroller of the Currency (OCC) and the Federal Reserve. The FDIC takes this broad approach given its role in considering applications for deposit insurance and in resolving institutions in the event an MDI were to fail, regardless of the institution’s charter. The FDIC’s published list of FDIC-insured minority depository institutions does not include women-owned or women-managed institutions because they are not included in the statutory definition.
Local Wealth Explorer
Low Income Energy Affordability Data
2022
The Department of Energy publishes energy burden data for low-income households (defined as 0-80% of Area Median Income).
Local Wealth Explorer
Hospital Locations
2022
The Geospatial Management Office, which is part of the Department of Homeland Security, collects information including location details on hospitals as part of its Homeland Infrastructure Foundation-Level Data dataset.
Local Wealth Explorer
Opportunity Zones
2018
The Treasury Department designated nearly 9,000 Census tracts as communities in economic distress. Entities that made economic investments in these Opportunity Zones were entitled to certain tax benefits in the period shortly after the designation. Opportunity Zone designations are integrated into the Business Investment Need Index.
Local Wealth Explorer
Redlining Scores
2020
The Inter-university Consortium for Political and Social Research, located at the University of Michigan, maintains historical 'redlining' scores for U.S. geographies based on maps from the now-defunct Home Owners’ Loan Corporation, which quantified mortgage investment risk across the U.S. Redlining refers to the systemic denial of credit to prospective home buyers based on racial and ethnic factors. 2020 U.S. Census tracts are scored by determining the area contained within a historical HOLC boundary and assigning weighted values based on the grade (1 for “A” grade, 2 for “B” grade, 3 for “C” grade, and 4 for “D” grade) and amount of space contained within the HOLC boundary. Redlining scores are used in the Business Investment Need Index.
Local Wealth Explorer
Walkability Index
2021
The Environmental Protection Agency (EPA) created a metric to score U.S. Census block groups based on how feasible travel is within the geography without a private vehicle. Block groups are ranked by a combination of the density of intersections, proximity to public transit stops and diversity of land uses. This last measure looks at the mix of types of zoned use, including residential, industrial and commercial, and the mix of types of employment, with a higher variety of uses leading to a higher score. Walkability index scores are integrated into the Business Investment Need Index.
Local Wealth Explorer
County Business Patterns
2020
The U.S. Census Bureau publishes a survey called County Business Patterns outlining the number of business entities, employment information and payroll figures by geography. Business counts from the County Business Patterns survey are integrated into the Business Investment Need Index.
Local Wealth Explorer
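The Redlining Scores entry above describes a tract score that weights each HOLC grade (1 for “A” through 4 for “D”) by the amount of tract area falling inside that grade. A minimal sketch of that idea is an area-weighted mean, shown below. It assumes the weights are normalized over the graded area only, which the description does not state explicitly; the function name and the sample areas are hypothetical, and this is not the publisher's actual scoring code.

```python
from typing import Dict, Optional

# Grade values as stated in the Redlining Scores description.
GRADE_VALUES = {"A": 1, "B": 2, "C": 3, "D": 4}


def redlining_score(area_by_grade: Dict[str, float]) -> Optional[float]:
    """Area-weighted mean grade value over the graded part of a tract.

    area_by_grade maps an HOLC grade to the tract area inside that grade,
    in any consistent unit. Returns None when no area is graded.
    Assumption: weights are normalized over graded area only.
    """
    graded_area = sum(area_by_grade.values())
    if graded_area <= 0:
        return None
    weighted_sum = sum(GRADE_VALUES[grade] * area for grade, area in area_by_grade.items())
    return weighted_sum / graded_area


# Made-up areas for illustration only.
print(redlining_score({"A": 1.0, "D": 3.0}))  # (1*1.0 + 4*3.0) / 4.0 = 3.25
print(redlining_score({}))                    # None
```

Under this sketch a score near 1 means the graded area was mostly rated “A”, and a score near 4 means it was mostly rated “D”.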
Citing BWDC
To cite the Black Wealth Data Center, please refer to the CC-BY-NC-ND license and use the format below:
Publisher. Date accessed, Article/Page Title, Website Title, URL
Example:
The Black Wealth Data Center. (2022). Explore Data - Assets & Debt. “Average Assets for Individuals, by Race/Ethnicity (Statewide, 2018)”. Blackwealthdata.org. https://blackwealthdata.org/explore/assets
Glossary & Definitions
Data Dictionary
Our data dictionary includes any variables and datasets used on the BWDC platform. We pull variable definitions directly from the source dataset - for example, from the ACS data dictionary.
Data Terms
Below are concepts related to wealth accumulation that are found throughout the BWDC platform.
| TERM | DESCRIPTION |
| --- | --- |
| Average | The central value of a distribution, found by adding all values and dividing by the number of observations |
| Median | The exact middle value in a distribution, often described as the "typical" value. This statistic is often used to describe income and wealth because averages can be skewed by extremely high or low values |
| Parity | Equality or having the same value. For instance, there would be parity in Black-White wealth if median household wealth for Black and White households were equal |
| Quartile | A portion of the distribution of data divided into fourths (1/4 or 25 percent) |
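The terms above can be illustrated with a short calculation. The sketch below uses Python's standard `statistics` module on made-up numbers; the values, the group labels, and the `parity_ratio` name are hypothetical and do not come from BWDC data.

```python
import statistics

# Made-up wealth values in dollars, for illustration only (not BWDC data).
wealth = [5_000, 20_000, 40_000, 60_000, 1_000_000]

average = statistics.mean(wealth)      # pulled upward by the one very large value
median = statistics.median(wealth)     # the middle value of the sorted list
quartile_cuts = statistics.quantiles(wealth, n=4)  # three cut points splitting the data into fourths

# Parity between two hypothetical groups: a ratio of 1.0 would mean equal medians.
group_a = [10_000, 30_000, 50_000]
group_b = [20_000, 60_000, 100_000]
parity_ratio = statistics.median(group_a) / statistics.median(group_b)

print(average, median, quartile_cuts, parity_ratio)
```

In this example the average is 225,000 while the median is 40,000, which shows how a single extreme value can skew the average, and the parity ratio of 0.5 indicates that the two hypothetical groups are not at parity.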
DIG DEEPER
Want to explore more local indicators of Black wealth accumulation? Check out our Explore Data pages.