1. Data Sources
CapitolExposed aggregates data from the following official government sources and third-party databases:
| Source | Agency | Data Type | Update Frequency | Coverage |
|---|---|---|---|---|
| House PTR Filings | Clerk of the House | STOCK Act disclosures | Polled every 15 minutes | 2012 – present |
| Senate EFD Search | Secretary of the Senate | Senate periodic transaction reports | Polled every 15 minutes | 2012 – present |
| FEC Data | Federal Election Commission | Campaign finance & donations | Daily | 1980 – present |
| Senate LDA | Secretary of the Senate | Lobbying disclosure filings | Quarterly | 1999 – present |
| Congress.gov API | Library of Congress | Votes, bills, committee info | Daily | 1789 – present |
| ICIJ Offshore Leaks | ICIJ | Offshore entities & connections | Periodic releases | Various leak dates |
| OpenSanctions | OpenSanctions | Sanctions & PEP data | Daily | Global |
| FinCEN | Dept. of the Treasury | Financial crimes data | Periodic releases | 2000 – present |
| SEC EDGAR | SEC | Company filings, SIC codes, insider trades | Daily | 1993 – present |
| WA Public Disclosure | WA State PDC | State official financial disclosures | As filed | WA State |
2. Conflict Scoring Model
The conflict score is a composite metric that quantifies the statistical overlap between a member of Congress's financial activities and their legislative duties. Scores range from 0 (no detected overlap) to 100 (maximum overlap across all dimensions).
2.1 Component Breakdown
Committee Overlap
Weight: 30%. Measures whether a member trades securities of companies that fall under the jurisdiction of their assigned committees. For example, a member of the Senate Banking Committee trading financial sector stocks would increase this component.
How it works:
Each trade is cross-referenced against the member's committee assignments. Company sector (via SEC SIC codes) is matched to committee jurisdictions using a curated mapping table. The component score is the percentage of trades that fall within committee jurisdiction, weighted by trade size.
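As a rough illustration of the calculation described above, the sketch below computes a size-weighted share of trades that fall inside committee jurisdiction. The `Trade` class, the `COMMITTEE_SIC_PREFIXES` table, and matching on SIC code prefixes are all assumptions made for the example; the actual curated mapping table is not published here.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    ticker: str
    sic_code: str   # SEC SIC code of the traded company
    amount: float   # dollar size of the trade (e.g. a range midpoint)


# Hypothetical mapping: committee name -> SIC code prefixes it oversees.
COMMITTEE_SIC_PREFIXES = {
    "Senate Banking": ("60", "61", "62"),
    "House Energy and Commerce": ("13", "29", "49"),
}


def committee_overlap_score(trades, committees):
    """Size-weighted percentage (0-100) of trades inside committee jurisdiction."""
    prefixes = tuple(
        p for c in committees for p in COMMITTEE_SIC_PREFIXES.get(c, ())
    )
    total = sum(t.amount for t in trades)
    if total == 0 or not prefixes:
        return 0.0
    overlapping = sum(t.amount for t in trades if t.sic_code.startswith(prefixes))
    return 100.0 * overlapping / total
```

With one $30,000 bank trade and one $10,000 refinery trade, a member who sits only on the hypothetical "Senate Banking" entry would score 75 on this component.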
Timing Proximity
Weight: 25%. Evaluates the temporal proximity of trades to relevant legislative events, including committee hearings, floor votes, and classified briefings involving related industries.
How it works:
For each trade, we calculate the number of days between the transaction date and the nearest relevant legislative event. Trades within 7 days receive maximum weight; 8–14 days receive partial weight; trades more than 30 days from any relevant event receive no timing score.
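The day-gap logic can be sketched as follows. The 7-day and 30-day cutoffs come from the description above; the numeric value of "partial weight" and the treatment of gaps between 15 and 30 days are not stated, so the `partial` and `far` parameters are purely illustrative assumptions.

```python
from datetime import date


def days_to_nearest_event(trade_date, event_dates):
    """Absolute day gap to the closest relevant event, or None if there are none."""
    if not event_dates:
        return None
    return min(abs((trade_date - d).days) for d in event_dates)


def timing_weight(gap_days, partial=0.5, far=0.25):
    """Map a day gap to a weight in [0, 1].

    `partial` (8-14 days) and `far` (15-30 days) are assumed values; the
    methodology text does not give numbers for them.
    """
    if gap_days is None or gap_days > 30:
        return 0.0
    if gap_days <= 7:
        return 1.0
    if gap_days <= 14:
        return partial
    return far
```

For example, a trade on 2024-03-10 with relevant events on 2024-03-01 and 2024-03-15 has a gap of 5 days and would receive the maximum weight.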
Lobbying Connections
Weight: 20%. Identifies connections between a member's trading activity and lobbying filings. This component looks at whether companies traded by a member (or their parent companies) have active lobbying registrations.
How it works:
Each traded company is matched against Senate LDA lobbying filings by registrant and client name. The score increases when a member trades securities of companies that are actively lobbying on issues before their committee.
Donation Patterns
Weight: 15%. Examines whether a member receives campaign contributions from industries or companies in which they also trade securities.
How it works:
FEC contribution data is matched to traded companies by employer name and industry classification. The score reflects the percentage of a member's trades in companies whose employees or PACs have contributed to their campaign.
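A minimal sketch of the percentage described above, assuming the employer and PAC matching has already produced a list of contributing company names. The function name and the simple case-insensitive comparison are assumptions; industry-classification matching is not modeled.

```python
def donation_pattern_score(traded_companies, contributing_companies):
    """Percentage (0-100) of trades in companies tied to campaign contributions.

    traded_companies: one company name per trade (repeats allowed).
    contributing_companies: companies whose employees or PACs contributed.
    """
    if not traded_companies:
        return 0.0
    donors = {name.strip().lower() for name in contributing_companies}
    hits = sum(1 for name in traded_companies if name.strip().lower() in donors)
    return 100.0 * hits / len(traded_companies)
```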
Trade Size
Weight: 10%. Accounts for the relative magnitude of transactions. Larger trades in companies with committee overlap or lobbying connections contribute more to the overall score.
How it works:
STOCK Act filings report amounts in ranges (e.g., $1,001–$15,000, $15,001–$50,000). We use the midpoint of each range. Trades above $250,000 receive maximum weight. This component acts as a multiplier on other signals.
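The midpoint rule can be illustrated with a short sketch. The range strings and the $250,000 threshold come from the text above; the linear scaling below the threshold and the function names are assumptions for the example.

```python
import re

MAX_WEIGHT_THRESHOLD = 250_000  # trades above this receive maximum weight


def range_midpoint(amount_range):
    """Midpoint of a reported range such as "$1,001–$15,000"."""
    numbers = [int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", amount_range)]
    if len(numbers) != 2:
        raise ValueError(f"expected two bounds, got: {amount_range!r}")
    low, high = numbers
    return (low + high) / 2


def trade_size_weight(amount_range):
    """Weight in [0, 1]; linear scaling below the threshold is an assumption."""
    return min(range_midpoint(amount_range) / MAX_WEIGHT_THRESHOLD, 1.0)
```

Under these assumptions, "$1,001–$15,000" has a midpoint of 8,000.50, and any range whose midpoint exceeds $250,000 is capped at a weight of 1.0.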
2.2 Score Interpretation
| Score Range | Label | Interpretation |
|---|---|---|
| 0 – 20 | Low | Minimal overlap between financial activity and legislative duties |
| 21 – 40 | Moderate | Some overlap detected; may warrant closer examination |
| 41 – 60 | Elevated | Notable patterns across multiple scoring dimensions |
| 61 – 80 | High | Significant statistical overlap across most dimensions |
| 81 – 100 | Critical | Near-maximum overlap detected; strong statistical correlations present |
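One way to read the component weights and the table above is as a plain weighted sum followed by a band lookup. This is a sketch under that assumption only: the text also describes trade size as a multiplier on other signals, which is not modeled here, and the rounding step is an assumption made because the bands are stated in whole numbers.

```python
WEIGHTS = {
    "committee_overlap": 0.30,
    "timing_proximity": 0.25,
    "lobbying_connections": 0.20,
    "donation_patterns": 0.15,
    "trade_size": 0.10,
}

# (inclusive upper bound, label) pairs taken from the interpretation table.
LABELS = [(20, "Low"), (40, "Moderate"), (60, "Elevated"), (80, "High"), (100, "Critical")]


def composite_score(components):
    """Weighted sum of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)


def score_label(score):
    """Look up the interpretation label for a 0-100 score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    rounded = round(score)  # assumption: bands apply to the rounded score
    for upper, label in LABELS:
        if rounded <= upper:
            return label
```

For instance, component scores of 80, 60, 50, 40, and 20 (in the order listed in `WEIGHTS`) combine to 57, which falls in the "Elevated" band.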
Important
A high conflict score does not indicate wrongdoing, insider trading, or any legal violation. Scores reflect statistical patterns only. Many legitimate factors can produce elevated scores, such as a member trading in large-cap stocks from a sector within their committee's jurisdiction.
2.3 Conflict Flags
In addition to the composite score, individual trades may be flagged when specific criteria are met. Flags include: "committee-overlap" (trade in a company under committee jurisdiction), "timing-suspicious" (trade within 7 days of a relevant event), "large-trade" (amount exceeds $250,000), and "lobbying-connected" (company has active lobbying filings). Flags are informational tags, not conclusions.
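The flag criteria listed above translate directly into a small rule function. The flag names and thresholds are taken from the text; the function signature and its inputs are hypothetical.

```python
def conflict_flags(in_jurisdiction, days_to_event, amount, has_active_lobbying):
    """Return the informational flags that apply to a single trade."""
    flags = []
    if in_jurisdiction:
        flags.append("committee-overlap")
    # days_to_event is None when no relevant event was found for the trade.
    if days_to_event is not None and days_to_event <= 7:
        flags.append("timing-suspicious")
    if amount > 250_000:
        flags.append("large-trade")
    if has_active_lobbying:
        flags.append("lobbying-connected")
    return flags
```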
3. Cross-Reference Methodology
CapitolExposed cross-references Congressional financial data against several external databases to surface additional connections and patterns.
3.1 ICIJ Offshore Leaks
We match member names and known associates against the ICIJ Offshore Leaks database (Panama Papers, Paradise Papers, Pandora Papers). Names are normalized and then compared using fuzzy matching, with a maximum Levenshtein distance of 2. All matches are human-reviewed before publication.
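A minimal sketch of normalized fuzzy name matching with an edit-distance cutoff of 2 follows. The specific normalization steps (accent stripping, punctuation removal, lowercasing) are assumptions; the text only says that names are normalized.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in stripped)
    return " ".join(cleaned.lower().split())


def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def is_candidate_match(name_a, name_b, max_distance=2):
    """True when the normalized names are within max_distance edits."""
    return levenshtein(normalize_name(name_a), normalize_name(name_b)) <= max_distance
```

A cutoff this tight still admits near-identical names belonging to different people, which is why a candidate match is only a starting point for review.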
3.2 FARA Registrations
Foreign Agent Registration Act (FARA) filings are cross-referenced to identify lobbyists and firms that represent foreign principals and also have connections to members of Congress through lobbying or donations.
3.3 FinCEN Data
FinCEN files and suspicious activity reports are cross-referenced by entity name and associated financial institutions. Due to the sensitivity of this data, matches require high-confidence name matching (exact match or fuzzy match with manual verification).
3.4 OpenSanctions
We cross-reference company names and individual names against the OpenSanctions database to identify any connections to sanctioned entities or politically exposed persons (PEPs).
3.5 Confidence Thresholds and False Positives
Cross-reference matches are categorized by confidence level:
- High confidence (90%+): Exact name match plus at least one corroborating identifier (date of birth, address, or associated entity)
- Medium confidence (70–89%): Fuzzy name match with partial corroboration. Displayed with a confidence indicator.
- Low confidence (<70%): Not displayed by default. Available in the API with explicit filtering.
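The tiers above amount to a simple threshold lookup. The sketch below assumes a confidence value on a 0-100 scale and returns the tier together with whether it is displayed by default; the function name and return shape are hypothetical.

```python
def confidence_tier(confidence):
    """Map a match confidence (0-100) to (tier, displayed_by_default)."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence >= 90:
        return ("high", True)
    if confidence >= 70:
        return ("medium", True)
    return ("low", False)
```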
Name matching limitations include common names producing false positives, name transliterations, and name changes over time. We acknowledge these limitations and encourage users to verify matches independently.
4. PTR PDF Processing
House Periodic Transaction Reports (PTRs) are filed as PDF documents with the Clerk of the House. CapitolExposed processes these filings through the following pipeline:
Download
PTR PDFs are downloaded from the House Clerk's website via automated polling every 15 minutes.
Text Extraction
PDF text is extracted using PyMuPDF. For scanned documents without embedded text, OCR is applied where possible.
Parsing
Extracted text is parsed to identify member name, transaction type (purchase/sale/exchange), asset description, transaction date, disclosure date, amount range, and owner (self, spouse, dependent, joint).
Ticker Resolution
Asset descriptions are matched to stock tickers using our company database. Ambiguous descriptions (e.g., "ABC Corp" matching multiple companies) are flagged for manual review.
Validation
Parsed data is validated against expected formats and ranges. Trades that fail validation are quarantined and reviewed manually before ingestion.
Error handling for ambiguous data includes: unknown tickers are stored with the raw asset description; dates that cannot be parsed are set to the disclosure date with a flag; amount ranges that don't match standard STOCK Act ranges are preserved as-reported.
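The fallback rules above can be sketched as a single normalization step. The record keys, the flag names other than the date flag, the `MM/DD/YYYY` date format, and passing the list of standard ranges in as a parameter are all assumptions made for the example.

```python
from datetime import datetime


def apply_fallbacks(record, known_tickers, standard_ranges, date_format="%m/%d/%Y"):
    """Return a copy of a parsed trade with the documented fallbacks applied.

    Hypothetical record keys: asset_description, ticker, transaction_date,
    disclosure_date, amount_range.
    """
    out = dict(record)
    out["flags"] = []

    # Unknown ticker: keep the raw asset description, leave the ticker empty.
    if out.get("ticker") not in known_tickers:
        out["ticker"] = None
        out["flags"].append("unknown-ticker")

    # Unparseable date: fall back to the disclosure date and flag it.
    try:
        datetime.strptime(out.get("transaction_date") or "", date_format)
    except ValueError:
        out["transaction_date"] = out["disclosure_date"]
        out["flags"].append("date-fallback")

    # Non-standard amount range: preserve as reported, only add a flag.
    if out.get("amount_range") not in standard_ranges:
        out["flags"].append("nonstandard-amount")

    return out
```

Returning a copy rather than mutating the input keeps the originally parsed values available for manual review.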
5. Data Quality and Limitations
We are transparent about the known limitations and gaps in our data:
5.1 Known Data Gaps
- Pending OCR trades: Some older House PTR filings are scanned images that resist automated text extraction. These are queued for manual processing.
- Delisted tickers: Companies that have been delisted, merged, or acquired may not resolve to current market data. Historical trades referencing these companies are preserved but may lack current stock price data.
- Missing sector classifications: Approximately 38% of tracked companies currently lack sector classification via SEC EDGAR SIC codes. Sector-based analysis is limited for these companies.
- Reporting delays: Under the STOCK Act, members must file within 30 days of being notified of a transaction and no later than 45 days after the transaction itself. The data we present may therefore lag actual trades by weeks.
- OpenSecrets data: The OpenSecrets (Center for Responsive Politics) API was discontinued in April 2025. Historical data from this source will not be updated.
5.2 Source Data Errors
Errors in source data are handled as follows: when we identify a confirmed error in a government filing (e.g., a typo in a ticker symbol), we annotate the record with both the original filing data and the corrected data. We do not silently modify source data. All corrections are timestamped and attributed.
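One possible shape for such an annotation is sketched below: the filed values are never overwritten, and each correction records the original value, the corrected value, who made it, and when. The class and field names are hypothetical and do not describe the production schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Correction:
    field_name: str
    original_value: str
    corrected_value: str
    corrected_by: str
    corrected_at: datetime


@dataclass
class FilingRecord:
    data: dict                                   # values exactly as filed
    corrections: list = field(default_factory=list)

    def annotate(self, field_name, corrected_value, corrected_by):
        """Record a correction without modifying the filed data."""
        self.corrections.append(
            Correction(
                field_name,
                self.data[field_name],
                corrected_value,
                corrected_by,
                datetime.now(timezone.utc),
            )
        )

    def effective(self):
        """Filed data with corrections layered on top (later ones win)."""
        merged = dict(self.data)
        for c in self.corrections:
            merged[c.field_name] = c.corrected_value
        return merged
```

Keeping corrections as a separate, append-only list means both the original filing and the corrected view can be shown side by side.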
5.3 What We Cannot Verify
- Whether a member had material non-public information at the time of a trade
- The intent or motivation behind any trading decision
- Whether a trade was executed by the member personally, a financial advisor, or a blind trust manager
- The accuracy of self-reported filing data submitted by members to the House Clerk or Senate
6. Error Correction
We take data accuracy seriously and provide a clear process for reporting and correcting errors:
6.1 How to Report Errors
- Email: hello@capitolexposed.com with subject "Data Correction"
- Contact form: capitolexposed.com/contact
- Include: the specific URL, the data in question, and the source documentation that demonstrates the error
6.2 Response Timeline
- Acknowledgment: Within 2 business days
- Investigation: Within 5 business days
- Correction (if confirmed): Within 7 business days
6.3 Correction Process
Confirmed errors are corrected in place with an annotation noting the original value, the corrected value, the date of correction, and the source of the correction. Material corrections to investigative content are noted at the top of the relevant article or dossier.
For the standalone public policy, see the corrections policy. For publication standards, see the editorial policy.
7. Open Source
CapitolExposed is partially open source. The following components are publicly available:
- Frontend application: Next.js application source code, React components, and page layouts
- Data ingestion scripts: Scripts used to fetch and process government data
- API route handlers: Public API endpoint implementations
The following are proprietary and not included in the open source release:
- AI agent prompts and fine-tuning configurations
- Production database credentials and environment variables
- Internal moderation and review tools
GitHub Repository