add Phase 1 security hardening, mapping confidence, audit logging, pilot docs

- lock CORS to Vercel domain via ALLOWED_ORIGINS env var (removes allow_origins=*)
- add X-API-Key header auth on /api/upload and /api/export
- normalizer: add mapping confidence (high/inferred), new aliases for Acct #, Member ID, External Patient Ref, DME Description, dispensedate; 63/63 CSV files pass
- coverage_calculator: add RULE_VERSION = "v0.1", rule_version on every CoverageResult
- main.py: audit logging wired on upload + export, rule_version + mapping_summary in response
- generate_samples.py: 25 CSV files now use 25 different real-world header formats
- add generate_10k.py for 10,000-patient synthetic dataset
- add tests/smoke_test.py (passes against local backend)
- add docs/pilot-guide-v1.md for Robert Robinson pilot onboarding
- add docs/daniel-pilot-readiness-whitepaper.md and .pdf

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 05:41:25 -04:00 · 2026-05-29 05:41:25 -04:00 · cf171a3f87
commit cf171a3f87
parent c2141a127a
13 changed files with 1216 additions and 39 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -57,6 +57,7 @@ Two-curve graph showing supplier staff time over months:
 | Item | Phase | Reason deferred |
 |------|-------|----------------|
 | AWS infrastructure (any service) | Post-pilot | Jakub (coach, 2026-05-28): mock data is sufficient for demos and pilots. AWS takes 2 hours to set up — do it right before signing live PHI contracts. Do not start during MVP or pilot phase. |
 | Dexcom OAuth API integration | Phase 2 | Requires vendor agreement + PHI scope expansion |
 | Billing system API integration (Brightree, Bonafide, Fastrack, etc.) | Phase 2 | Build after pilot reveals which systems suppliers use; vendor agreements required; PHI scope expands significantly |
 | CMS and contact management integration (Salesforce, etc.) | Phase 2 | Pilot feedback will reveal if suppliers have CMS and what they use; do not assume |
--- a/docs/daniel-pilot-readiness-whitepaper.md
+++ b/docs/daniel-pilot-readiness-whitepaper.md
@@ -0,0 +1,522 @@
 # StillSolutions Pilot Readiness Whitepaper
 Backend and frontend work required before a supplier-facing pilot
 Prepared for: Kisa and Daniel Wright  
 Meeting date: May 18, 2026  
 Source basis: Meeting transcript provided in the request, with current platform and healthcare privacy references checked where relevant.
 ## Executive Summary
 StillSolutions is targeting a real operational pain in the DMEPOS supplier workflow: suppliers submit claims or order packets, discover too late that required documentation is missing, and then carry denial risk after inventory and service costs have already been incurred. The product direction discussed in the meeting is sound: take a supplier CSV, calculate risk and urgency, then produce a stoplight-style work queue showing which patients, prescribers, payers, or documentation gaps need attention first.
 The current product is not ready for an external pilot yet. The visible UI is directionally useful and should not be heavily redesigned. The main issue is that the product still behaves like a local demo built around ideal sample data, not a hosted workflow that can survive messy supplier inputs, data privacy expectations, export requirements, and a non-technical pilot participant.
 The next phase should focus on three things:
 1. Build a reliable backend around CSV ingestion, normalization, report generation, persistence, and exports.
 2. Turn the frontend into a guided pilot workflow: upload, map, validate, review, export, and share.
 3. Put the app online with basic authentication, environment management, automated browser tests, and a repeatable development workflow using project-specific skills.
 The most important nuance from the meeting is the CSV problem. A clean sample CSV is not enough. Every supplier will export data differently: headers vary, columns move, dates change format, payer names drift, patient IDs may be named differently, and documentation fields may be incomplete. If the product assumes a perfect CSV, the pilot may work once and fail with the next supplier. The immediate engineering goal is therefore not more UI polish. It is making the importer and backend tolerant enough to understand 25 or more realistic CSV variants that mean the same thing but look different.
 ## Current Product State
 The meeting showed a dashboard-like interface with a stoplight report. The report appears to show patient or order records, urgency, status, priority, and recommended next actions. There is an import CSV action on the left and an export CSV action on the report page. Sample data could be imported, although it was not fully clear whether the UI refreshed reliably after import. The export action appeared not to work, and its intended downstream use was not yet fully defined.
 The current UI has useful bones:
 - The stoplight model is easy to understand.
 - The report layout is simple enough for a supplier to scan.
 - The product value is visible once sample data is loaded.
 - Light mode / dark mode is not the key issue.
 - The existing interface should mostly be retained for the first pilot.
 But the current state also exposes pilot blockers:
 - The product was not immediately ready to show on the call.
 - The app appears to be run locally rather than hosted.
 - There is no confirmed end-to-end flow from upload to validated report to export.
 - The import flow relies too heavily on known sample data.
 - The export flow does not appear wired.
 - There is no clear supplier-system crosswalk yet.
 - The UI contains placeholder content, including the bottom-left name reference, that should be removed before a pilot.
 - There is no visible onboarding or data requirement screen for pilot users.
 The conclusion is direct: StillSolutions has enough product shape to justify a pilot, but not enough operational reliability to put a supplier through the pilot without first hardening the backend, data workflow, and testing loop.
 ## Product Objective
 StillSolutions should become a hosted web application that helps DMEPOS suppliers identify documentation and authorization risk before revenue is lost.
 The pilot version should support this simple promise:
 A supplier uploads an order or claims-related CSV. StillSolutions normalizes the file, identifies missing or time-sensitive documentation, calculates urgency, and returns an actionable report showing which records need outreach first and why.
 The pilot should not try to become a full practice management system. It should focus on one narrow workflow:
 1. Upload supplier data.
 2. Map and validate the file.
 3. Calculate documentation risk.
 4. Produce a stoplight report.
 5. Export a work queue that can be used in the supplier's existing system.
 That is enough to test whether StillSolutions saves time, reduces denial exposure, and helps suppliers act before deadlines are missed.
 ## Target Architecture
 The meeting recommendation was to avoid packaging a desktop app. That is the right call. A desktop application creates avoidable packaging and support overhead across Windows, macOS, and Linux. A web app is the better path for a pilot because it gives one hosted surface, easier iteration, one login model, and one place to apply security controls.
 The recommended pilot stack is:
 - Frontend: Vercel or equivalent frontend hosting for the web UI.
 - Backend: Railway or equivalent app hosting for ingestion, normalization, rules, and export services.
 - Database and file storage: Supabase or equivalent managed Postgres plus controlled object storage.
 - Authentication: Clerk or equivalent auth provider when the pilot moves beyond a controlled demo.
 This stack should not be treated as permanent enterprise architecture. It is a pragmatic pilot stack: fast to ship, easy to inspect, and compatible with AI-assisted development workflows. Vercel documents support for common frontend frameworks and features such as routing, server-side rendering, static assets, and deployment surfaces. Railway documents deployment environment variables, Git-triggered metadata, and an MCP server that can be installed for OpenAI Codex and other coding tools. Supabase provides Postgres Row Level Security guidance for browser-exposed data access and storage policies. Clerk Organizations provides a model for roles and permissions if StillSolutions needs supplier-level workspaces later.
 For healthcare data, the stack choice must be validated before accepting real PHI. HHS states that the HIPAA Security Rule is about protecting electronic protected health information using administrative, physical, and technical safeguards. HHS also states that covered entities and business associates using a cloud service provider for ePHI need the appropriate business associate agreement and HIPAA compliance posture. This whitepaper is not legal advice. It means StillSolutions should not accept real production ePHI until security, contractual, and operational controls have been reviewed.
 ## Backend Workstream
 ### 1. Canonical Data Model
 Before improving the importer, define the canonical record that the backend needs. The sample CSV is not the system of record. It is just one possible input format.
 The canonical order or patient-risk record should include, at minimum:
 - Supplier account or organization ID.
 - Upload batch ID.
 - Source file ID.
 - Patient identifier from supplier system.
 - Optional patient display label for pilot/demo use.
 - Device, item, SKU, HCPCS, or product category.
 - Payer or plan.
 - Prescriber or provider reference.
 - Order date, supply generated date, dispense date, or service date.
 - Documentation requirements known or inferred.
 - Prior authorization status.
 - Six-month visit or visit documentation status.
 - Signed written order status.
 - Other payer-specific required documentation status.
 - Days left or days overdue.
 - Computed risk status.
 - Computed priority.
 - Recommended action.
 - Export status.
 - Audit timestamps.
 The canonical model should separate raw uploaded values from normalized values. This matters because users will need to understand what the system changed, inferred, or could not map.
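
The separation of raw and normalized values could be sketched as a small record type. This is a minimal illustration, not the project's actual schema: the class name `CanonicalRecord` and every field name below are hypothetical, and only a few of the fields listed above are shown.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class CanonicalRecord:
    """Hypothetical canonical record; all field names are illustrative."""
    organization_id: str
    upload_batch_id: str
    patient_ref: str                                  # identifier from the supplier system
    raw: dict = field(default_factory=dict)           # values exactly as uploaded
    device: Optional[str] = None                      # normalized device or item
    payer: Optional[str] = None                       # normalized payer name
    service_date: Optional[date] = None               # normalized date basis
    inferred_fields: list = field(default_factory=list)  # normalized fields that were inferred


record = CanonicalRecord(
    organization_id="org-demo",
    upload_batch_id="batch-001",
    patient_ref="P001",
    raw={"Acct #": "P001", "DOS": "03/15/2026"},
    service_date=date(2026, 3, 15),
)
```

Keeping `raw` next to the normalized fields lets a later screen show what was uploaded and what the system derived from it.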
 ### 2. CSV Ingestion and Normalization
 This is the highest-priority backend work.
 The app should be tested against at least 25 mock CSV files. Each file should represent the same business meaning but use different column orders, header labels, date formats, optional columns, and messy values. Examples:
 - `patient_id`, `Patient ID`, `Member ID`, `Acct #`, and `External Patient Ref` should be treated as possible patient identifiers.
 - `payer`, `insurance`, `plan`, and `primary payor` should be mapped to payer.
 - `device`, `item`, `product`, `DME`, `HCPCS`, and `equipment` may all describe the supplied item.
 - Dates may appear as `MM/DD/YYYY`, `YYYY-MM-DD`, text dates, blank dates, or mixed formats.
 - The file may contain extra columns that the app should ignore but preserve in raw data.
 - Required columns may be missing and should trigger a validation screen instead of a silent failure.
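
As one illustration of the date handling described above, the sketch below tries a short list of formats in order. The format list and the function name `normalize_date` are assumptions, and slash-separated dates are assumed to be US-style month-first.

```python
from datetime import date, datetime
from typing import Optional

# Assumed format list (not from the source); extend it as new supplier files appear.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%B %d, %Y", "%b %d, %Y")


def normalize_date(value: Optional[str]) -> Optional[date]:
    """Return a date, None for a blank cell, or raise ValueError for an unknown format."""
    text = (value or "").strip()
    if not text:
        return None  # blank date: validation decides whether the field is required
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unknown date format: {text!r}")
```

Raising on an unrecognized format, instead of returning `None`, keeps a blank date distinguishable from a date the importer could not read.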
 The importer should have a deterministic path first, then an AI-assisted fallback only where needed.
 Recommended flow:
 1. Parse the CSV with a real CSV parser.
 2. Detect headers and encodings.
 3. Map headers to canonical fields using a synonym dictionary.
 4. Validate required fields.
 5. Normalize dates, payer names, device names, and identifiers.
 6. Produce a mapping confidence score.
 7. Ask the user to confirm uncertain mappings in the frontend.
 8. Store both the raw upload and normalized records.
 9. Generate the report from normalized records only.
 AI can help infer ambiguous headers or suggest mappings, but core validation should be deterministic and testable. For pilot trust, the system should show what it inferred.
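
The deterministic part of the flow (steps 1, 3, 4, and 6) might look like the sketch below. The synonym dictionary is seeded from the header examples earlier in this section. The matching logic, the function names, and the rule that an exact synonym is `high` while a single-token match is `inferred` are assumptions for illustration, not the normalizer's actual behavior.

```python
import csv
import io
import re

# Hypothetical synonym dictionary built from the header examples in this section.
SYNONYMS = {
    "patient_id": {"patient id", "member id", "acct", "external patient ref"},
    "payer": {"payer", "insurance", "plan", "primary payor"},
    "device": {"device", "item", "product", "dme", "hcpcs", "equipment"},
}
REQUIRED = {"patient_id", "payer", "device"}  # assumed required canonical fields


def _clean(header: str) -> str:
    # Lowercase, turn punctuation and underscores into spaces, collapse whitespace.
    return " ".join(re.sub(r"[^a-z0-9]+", " ", header.lower()).split())


def map_headers(headers):
    """Return (mapping, unmapped headers, missing required fields)."""
    mapping, unmapped = {}, []
    for header in headers:
        cleaned = _clean(header)
        tokens = set(cleaned.split())
        exact = [c for c, names in SYNONYMS.items() if cleaned in names]
        partial = [c for c, names in SYNONYMS.items() if tokens & names]
        if exact:
            mapping[header] = (exact[0], "high")
        elif len(partial) == 1:
            mapping[header] = (partial[0], "inferred")  # ask the user to confirm
        else:
            unmapped.append(header)  # no match, or an ambiguous one
    mapped_fields = {canonical for canonical, _ in mapping.values()}
    missing = sorted(REQUIRED - mapped_fields)
    return mapping, unmapped, missing


sample = "Acct #,Primary Payor,DME Description,Notes\nP001,Medicare Part B,Dexcom G7,call back\n"
headers = next(csv.reader(io.StringIO(sample)))
mapping, unmapped, missing = map_headers(headers)
```

Here `Acct #` and `Primary Payor` match a synonym exactly, `DME Description` is only inferred from the token `DME`, and `Notes` is left unmapped so it can be preserved in raw data without affecting the report.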
 ### 3. Rules and Calculation Engine
 The stoplight report needs explicit calculation logic. The UI should not merely display whatever status appeared in the CSV. As discussed in the meeting, status and priority should be calculated from the input facts.
 The first rule engine can be simple:
 - Red: documentation is missing, due now, expired, or close enough to deadline that immediate outreach is required.
 - Yellow: documentation is incomplete or approaching deadline.
 - Green: requirements appear complete or no urgent action is detected.
 - Unknown: required inputs are missing, unmapped, or not trusted enough to calculate.
 Each status should include a reason string. Example: `Red because signed written order is missing and the service date is inside the denial-risk window.`
 The backend should also produce the recommended action:
 - Contact patient.
 - Contact prescriber.
 - Request signed written order.
 - Confirm six-month visit documentation.
 - Verify prior authorization.
 - Review payer-specific documentation.
 - Exclude from automated scoring until required fields are mapped.
 The calculation engine should be versioned. A report generated under rule version `v0.1` must remain explainable later, especially if a supplier challenges the output.
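
A first rule engine along these lines could be sketched as follows. The version string `v0.1` comes from this document. The day thresholds, the two inputs, and the names `RiskResult` and `score` are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass
from typing import Optional

RULE_VERSION = "v0.1"
RED_WINDOW_DAYS = 7      # assumed threshold, not from the source
YELLOW_WINDOW_DAYS = 30  # assumed threshold, not from the source


@dataclass
class RiskResult:
    status: str
    reason: str
    rule_version: str = RULE_VERSION  # stamped on every result for later explanation


def score(order_on_file: Optional[bool], days_left: Optional[int]) -> RiskResult:
    """Score one record from two input facts; None means the input was not mapped."""
    if order_on_file is None or days_left is None:
        return RiskResult("unknown", "Required inputs are missing or unmapped.")
    order = "on file" if order_on_file else "missing"
    if days_left <= RED_WINDOW_DAYS:
        return RiskResult(
            "red",
            f"Deadline is within {RED_WINDOW_DAYS} days or has passed "
            f"(days_left={days_left}); signed written order is {order}.",
        )
    if not order_on_file or days_left <= YELLOW_WINDOW_DAYS:
        return RiskResult(
            "yellow",
            f"Signed written order is {order}; {days_left} days left before the deadline.",
        )
    return RiskResult("green", "Requirements appear complete; no urgent action detected.")
```

Because every result carries `rule_version`, a stored report can later be matched to the rule set that produced it.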
 ### 4. Persistence and Multi-Tenant Data Boundaries
 For a real pilot, uploads, normalized rows, reports, and exports should be stored. A pure in-browser CSV reader is not enough because the product needs history, auditability, and repeatability.
 Minimum backend tables:
 - `organizations`
 - `users`
 - `upload_batches`
 - `source_files`
 - `raw_rows`
 - `normalized_records`
 - `mapping_decisions`
 - `report_runs`
 - `report_items`
 - `export_files`
 - `audit_events`
 If Supabase is used, Row Level Security must be enabled and designed deliberately. Supabase documentation says RLS should be enabled on exposed schemas, and service keys that bypass RLS should never be exposed in the browser. Storage access also needs policies at the bucket and object level. That means StillSolutions should avoid any shortcut where the frontend holds administrative keys or directly accesses cross-tenant data.
 ### 5. Export Service
 The export button needs a clear purpose before the pilot.
 Based on the meeting, the likely export target is a CSV work queue that a supplier can crosswalk into its own practice management system. That export should include:
 - Supplier patient ID.
 - Device or item.
 - Payer.
 - Prescriber reference where available.
 - Status.
 - Priority.
 - Days left or overdue.
 - Missing documentation reason.
 - Recommended action.
 - Notes field.
 - Batch ID and report run ID.
 The export should not invent patient phone numbers or sensitive details unless the supplier provided them and the system is cleared to handle them. For a safer pilot, keep the first export focused on IDs and work actions, allowing the supplier to join it back to its own system.
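
One way to keep the export limited to IDs and work actions is an explicit column allowlist, sketched below with the standard `csv` module. The column names and the function name `build_work_queue_csv` are hypothetical.

```python
import csv
import io

# Hypothetical export columns, following the field list above.
EXPORT_FIELDS = [
    "patient_id", "device", "payer", "prescriber_ref", "status", "priority",
    "days_left", "reason", "recommended_action", "notes", "batch_id", "report_run_id",
]


def build_work_queue_csv(items) -> str:
    """Write report items to CSV text, dropping any key not in EXPORT_FIELDS."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=EXPORT_FIELDS,
        extrasaction="ignore",  # silently skip fields such as phone numbers
        restval="",             # leave absent columns blank
    )
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buffer.getvalue()


text = build_work_queue_csv([
    {"patient_id": "P001", "status": "red", "priority": 1,
     "recommended_action": "Request signed written order", "phone": "555-0100"},
])
```

With `extrasaction="ignore"`, a sensitive field that reaches the export step by accident is left out of the file because it is not on the allowlist.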
 The export service should also support templates later:
 - Generic work queue CSV.
 - Prescriber outreach list.
 - Patient outreach list.
 - Payer follow-up list.
 - Exceptions / unmapped records report.
 ### 6. Security, Privacy, and Compliance Readiness
 StillSolutions is operating in a healthcare-adjacent workflow. Even a pilot can involve sensitive data. The safest path is:
 - Use synthetic data for internal testing.
 - Use de-identified or limited data for first external demos where possible.
 - Do not ingest real PHI until authentication, authorization, audit logging, encryption posture, retention rules, deletion rules, vendor agreements, and incident procedures are reviewed.
 - Treat vendor BAA availability as a gating item before real ePHI is stored or processed.
 - Keep admin/service keys off the frontend.
 - Log access and export events.
 - Add a data retention policy for pilot uploads.
 The HHS guidance is the reason this belongs in the backend plan rather than being handled as late paperwork. Once real supplier data enters the system, infrastructure and process choices become compliance choices.
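
Logging access and export events can start very small. The sketch below builds one JSON line per event. The field names and the function name `build_audit_event` are assumptions, and the choice to record only counts and IDs (never row contents) is a suggested default rather than a stated requirement.

```python
import json
from datetime import datetime, timezone


def build_audit_event(action: str, actor: str, batch_id: str, row_count: int) -> str:
    """Serialize one audit event as a JSON line (hypothetical field names)."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,        # e.g. "upload" or "export"
        "actor": actor,          # a user ID or key label, never the secret itself
        "batch_id": batch_id,
        "row_count": row_count,  # counts only; no row contents or patient values
    }
    return json.dumps(event, sort_keys=True)


line = build_audit_event("export", "pilot-user-1", "batch-001", 42)
```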
 ## Frontend Workstream
 ### 1. Guided Pilot Flow
 The frontend should be reorganized around a five-step flow:
 1. Upload CSV.
 2. Confirm field mapping.
 3. Review validation issues.
 4. Generate report.
 5. Export work queue.
 This can be done without changing the visual character of the existing dashboard. The current report UI is simple and useful. The missing piece is user guidance.
 The upload page should answer:
 - What file is expected?
 - What fields are required?
 - What fields are optional?
 - What happens to the file?
 - Is this demo data, synthetic data, or real supplier data?
 - What should the user do if mappings are uncertain?
 The app should not let a user jump straight from a questionable import to a confident report without seeing validation and mapping status.
 ### 2. Mapping Review Interface
 Because CSV variability is the core risk, the frontend needs a mapping screen. This is where the product can turn a messy supplier file into a trustworthy workflow.
 The mapping screen should show:
 - Original CSV header.
 - Detected canonical field.
 - Confidence level.
 - Example values from the file.
 - Required or optional status.
 - User override dropdown.
 For the first pilot, this does not need to be beautiful. It needs to be clear. A user should be able to see that `Acct #` has been mapped to `Patient ID`, or that `DME Type` has been mapped to `Device`.
 ### 3. Validation and Error States
 The import button currently does not give the user enough visible feedback. The app needs explicit states:
 - Upload in progress.
 - Import succeeded.
 - Import succeeded with warnings.
 - Import failed.
 - Required fields missing.
 - Unknown date format.
 - Duplicate patient IDs.
 - Empty file.
 - Unsupported file type.
 - Rows skipped.
 - Rows needing manual review.
 Every failure should tell the user what to do next. A pilot participant should never need to open developer tools or ask the founder what happened.
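
A few of these states can be produced by a plain validation function that returns user-facing messages, each ending with a next step. This is a sketch under assumptions: it expects headers that have already been mapped to canonical names, and the required column set, messages, and function name are hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = {"patient_id", "device", "payer"}  # assumed canonical names


def validate_upload(filename: str, text: str) -> list:
    """Return a list of user-facing issues; an empty list means a clean import."""
    if not filename.lower().endswith(".csv"):
        return ["Unsupported file type. Export the report as CSV and upload it again."]
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    if not rows:
        return ["Empty file. Check that the export contains at least one data row."]
    issues = []
    missing = sorted(REQUIRED_COLUMNS - set(reader.fieldnames or []))
    if missing:
        issues.append(
            "Required fields missing: " + ", ".join(missing) + ". Map or add these columns."
        )
    seen, duplicates = set(), set()
    for row in rows:
        patient_id = (row.get("patient_id") or "").strip()
        if patient_id and patient_id in seen:
            duplicates.add(patient_id)
        seen.add(patient_id)
    if duplicates:
        issues.append(
            "Duplicate patient IDs: " + ", ".join(sorted(duplicates)) + ". Review these rows manually."
        )
    return issues
```

An empty list corresponds to "Import succeeded"; a non-empty list can be shown as warnings or a failure, depending on which issues it contains.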
 ### 4. Report UI
 The report UI should keep the stoplight model but add traceability.
 Each row should include:
 - Status color.
 - Priority.
 - Patient or supplier ID.
 - Device / item.
 - Payer.
 - Date basis used for calculation.
 - Days left or overdue.
 - Missing or risky documentation.
 - Recommended action.
 - Reason code.
 Add filters:
 - Red only.
 - Yellow only.
 - Unknown / unmapped.
 - By payer.
 - By prescriber.
 - By action type.
 Add a detail drawer for each record showing:
 - Raw uploaded values.
 - Normalized values.
 - Calculation reason.
 - Mapping warnings.
 - Export status.
 This is important because a supplier will not trust a red/yellow/green output unless it can see why the system made that call.
 ### 5. Export UI
 The export button should become a short export dialog:
 - Choose export type.
 - Choose included fields.
 - Confirm that IDs are intended for crosswalk into the supplier system.
 - Download CSV.
 - Store export event in audit log.
 At minimum, the first export type should be `Work Queue CSV`.
 ### 6. Placeholder Cleanup
 Remove or hide placeholder identity and demo-specific content before any pilot. The `S. Sullivan`-style name shown in the bottom-left corner, discussed in the meeting, should not appear in a supplier pilot unless it is explicitly labeled as demo data. Placeholder names create confusion and reduce trust.
 Demo mode should be explicit:
 - `Demo Supplier`
 - `Synthetic Patient 001`
 - `Sample payer`
 - `Sample file generated for testing`
 That is cleaner than accidentally showing fake production-looking identities.
 ## Dev Workflow and Automation
 The meeting exposed a repeatability problem: the app could not be quickly located, booted, and demonstrated. This is not just a personal workflow issue. It is a product-readiness issue because pilot reliability depends on repeatable setup.
 Every stable workflow should become a project skill or script:
 - Start frontend locally.
 - Start backend locally.
 - Seed sample data.
 - Generate 25 test CSV variants.
 - Run importer tests.
 - Run export tests.
 - Run browser pilot smoke test.
 - Deploy preview.
 - Verify production health.
 The ideal smoke test should:
 1. Boot the app.
 2. Open the browser.
 3. Upload a sample CSV.
 4. Confirm mappings.
 5. Generate the stoplight report.
 6. Export the work queue.
 7. Take screenshots.
 8. Fail if expected values are missing.
 This should run before any pilot meeting. The founder should not have to manually rediscover where the app is or how to run it.
 ## Pilot Readiness Checklist
 StillSolutions should not schedule an external pilot user until these items are complete:
 - App is hosted behind a stable URL.
 - Demo login or controlled access is working.
 - Synthetic sample data loads end to end.
 - At least 25 CSV variants pass ingestion tests.
 - Import flow shows mapping, validation, and warnings.
 - Report generation calculates status and priority from backend logic.
 - Each status has a reason and recommended action.
 - Export CSV works and can be opened cleanly.
 - Placeholder UI content is removed or clearly marked as synthetic.
 - Browser smoke test passes.
 - Data handling rules are documented.
 - Real PHI is blocked until compliance and vendor requirements are reviewed.
 - Pilot success metrics are written down.
 ## Recommended Delivery Plan
 ### Phase 1: Make the Existing Demo Reliable
 Timebox: 2 to 4 focused build sessions.
 Deliverables:
 - Open the project in an IDE and define repeatable start commands.
 - Create a project skill or script to boot the app.
 - Generate 25 mock CSV files.
 - Build importer tests against those files.
 - Fix import refresh behavior.
 - Remove confusing placeholder UI.
 - Wire a basic export CSV.
 Exit criteria:
 - A local browser smoke test can upload sample data, show the report, and export a CSV without manual debugging.
 ### Phase 2: Add Backend Structure
 Timebox: 1 to 2 weeks depending on current codebase shape.
 Deliverables:
 - Backend service for upload, parse, normalize, score, and export.
 - Canonical database schema.
 - Upload batch persistence.
 - Report run persistence.
 - Rule versioning.
 - Audit events.
 - API endpoints used by the frontend.
 Exit criteria:
 - The frontend no longer relies on only local in-browser state for the core report workflow.
 ### Phase 3: Host the Pilot
 Timebox: 2 to 5 days after Phase 2 if the app is already structured cleanly.
 Deliverables:
 - Frontend deployed.
 - Backend deployed.
 - Database provisioned.
 - Environment variables configured.
 - Staging and production separation.
 - Demo account or controlled access.
 - Health checks.
 Exit criteria:
 - A pilot participant can use a URL, upload a permitted pilot file, review output, and export the work queue.
 ### Phase 4: Controlled Supplier Pilot
 Timebox: 2 to 4 weeks of observation.
 Deliverables:
 - Pilot guide.
 - Data intake agreement and rules.
 - Success metrics.
 - Weekly review of false positives, false negatives, and unmapped files.
 - Export usability feedback.
 - Supplier-specific field mapping improvements.
 Exit criteria:
 - StillSolutions can show whether the product reduces manual review time, identifies documentation risk earlier, and produces a work queue suppliers will actually use.
 ## Success Metrics
 The pilot should measure operational usefulness, not just whether the software runs.
 Recommended metrics:
 - Percentage of uploaded rows successfully mapped.
 - Percentage of rows requiring manual mapping correction.
 - Number of red/yellow/green/unknown outputs.
 - Percentage of red/yellow outputs accepted as useful by the supplier.
 - Number of recommended actions completed by staff.
 - Time from upload to usable report.
 - Time saved versus current manual review.
 - Number of records that would have been missed without the tool.
 - Supplier confidence score after report review.
 - Export CSV usability score.
 The product only wins if a supplier says: this tells me who to contact, why, and what to do next.
 ## Risks and Mitigations
 CSV variability is the top technical risk. Mitigation: build the 25-file mock suite immediately and treat every importer bug as a product bug, not a data issue.
 Compliance exposure is the top operational risk. Mitigation: use synthetic data until vendor agreements, auth, audit, retention, and privacy posture are reviewed.
 False confidence is the top product risk. Mitigation: show reason codes, mapping confidence, and unknown states instead of forcing every record into red/yellow/green.
 Export ambiguity is the top workflow risk. Mitigation: define the export as a work queue first, not as a full integration with every supplier system.
 Founder workflow drag is the top execution risk. Mitigation: make skills and scripts for every repeatable action, including booting, testing, demoing, deploying, and generating sample data.
 ## Key Recommendations
 Do not chase an external pilot yet. First, prove the full flow internally with synthetic data.
 Do not redesign the whole UI. Keep the stoplight report and make the workflow around it clearer.
 Do not depend on one sample CSV. Build 25 intentionally different CSVs and force the importer to survive them.
 Do not package a desktop app. Use a hosted web app for speed, control, and lower support burden.
 Do not ingest real PHI until data controls and vendor obligations are reviewed.
 Do make every successful development workflow repeatable through scripts or skills. StillSolutions should never need to rediscover how to boot, test, or demo the product.
 ## References
 - Meeting transcript: Kisa and Daniel Wright, May 18, 2026.
 - HHS, [The HIPAA Security Rule](https://www.hhs.gov/hipaa/for-professionals/security/index.html).
 - HHS, [Business Associates FAQ](https://www.hhs.gov/hipaa/for-professionals/faq/business-associates/index.html).
 - Vercel, [Frontends on Vercel](https://vercel.com/docs/frameworks/frontend).
 - Railway, [Railway MCP documentation](https://docs.railway.com/cli/mcp).
 - Railway, [Variables reference](https://docs.railway.com/variables/reference).
 - Supabase, [Row Level Security](https://supabase.com/docs/guides/database/postgres/row-level-security).
 - Supabase, [Storage Access Control](https://supabase.com/docs/guides/storage/security/access-control).
 - Clerk, [Organization roles and permissions](https://clerk.com/docs/guides/organizations/control-access/roles-and-permissions).
--- a/docs/daniel-pilot-readiness-whitepaper.pdf
+++ b/docs/daniel-pilot-readiness-whitepaper.pdf
--- a/docs/pilot-guide-v1.md
+++ b/docs/pilot-guide-v1.md
@@ -0,0 +1,159 @@
 # Signal Pilot Guide
 **STTIL Solutions | Confidential | For Pilot Participants Only**
 ---
 ## What Signal Does
 Signal is a documentation readiness tool for CGM suppliers. You upload a CSV of your patient shipment records, Signal checks each record against documentation requirements, and returns a prioritized worklist showing which patients need attention before supplies go out or claims go in. The goal is to catch gaps before they become denials.
 ---
 ## How the Pilot Works
 The pilot runs on historical data using de-identified patient records. You keep all real patient information in your system. Signal sees only synthetic identifiers you assign.
 The five steps are:
 1. Prepare your data file (replace real patient IDs with synthetic ones)
 2. Upload the file to Signal
 3. Review the worklist Signal returns
 4. Export the work queue to your billing system
 5. Compare Signal's flags to your actual claim outcomes
 ---
 ## Step 1: Prepare Your Data File
 Signal does not need real patient names, Social Security numbers, dates of birth, or contact information. It works from shipment records only.
 Before exporting, your billing staff will:
 **Replace real patient IDs with synthetic ones.** Assign sequential placeholders before the export:
 | Real Patient ID | Synthetic ID to Use |
 |----------------|---------------------|
 | Your internal MRN or account number | P001, P002, P003... |
 Keep your own crosswalk table matching synthetic IDs back to real patient IDs. Signal will return its worklist using the synthetic IDs. You cross-reference back to real patients in your system.
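
For billing teams that script their exports, the crosswalk could be built with a few lines of Python. This is only a sketch: the function name `build_crosswalk` and the sample account numbers are made up, and a spreadsheet works just as well.

```python
def build_crosswalk(real_ids):
    """Assign P001, P002, ... in first-seen order; repeat IDs reuse their placeholder."""
    crosswalk = {}
    for real_id in real_ids:
        if real_id not in crosswalk:
            crosswalk[real_id] = f"P{len(crosswalk) + 1:03d}"
    return crosswalk


# Made-up account numbers for illustration; the crosswalk itself stays in your system.
crosswalk = build_crosswalk(["ACCT-7731", "ACCT-1042", "ACCT-7731"])
```

The same patient always receives the same synthetic ID, so multiple shipments for one patient stay linked in the file you upload.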
 **Export these fields from your billing system:**
 | Field | What Signal Needs | Example |
 |-------|------------------|---------|
 | Patient identifier | Your synthetic ID | P001 |
 | Device type | CGM device name | Dexcom G7, Libre 3, G6 |
 | Shipment date | Date supplies went out | 03/15/2026 |
 | Quantity | Units shipped | 3 |
 | Payer | Insurance or plan name | Medicare Part B, Aetna |
 Signal accepts common column name variations. If your export calls the shipment date "Service Date" or "DOS," Signal will recognize it.
 **Optional: offset shipment dates.** If you prefer, shift all dates by a fixed number of days before exporting (for example, subtract 30 days from every date). Signal's coverage calculations use relative intervals, so the flags will still be accurate.
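
A quick way to see why a fixed offset is safe for interval-based checks: shifting every date by the same number of days leaves the gap between any two dates unchanged. The sketch below uses made-up shipment dates and a hypothetical helper name. It demonstrates only that gaps between dates are preserved; any comparison against today's date would need the same offset applied to the reference date.

```python
from datetime import date, timedelta


def shift_dates(dates, offset_days):
    """Subtract the same number of days from every date."""
    return [d - timedelta(days=offset_days) for d in dates]


shipments = [date(2026, 1, 15), date(2026, 4, 15)]  # made-up example dates
shifted = shift_dates(shipments, 30)

original_gap = shipments[1] - shipments[0]
shifted_gap = shifted[1] - shifted[0]
```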
 ---
 ## Step 2: Upload Your File
 1. Go to the Signal URL your STTIL Solutions contact provided
 2. Click **Import CSV** in the left sidebar
 3. Select your prepared export file
 4. Signal will process the file and display your worklist
 If Signal cannot read a column, it will show you what it detected and what it could not map. You will see which fields were confirmed and which need review.
 ---
 ## Step 3: Read the Worklist
 Signal assigns each patient record one of four status labels:
 | Status | What It Means | Action |
 |--------|--------------|--------|
 | **Supply Lapsed** | Coverage window has expired. This patient cannot receive a new shipment until prescriber contact is confirmed. | Contact prescriber immediately |
 | **Renewal Due** | The 6-month qualifying visit window is approaching or has passed. Documentation must be confirmed before the next resupply. | Request visit documentation |
 | **Resupply Ready** | Patient is within the resupply window. Supplies can be initiated now. | Initiate shipment |
 | **Active** | Coverage is on track. No immediate action needed. | No action required |
 Each row also shows:
 - Days until coverage ends (or days overdue)
 - The reason Signal assigned that status
 - The recommended next action for your staff
 Patients are sorted by urgency, with Supply Lapsed and Renewal Due at the top.
 ---
 ## What Signal Is Checking
 For each patient record, Signal evaluates five documentation requirements:
 1. **Qualifying visit** — The 6-month physician encounter required before resupply. Signal tracks whether the renewal window is approaching or has already passed.
 2. **Standard Written Order (SWO)** — The order on file must match the current shipment. Signal flags records where the SWO status is uncertain based on available data.
 3. **PECOS enrollment** — The prescriber must have active enrollment at the time of shipment.
 4. **Prior authorization** — PA must cover the current shipment codes and must not have expired.
 5. **Proof of Delivery** — Documentation must be complete before the claim is filed.
 For the pilot, Signal calculates based on the shipment data you provide. Fields not included in your export will appear as unknown and will not affect other calculations.
 ---
 ## Step 4: Export and Cross-Reference
 When you are ready to act on the worklist:
 1. Click **Export** in the top-right corner of the Signal dashboard
 2. Download the work queue CSV
 3. Open the file in your billing system or spreadsheet tool
 4. Use your crosswalk table to match Signal's synthetic patient IDs back to your real patient records
 5. Assign the work queue items to your billing staff
 The export file includes: patient identifier, device, payer, status, days until coverage ends, recommended action, and the reason code Signal used.
 ---
 ## Step 5: Validate Signal's Accuracy
 After the pilot period (30 to 60 days), we will review Signal's flags against your actual claim outcomes together. This is how we confirm the tool is working correctly for your patient population and payer mix.
 We will look at:
 - **Flag accuracy** — Did patients Signal flagged actually have documentation gaps?
 - **False positives** — Did Signal flag patients who turned out to be fine?
 - **False negatives** — Were there denials that Signal did not flag?
 - **Time saved** — How long did it take to review the worklist versus your current process?
 You do not need to track this in a special format. During our review call, you can walk through a sample of flagged records and tell us what actually happened. That feedback is what we use to improve Signal's rules for your workflow.
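For teams that do want a quick tally before the review call, the first three measures can be computed from a reviewed sample roughly as follows. This is a sketch under stated assumptions: each reviewed record is reduced to a pair (was it flagged, did it have a real gap), "flag accuracy" is taken to mean the share of flagged records that had a real gap, and the function name is hypothetical.

```python
def review_metrics(sample: list[tuple[bool, bool]]) -> dict:
    """Summarize a reviewed sample.

    Each item is (flagged_by_signal, had_real_gap) for one record.
    """
    true_pos = sum(1 for flagged, gap in sample if flagged and gap)
    false_pos = sum(1 for flagged, gap in sample if flagged and not gap)
    false_neg = sum(1 for flagged, gap in sample if not flagged and gap)
    flagged_total = true_pos + false_pos
    return {
        # Assumed definition: share of flagged records with a real gap.
        "flag_accuracy": true_pos / flagged_total if flagged_total else None,
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }


# Made-up sample of five reviewed records.
sample = [(True, True), (True, True), (True, False), (False, True), (False, False)]
metrics = review_metrics(sample)
```

A small sample gives a noisy estimate, so treat the result as a conversation starter for the review call rather than a final score.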
 ---
 ## Your Data. Your Patients.
 Everything Signal receives during this pilot is:
 - Synthetic IDs only (no real patient names, SSNs, or DOBs)
 - Used only to generate the worklist and return it to you
 - Deleted within 30 days of pilot conclusion
 Your crosswalk table (synthetic ID to real patient ID) stays in your system. STTIL Solutions never sees it.
 ---
 ## Success Metrics
 | Metric | Target |
 |--------|--------|
 | Coverage flag accuracy | 85% or higher |
 | Records successfully processed from your export | 90% or higher |
 | Staff review time per worklist | Under 15 minutes |
 | Staff confidence in recommended actions | Positive rating |
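The "records successfully processed" target can be estimated from the `total` and `skipped` counts that the upload response returns. The sketch below assumes each skipped entry corresponds to one row of your export, which may not hold for every file; the helper name `processed_rate` is hypothetical.

```python
def processed_rate(upload_response: dict) -> float:
    """Share of rows that were scored, from the upload response counts."""
    total = upload_response.get("total", 0)
    skipped = upload_response.get("skipped", 0)
    # Assumption: one skipped entry equals one skipped row.
    rows_seen = total + skipped
    return total / rows_seen if rows_seen else 0.0


# Made-up counts: 47 rows scored, 3 skipped.
rate = processed_rate({"total": 47, "skipped": 3})
meets_target = rate >= 0.90
```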
 ---
 ## Questions and Support
 Contact: Kisa at STTIL Solutions
 Email: [contact provided separately]
 If Signal cannot read your file format or if a column is not mapping correctly, send us the header row from your export (no patient data needed) and we will update the mapping for your system.
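One way to pull out just the header row without opening the file in a spreadsheet is sketched below. It reads only the first line of the CSV text, so no patient rows are parsed; the function name and the sample text are illustrative.

```python
import csv
import io


def header_row(csv_text: str) -> list[str]:
    """Return only the column names from CSV text; data rows are not read."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


# Made-up export text, for illustration only.
headers = header_row("Acct #,DME Description,DOS\nSYN-00001,Dexcom G7,2026-05-01\n")
```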
--- a/python-backend/api/main.py
+++ b/python-backend/api/main.py
@ -2,12 +2,13 @@
 import csv
 import io
 import os
 import sys
 from datetime import date
 from pathlib import Path
 from typing import Optional
-from fastapi import FastAPI, File, HTTPException, UploadFile
+from fastapi import Depends, FastAPI, File, Header, HTTPException, UploadFile
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import StreamingResponse
 from pydantic import BaseModel
@ -18,18 +19,41 @@ if str(_backend_root) not in sys.path:
    sys.path.insert(0, str(_backend_root))
 from core.coverage_calculator import ShipmentRecord, calculate_batch
 from core.audit_logger import AuditAction, log_event
 from api.normalizer import normalize_csv
 app = FastAPI(title="Signal API", version="1.0.0", docs_url="/docs")
 # CORS — locked to Vercel frontend and localhost for dev.
 # Set ALLOWED_ORIGINS in Railway as a comma-separated list for production.
 _origins_env = os.getenv("ALLOWED_ORIGINS", "")
 _allowed_origins: list[str] = (
    [o.strip() for o in _origins_env.split(",") if o.strip()]
    if _origins_env
    else [
        "http://localhost:5173",
        "http://localhost:5174",
        "http://127.0.0.1:5173",
    ]
 )
 app.add_middleware(
    CORSMiddleware,
-    allow_origins=["*"],
+    allow_origins=_allowed_origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
 )
 # API key auth — enforced when SIGNAL_API_KEY env var is set.
 # In dev (no env var), all requests pass. In production, X-API-Key header is required.
 _api_key = os.getenv("SIGNAL_API_KEY", "")
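 def _require_api_key(x_api_key: str = Header(default="")) -> None:
    # hmac.compare_digest runs in constant time, so response timing does not leak the key.
    import hmac
    if _api_key and not hmac.compare_digest(x_api_key.encode(), _api_key.encode()):
        raise HTTPException(status_code=401, detail="Invalid or missing API key")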
 DEVICE_DISPLAY = {
    "dexcom_g7": "Dexcom G7",
    "dexcom_g6": "Dexcom G6",
@ -68,6 +92,7 @@ class RecordOut(BaseModel):
    action: str
    status_label: str
    reason: str
    rule_version: str
 class UploadResponse(BaseModel):
@ -76,6 +101,7 @@ class UploadResponse(BaseModel):
    skipped: int
    skipped_reasons: list[str]
    stats: dict
    mapping_summary: dict
 def _build_reason(flag_val: str, days_until_end: int, days_until_visit: Optional[int]) -> str:
@ -116,6 +142,7 @@ def _to_record_out(r) -> RecordOut:
        action=FLAG_ACTIONS.get(flag_val, "Review"),
        status_label=FLAG_LABELS.get(flag_val, flag_val),
        reason=_build_reason(flag_val, r.days_until_coverage_end, r.days_until_visit_due),
        rule_version=r.rule_version,
    )
@ -137,7 +164,10 @@ def health():
@app.post("/api/upload", response_model=UploadResponse)
-async def upload_csv(file: UploadFile = File(...)):
+async def upload_csv(
    file: UploadFile = File(...),
    _auth: None = Depends(_require_api_key),
 ):
    if not (file.filename or "").lower().endswith(".csv"):
        raise HTTPException(status_code=400, detail="File must be a .csv")
@ -147,31 +177,41 @@ async def upload_csv(file: UploadFile = File(...)):
    except UnicodeDecodeError:
        text = content.decode("latin-1")
-    records, skipped_reasons = normalize_csv(text)
+    records, skipped_reasons, mapping_summary = normalize_csv(text)
    if not records:
        log_event(AuditAction.CSV_INGEST, file.filename or "unknown", "demo_user",
                  "failure", "0.0.0.0", detail="No processable rows")
        raise HTTPException(
            status_code=422,
            detail={
                "message": "No processable rows found in the uploaded file.",
                "skipped": skipped_reasons[:10],
                "mapping_summary": mapping_summary,
            },
        )
    results = calculate_batch(records, as_of=date.today())
    out = [_to_record_out(r) for r in results]
    log_event(AuditAction.CSV_INGEST, file.filename or "unknown", "demo_user",
              "success", "0.0.0.0", detail=f"{len(out)} records scored")
    return UploadResponse(
        records=out,
        total=len(out),
        skipped=len(skipped_reasons),
        skipped_reasons=skipped_reasons[:20],
        stats=_compute_stats(out),
        mapping_summary=mapping_summary,
    )
@app.post("/api/export")
-async def export_work_queue(records: list[RecordOut]):
+async def export_work_queue(
    records: list[RecordOut],
    _auth: None = Depends(_require_api_key),
 ):
    """Generate a downloadable work-queue CSV from a list of scored records."""
    output = io.StringIO()
    writer = csv.writer(output)
@ -203,6 +243,8 @@ async def export_work_queue(records: list[RecordOut]):
    output.seek(0)
    today = date.today().isoformat()
    log_event(AuditAction.WORKLIST_EXPORT, f"work-queue-{today}", "demo_user",
              "success", "0.0.0.0", detail=f"{len(records)} records exported")
    return StreamingResponse(
        io.BytesIO(output.getvalue().encode("utf-8")),
        media_type="text/csv",
--- a/python-backend/api/normalizer.py
+++ b/python-backend/api/normalizer.py
@ -20,17 +20,22 @@ HEADER_MAP: dict[str, list[str]] = {
    "patient_id": [
        "patient_id", "patientid", "patient id", "pt_id", "pt id",
        "mrn", "account_number", "account number", "account_no",
-        "patient_account", "acct_no", "id", "patient",
+        "patient_account", "acct_no", "acct no", "acct #", "acct#",
        "id", "patient", "member_id", "member id",
        "external patient ref", "external_patient_ref", "external ref",
    ],
    "device_type": [
        "device_type", "device type", "device", "devicetype",
        "product_type", "product type", "product", "item",
        "item_description", "item description", "hcpcs_description",
-        "description", "product_name",
+        "hcpcs description", "description", "product_name",
        "dme", "dme description", "dme_description", "dme desc",
        "equipment", "equipment description",
    ],
    "shipment_date": [
        "shipment_date", "shipment date", "ship_date", "ship date",
-        "dispense_date", "dispense date", "service_date", "service date",
+        "dispense_date", "dispense date", "dispensedate",
        "service_date", "service date",
        "order_date", "order date", "date_of_service", "dos",
        "fill_date", "fill date", "last_ship_date", "last ship date",
    ],
@ -129,6 +134,17 @@ def _map_header(raw: str) -> Optional[str]:
    return None
 def _map_header_with_confidence(raw: str) -> tuple[Optional[str], str]:
    """Return (canonical_field, confidence) where confidence is 'high' or 'inferred'."""
    key = _normalize_key(raw)
    for canonical, aliases in HEADER_MAP.items():
        if key == _normalize_key(canonical):
            return canonical, "high"
        if key in [_normalize_key(a) for a in aliases]:
            return canonical, "inferred"
    return None, "unmapped"
 def _parse_date(value: str) -> Optional[date]:
    value = value.strip()
    for fmt in DATE_FORMATS:
@ -158,25 +174,47 @@ def _normalize_payer(value: str) -> str:
    return "commercial"
-def normalize_csv(text: str) -> tuple[list[ShipmentRecord], list[str]]:
+def normalize_csv(text: str) -> tuple[list[ShipmentRecord], list[str], dict]:
    """
-    Parse raw CSV text and return (records, skipped_reasons).
+    Parse raw CSV text and return (records, skipped_reasons, mapping_summary).
    Tolerates header drift and normalizes device/payer/date values.
    mapping_summary format:
        {
          "mapped": {canonical_field: {"raw_header": str, "confidence": "high"|"inferred"}},
          "unmapped_columns": [str],
          "required_missing": [str],
        }
    """
    reader = csv.DictReader(io.StringIO(text.strip()))
    if not reader.fieldnames:
-        return [], ["No headers found in file"]
+        return [], ["No headers found in file"], {}
    column_map: dict[str, str] = {}
    mapping_detail: dict[str, dict] = {}
    unmapped_columns: list[str] = []
    for raw_header in reader.fieldnames:
-        canonical = _map_header(raw_header)
+        canonical, confidence = _map_header_with_confidence(raw_header)
        if canonical:
            column_map[raw_header] = canonical
            mapping_detail[canonical] = {"raw_header": raw_header, "confidence": confidence}
        else:
            unmapped_columns.append(raw_header)
    required_fields = {"patient_id", "device_type", "shipment_date"}
    # Sorted so the response is deterministic; set iteration order is not.
    required_missing = sorted(f for f in required_fields if f not in mapping_detail)
    mapping_summary = {
        "mapped": mapping_detail,
        "unmapped_columns": unmapped_columns,
        "required_missing": required_missing,
    }
    records: list[ShipmentRecord] = []
    skipped: list[str] = []
-    for i, row in enumerate(reader, start=2):
+    for i, row in enumerate(reader, start=2):  # noqa: B007
        mapped: dict[str, str] = {}
        for raw_h, canonical in column_map.items():
            mapped[canonical] = (row.get(raw_h) or "").strip()
@ -218,4 +256,4 @@ def normalize_csv(text: str) -> tuple[list[ShipmentRecord], list[str]]:
            component=component,
        ))
-    return records, skipped
+    return records, skipped, mapping_summary
--- a/python-backend/core/coverage_calculator.py
+++ b/python-backend/core/coverage_calculator.py
@ -27,6 +27,8 @@ from typing import Optional
 logger = logging.getLogger(__name__)
 RULE_VERSION = "v0.1"
 PAYER_RULES_PATH = Path(__file__).parent.parent / "config" / "payer_rules.json"
@ -66,6 +68,7 @@ class CoverageResult:
    days_until_coverage_end: int
    days_until_visit_due: Optional[int]
    priority_score: int  # Higher = more urgent; used for worklist sort
    rule_version: str = RULE_VERSION
 def _load_payer_rules() -> dict:
--- a/signal-ui/.gitignore
+++ b/signal-ui/.gitignore
@ -1,2 +1,3 @@
 node_modules/
 dist/
 .vercel
--- a/signal-ui/src/lib/api.js
+++ b/signal-ui/src/lib/api.js
@ -4,6 +4,7 @@
 */
 const BACKEND_URL = "https://signal-api-production-91c2.up.railway.app";
 const API_KEY = import.meta.env.VITE_SIGNAL_API_KEY || "";
 /**
 * Upload a CSV file to the backend scoring endpoint.
@ -16,6 +17,7 @@ export async function uploadToBackend(file) {
  try {
    const resp = await fetch(`${BACKEND_URL}/api/upload`, {
      method: "POST",
      headers: API_KEY ? { "X-API-Key": API_KEY } : {},
      body: formData,
    });
    if (!resp.ok) {
--- a/test-data/generate_10k.py
+++ b/test-data/generate_10k.py
@ -0,0 +1,84 @@
 """
 Generate a 10,000-row synthetic patient CSV for Signal volume testing.
 Uses canonical headers and synthetic patient IDs (SYN-00001 through SYN-10000).
 Realistic distribution across flags, payers, and devices.
 Usage:
    python3 test-data/generate_10k.py
 """
 import csv
 import random
 from datetime import date, timedelta
 from pathlib import Path
 random.seed(99)
 TODAY = date.today()
 OUTPUT = Path(__file__).parent / "10k-patients.csv"
 DEVICE_OPTIONS = [
    ("dexcom_g7",         "sensor",  0.40),
    ("freestyle_libre_3", "sensor",  0.25),
    ("freestyle_libre_2", "sensor",  0.20),
    ("dexcom_g6",         "sensor",  0.10),
    ("omnipod_5",         "pod",     0.05),
 ]
 PAYER_OPTIONS = [
    ("Medicare Part B",   0.50),
    ("Medicaid - GA",     0.10),
    ("Medicaid - PA",     0.10),
    ("BCBS - FL",         0.08),
    ("Aetna",             0.07),
    ("UnitedHealth",      0.06),
    ("Cigna",             0.05),
    ("Humana",            0.04),
 ]
 FLAG_DATE_RANGES = [
    ("out_of_coverage", (TODAY - timedelta(days=600), TODAY - timedelta(days=400)), 0.30),
    ("visit_due",       (TODAY - timedelta(days=400), TODAY - timedelta(days=250)), 0.25),
    ("refill_window",   (TODAY - timedelta(days=30),  TODAY - timedelta(days=20)),  0.20),
    ("ok",              (TODAY - timedelta(days=10),  TODAY - timedelta(days=1)),   0.25),
 ]
 devices     = [d[0] for d in DEVICE_OPTIONS]
 dev_weights = [d[2] for d in DEVICE_OPTIONS]
 dev_comp    = {d[0]: d[1] for d in DEVICE_OPTIONS}
 payers      = [p[0] for p in PAYER_OPTIONS]
 pay_weights = [p[1] for p in PAYER_OPTIONS]
 flags       = [f[0] for f in FLAG_DATE_RANGES]
 flag_ranges = {f[0]: f[1] for f in FLAG_DATE_RANGES}
 flag_weights= [f[2] for f in FLAG_DATE_RANGES]
 def random_date_in(bucket):
    start, end = bucket
    delta = (end - start).days
    return start + timedelta(days=random.randint(0, max(delta, 0)))
 rows_written = 0
 with open(OUTPUT, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["patient_id", "device_type", "shipment_date", "quantity", "payer", "component"])
    for i in range(1, 10_001):
        pid     = f"SYN-{i:05d}"
        device  = random.choices(devices, weights=dev_weights)[0]
        comp    = dev_comp[device]
        payer   = random.choices(payers, weights=pay_weights)[0]
        flag    = random.choices(flags, weights=flag_weights)[0]
        ship    = random_date_in(flag_ranges[flag])
        qty     = random.choice([1, 2, 3, 6, 9, 14])
        writer.writerow([pid, device, ship.isoformat(), qty, payer, comp])
        rows_written += 1
 print(f"Wrote {OUTPUT}")
 print(f"Rows: {rows_written:,}")
 print("Distribution targets: 30% Out of Coverage, 25% Visit Due, 20% Resupply Ready, 25% Active")
--- a/test-data/generate_samples.py
+++ b/test-data/generate_samples.py
@ -1,4 +1,11 @@
-"""Generate 25 CSV test files covering all flag states."""
+"""
 Generate 25 CSV test files with VARIED headers simulating messy supplier exports.
 Each file uses a different combination of column names, date formats, column order,
 and payer strings — matching what real DME billing system exports look like.
 The normalizer (normalizer.py) should successfully process all 25 files.
 """
 import csv
 import random
 import os
@ -7,44 +14,182 @@ from datetime import date, timedelta
 random.seed(42)
 DEVICE_TYPES = ["dexcom_g7", "dexcom_g6", "freestyle_libre_3", "omnipod_5"]
 PAYERS = ["Medicare Part B", "Medicaid - GA", "BCBS - FL", "Aetna", "UnitedHealth", "Cigna", "Humana"]
 COMPONENTS = {"dexcom_g7": "sensor", "dexcom_g6": "sensor", "freestyle_libre_3": "sensor", "omnipod_5": "pod"}
 COMPONENT_DISPLAY = {"sensor": "Sensor", "pod": "Pod"}
 # Shipment date ranges to trigger different flag states
 TODAY = date.today()
 DATE_BUCKETS = {
-    "OK": (TODAY - timedelta(days=10), TODAY - timedelta(days=1)),
+    "ok":              (TODAY - timedelta(days=10),  TODAY - timedelta(days=1)),
-    "VISIT_DUE": (TODAY - timedelta(days=400), TODAY - timedelta(days=250)),   # old visit, no recent qualifier
+    "visit_due":       (TODAY - timedelta(days=400), TODAY - timedelta(days=250)),
-    "OUT_OF_COVERAGE": (TODAY - timedelta(days=600), TODAY - timedelta(days=500)),  # way too old
+    "out_of_coverage": (TODAY - timedelta(days=600), TODAY - timedelta(days=500)),
-    "REFILL_WINDOW": (TODAY - timedelta(days=30), TODAY - timedelta(days=25)),   # inside resupply window
+    "refill_window":   (TODAY - timedelta(days=30),  TODAY - timedelta(days=25)),
 }
 OUTPUT_DIR = os.path.dirname(os.path.abspath(__file__))
-for i in range(1, 26):
+# --- Header variation configs ---
-    flag = random.choice(list(DATE_BUCKETS.keys()))
+# Each entry: (patient_id_col, device_col, date_col, qty_col, payer_col, component_col, date_fmt, extras)
-    bucket = DATE_BUCKETS[flag]
+
-    delta = (bucket[1] - bucket[0]).days
+HEADER_VARIANTS = [
-    ship_date = bucket[0] + timedelta(days=random.randint(0, max(delta, 1)))
+    # 1 — canonical
    ("patient_id", "device_type", "shipment_date", "quantity", "payer", "component",
     "%Y-%m-%d", {}),
    # 2 — Brightree-style
    ("Patient ID", "Item Description", "Service Date", "Qty", "Insurance Name", "Item Type",
     "%m/%d/%Y", {"Prescriber NPI": "1234567890", "Branch": "PA-001"}),
    # 3 — all caps, short
    ("PT_ID", "DEVICE", "SHIP DATE", "UNITS", "CARRIER", "TYPE",
     "%Y-%m-%d", {}),
    # 4 — MRN style + text date
    ("MRN", "Product Name", "Dispense Date", "Qty Dispensed", "Plan Name", "Supply Type",
     "%d-%b-%Y", {"Supplier": "Gaboro DME", "State": "PA"}),
    # 5 — account number + Medicaid payer strings
    ("Account Number", "Product", "Fill Date", "Count", "Primary Payer", "component",
     "%m/%d/%y", {}),
    # 6 — external ref + ISO datetime
    ("External Patient Ref", "Item", "Date of Service", "Qty Shipped", "Insurance", "item_type",
     "%Y-%m-%dT%H:%M:%S", {"Notes": "batch export"}),
    # 7 — Acct # abbreviation + YYYYMMDD
    ("Acct #", "DME", "Order Date", "quantity", "plan", "component_type",
     "%Y%m%d", {}),
    # 8 — patient + device type alternate
    ("patient", "device type", "last ship date", "units dispensed", "payer name", "type",
     "%m/%d/%Y", {"Supplier Branch": "NY-003"}),
    # 9 — pt_id + product_type + Medicaid variant
    ("pt_id", "product_type", "dos", "qty", "ins_name", "component",
     "%Y-%m-%d", {}),
    # 10 — account_no + hcpcs description
    ("account_no", "hcpcs_description", "service_date", "units", "primary_payer", "supply_type",
     "%m-%d-%Y", {"HCPCS Code": "A9277"}),
    # 11 — patient_account + commercial payer strings
    ("patient_account", "description", "ship_date", "quantity_dispensed", "carrier", "component",
     "%Y-%m-%d", {"Account Manager": "J. Smith"}),
    # 12 — id + product + Medicaid-GA
    ("id", "product", "fill_date", "qty_shipped", "payer", "item_type",
     "%m/%d/%Y", {}),
    # 13 — PT ID spaces + BCBS
    ("PT ID", "Device Type", "Shipment Date", "Quantity", "Insurance", "Component",
     "%Y-%m-%d", {"Region": "Southeast"}),
    # 14 — patientid (no space) + dispense date
    ("patientid", "devicetype", "dispensedate", "qty", "payername", "supplytype",
     "%m/%d/%Y", {}),
    # 15 — account_number + Aetna
    ("account_number", "item_description", "order_date", "units_dispensed", "plan_name", "component",
     "%d/%m/%Y", {"Facility": "Gaboro PA Main"}),
    # 16 — MRN + UHC + transmitter component
    ("MRN", "Product Type", "Service Date", "Qty", "Insurance Name", "Component Type",
     "%Y-%m-%d", {}),
    # 17 — mixed case + Humana
    ("Patient_ID", "Device", "Ship_Date", "Units", "Plan", "Type",
     "%m/%d/%Y %H:%M:%S", {"Export Type": "CGM Only"}),
    # 18 — patient id (space) + Cigna + extra cols
    ("patient id", "item", "dispense date", "count", "carrier", "supply_type",
     "%Y-%m-%d", {"Billing Staff": "M. Jones", "Auth Number": "CGM-2026-001"}),
    # 19 — Acct No + Anthem
    ("Acct No", "Product Name", "Last Ship Date", "Qty Dispensed", "Primary Payer", "Component",
     "%b %d, %Y", {}),
    # 20 — MEMBER ID style
    ("Member ID", "DME Description", "DOS", "QTY", "Insurance", "Item Type",
     "%Y-%m-%d", {"Payer ID": "00019"}),
    # 21 — pt id + CMS payer
    ("pt id", "device_type", "service date", "quantity", "payer", "component",
     "%m/%d/%Y", {}),
    # 22 — acct_no + Molina (Medicaid)
    ("acct_no", "product", "fill date", "units", "insurance name", "type",
     "%Y-%m-%d", {"Branch Code": "GA-02"}),
    # 23 — External Ref + WellCare (Medicaid) + timestamp
    ("External Patient Ref", "Item Description", "Dispense Date", "Quantity", "Plan Name", "Supply Type",
     "%m/%d/%Y %H:%M:%S", {}),
    # 24 — patient_id canonical + extra noise columns
    ("patient_id", "device_type", "shipment_date", "quantity", "payer", "component",
     "%Y-%m-%d", {"Internal Code": "DME-99", "Region": "Northeast", "Staff ID": "STAFF-001"}),
    # 25 — Acct # + Blue Cross + B %d, %Y date
    ("Acct #", "Device", "Order Date", "Qty", "Insurance", "Component",
     "%B %d, %Y", {"Supplier Code": "STTIL-01"}),
 ]
 PAYER_STRINGS = {
    "medicare": [
        "Medicare Part B", "Medicare", "CMS", "Medicare Part A", "Medicare Part B - CGM",
    ],
    "medicaid": [
        "Medicaid - GA", "Medicaid - PA", "Molina Healthcare", "WellCare", "Centene",
        "Georgia Medicaid", "Medicaid",
    ],
    "commercial": [
        "BCBS - FL", "Blue Cross Blue Shield", "Aetna", "UnitedHealth", "UHC",
        "Cigna", "Humana", "Anthem", "United Healthcare", "Aetna Commercial",
    ],
 }
 DEVICE_DISPLAY = {
    "dexcom_g7":       ["Dexcom G7", "G7", "Dexcom G7 CGM", "dexcom g7"],
    "dexcom_g6":       ["Dexcom G6", "G6", "Dexcom G6 Pro", "dexcom g6"],
    "freestyle_libre_3": ["FreeStyle Libre 3", "Libre 3", "FSL3", "fs libre 3", "FreestyleLibre3"],
    "omnipod_5":       ["Omnipod 5", "Omnipod", "OmniPod 5", "op5"],
 }
 flags_assigned = random.choices(
    ["out_of_coverage", "visit_due", "refill_window", "ok"],
    weights=[30, 25, 25, 20],
    k=25,
 )
 def random_date(bucket):
    start, end = bucket
    delta = (end - start).days
    return start + timedelta(days=random.randint(0, max(delta, 0)))
 def format_date(d, fmt):
    return d.strftime(fmt)
 def random_payer_string(device):
    payer_category = random.choices(
        ["medicare", "medicaid", "commercial"],
        weights=[50, 20, 30],
    )[0]
    return random.choice(PAYER_STRINGS[payer_category])
 for i, variant in enumerate(HEADER_VARIANTS, start=1):
    pid_col, dev_col, date_col, qty_col, payer_col, comp_col, date_fmt, extras = variant
    flag_key = flags_assigned[i - 1]
    bucket = DATE_BUCKETS[flag_key]
    device = random.choice(DEVICE_TYPES)
    component = COMPONENTS[device]
-    payer = random.choice(PAYERS)
+    payer_str = random_payer_string(device)
-    quantity = random.choice([1, 2, 3, 6, 9, 14])
+    num_rows = random.randint(3, 8)
-    filename = f"sample-batch-{i:02d}-{flag.lower()}.csv"
+    filename = f"sample-batch-{i:02d}-{flag_key}.csv"
    filepath = os.path.join(OUTPUT_DIR, filename)
    fieldnames = [pid_col, dev_col, date_col, qty_col, payer_col, comp_col] + list(extras.keys())
    with open(filepath, "w", newline="") as f:
-        writer = csv.writer(f)
+        writer = csv.DictWriter(f, fieldnames=fieldnames)
-        writer.writerow(["patient_id", "device_type", "shipment_date", "quantity", "payer", "component"])
+        writer.writeheader()
        # 3-8 rows per file
        num_rows = random.randint(3, 8)
        for j in range(num_rows):
            pid = f"PT-{1001 + (i - 1) * 10 + j}"
-            row_ship = ship_date + timedelta(days=random.randint(-5, 5))
+            ship = random_date(bucket)
-            writer.writerow([pid, device, row_ship.isoformat(), random.choice([1, 2, 3, 6, 9]), payer, component])
+            # add slight jitter
            ship = ship + timedelta(days=random.randint(-3, 3))
            row = {
                pid_col:   pid,
                dev_col:   random.choice(DEVICE_DISPLAY[device]),
                date_col:  format_date(ship, date_fmt),
                qty_col:   random.choice([1, 2, 3, 6, 9]),
                payer_col: payer_str,
                comp_col:  component,
            }
            for k, v in extras.items():
                row[k] = v
            writer.writerow(row)
-    print(f"Wrote {filename} ({num_rows} rows, flag={flag})")
+    print(f"Wrote {filename} ({num_rows} rows, flag={flag_key}, headers: {pid_col}|{dev_col}|{date_col}|{payer_col})")
 print(f"\nDone — 25 files in {OUTPUT_DIR}")
--- a/tests/__init__.py
+++ b/tests/__init__.py
--- a/tests/smoke_test.py
+++ b/tests/smoke_test.py
@ -0,0 +1,180 @@
 """
 Signal smoke test — runs against the local backend (port 8001).
 Usage:
    cd /Users/sttil-solutions/projects/signal
    python3 tests/smoke_test.py
 Or via pytest:
    pytest tests/smoke_test.py -v
 What it does:
 1. Starts uvicorn on port 8001 as a subprocess
 2. Waits for /health to respond
 3. POSTs a sample CSV to /api/upload — verifies records are scored
 4. POSTs the scored records to /api/export — verifies CSV download
 5. Reports PASS or FAIL with reason
 6. Kills the server
 """
 import json
 import os
 import subprocess
 import sys
 import time
 import urllib.request
 import urllib.error
 from pathlib import Path
 BACKEND_PORT = 8001
 BASE_URL = f"http://localhost:{BACKEND_PORT}"
 SIGNAL_ROOT = Path(__file__).parent.parent
 SAMPLE_CSV = SIGNAL_ROOT / "test-data" / "sample-batch-01-ok.csv"
 def _wait_for_ready(timeout: int = 15) -> bool:
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            urllib.request.urlopen(f"{BASE_URL}/health", timeout=1)
            return True
        except Exception:
            time.sleep(0.5)
    return False
 def _post_file(path: Path) -> dict:
    import http.client
    boundary = "SignalSmokeBoundary"
    body_parts = []
    body_parts.append(f"--{boundary}\r\n".encode())
    body_parts.append(
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'.encode()
    )
    body_parts.append(b"Content-Type: text/csv\r\n\r\n")
    body_parts.append(path.read_bytes())
    body_parts.append(f"\r\n--{boundary}--\r\n".encode())
    body = b"".join(body_parts)
    conn = http.client.HTTPConnection("localhost", BACKEND_PORT, timeout=30)
    conn.request(
        "POST",
        "/api/upload",
        body=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    resp = conn.getresponse()
    data = resp.read()
    conn.close()
    if resp.status != 200:
        raise RuntimeError(f"/api/upload returned {resp.status}: {data[:200]}")
    return json.loads(data)
 def _post_export(records: list) -> bytes:
    import http.client
    body = json.dumps(records).encode()
    conn = http.client.HTTPConnection("localhost", BACKEND_PORT, timeout=30)
    conn.request(
        "POST",
        "/api/export",
        body=body,
        headers={"Content-Type": "application/json"},
    )
    resp = conn.getresponse()
    data = resp.read()
    conn.close()
    if resp.status != 200:
        raise RuntimeError(f"/api/export returned {resp.status}: {data[:200]}")
    return data
 def run() -> bool:
    print("Signal Smoke Test")
    print("=" * 40)
    if not SAMPLE_CSV.exists():
        print(f"FAIL — sample CSV not found: {SAMPLE_CSV}")
        return False
    env = os.environ.copy()
    env.pop("SIGNAL_API_KEY", None)
    print("Starting backend on port 8001...")
    proc = subprocess.Popen(
        [
            sys.executable, "-m", "uvicorn",
            "api.main:app",
            "--host", "127.0.0.1",
            "--port", str(BACKEND_PORT),
            "--log-level", "warning",
        ],
        cwd=str(SIGNAL_ROOT / "python-backend"),
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        print("Waiting for backend to be ready...")
        if not _wait_for_ready():
            print("FAIL — backend did not start within 15 seconds")
            return False
        print("Backend ready.")
        # Test 1: upload
        print("Uploading sample CSV...")
        result = _post_file(SAMPLE_CSV)
        total = result.get("total", 0)
        records = result.get("records", [])
        mapping = result.get("mapping_summary", {})
        if total == 0 or not records:
            print(f"FAIL — /api/upload returned 0 records. Response: {result}")
            return False
        print(f"  Upload OK: {total} records scored")
        # Test 2: mapping summary present
        if not mapping.get("mapped"):
            print("FAIL — mapping_summary missing from response")
            return False
        print(f"  Mapping OK: {list(mapping['mapped'].keys())} mapped")
        # Test 3: rule_version present
        rv = records[0].get("rule_version", "")
        if not rv:
            print("FAIL — rule_version missing from records")
            return False
        print(f"  Rule version OK: {rv}")
        # Test 4: export
        print("Exporting work queue...")
        csv_bytes = _post_export(records)
        lines = csv_bytes.decode("utf-8").strip().splitlines()
        if len(lines) < 2:
            print(f"FAIL — /api/export returned fewer than 2 lines: {lines}")
            return False
        print(f"  Export OK: {len(lines)} lines (header + {len(lines)-1} records)")
        print("=" * 40)
        print("PASS")
        return True
    except Exception as exc:
        print(f"FAIL — exception: {exc}")
        return False
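    finally:
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            # Force-kill if the server ignores terminate(), so the port is released.
            proc.kill()
            proc.wait()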
 if __name__ == "__main__":
    success = run()
    sys.exit(0 if success else 1)
 def test_signal_smoke():
    """pytest entry point."""
    assert run(), "Smoke test failed"