Mixed Methods Senior UX Researcher
AI DATA EXTRACTOR: PROTECTING $1.5M THROUGH DISCOVERY RESEARCH
U.S. Bank's Document Management Platform team needed to solve a painful problem: bankers were manually extracting data from hundreds of financial documents every week. A vendor pitched a $1.5M AI solution that looked promising in demos.
Before committing to the contract, leadership asked: will this actually work with our documents? I led discovery research to answer that question. What I found revealed that the vendor's solution couldn't handle the variability in U.S. Bank's documents. This evidence shifted the decision from BUY to BUILD, protecting the $1.5M investment.
After the BUILD decision, I continued leading research through design and post-launch measurement to ensure the team built the right solution. The result: 86% CSAT, 82 SUS, 73% adoption, and 1 hour/day saved per banker.
The Challenge
Context:
Bankers across Wealth Management, Commercial, Small Business, and Digitization teams manually extracted data from hundreds of financial documents weekly. Company names, signatory information, and account details were all entered by hand into systems that required this data to process client requests. The process was slow, error-prone, and frustrating.
Goal:
A vendor's AI solution promised to automate data extraction, and leadership saw it as a quick win. But would it work with U.S. Bank's actual documents?
My Role:
Senior UX Researcher (solo, end-to-end ownership). I designed and led discovery research to de-risk the vendor decision. After the BUILD recommendation was accepted, I continued through evaluative testing and post-launch measurement.
Research Approach:
Why Three Phases
I structured the research to match the decision-making timeline and product development lifecycle:
Phase 1: Discovery Research
Goal:
Determine whether the vendor's solution would work for U.S. Bank's documents and understand banker workflows.
Why this approach:
Before committing $1.5M, I needed to validate whether the vendor's solution could handle U.S. Bank's document variability. Field research was critical. I needed to see actual documents, observe real workflows, and understand variability the vendor might not account for.
Method:
12 participants across 4 banker personas. 9 field visits (observing in-branch workflows) + 3 remote interviews with Digitization Specialists (high-volume users). Contextual inquiry, workflow observation, document analysis.
I recruited across three experience levels (novice, intermediate, expert) within each business line to capture the full expertise spectrum.
Outcome:
Evidence that the vendor couldn't handle our document variability, leading to the BUILD recommendation.
Phase 2: Evaluative Testing
Goal:
Validate design decisions before engineering invested in development.
Why this approach:
The BUILD decision meant the team needed to design the right interface. I tested competing approaches (side-by-side vs. tabbed layouts) to identify which solved user pain points without creating new problems.
Method:
8 participants (2 per persona) testing Figma prototypes. Moderated usability testing with task-based scenarios and think-aloud protocol.
Outcome:
Clear evidence for side-by-side layout; identified 3 critical fixes before development started.
Phase 3: Post-Launch Measurement
Goal:
Measure whether the team built the right thing and identify improvement priorities.
Why this approach:
Post-launch research validates (or invalidates) pre-launch assumptions. Did adoption match predictions? Did users trust the AI? What barriers exist for non-adopters?
Six weeks post-launch captured real usage patterns (past initial learning but before the tool became routine) and included both early and slower adopters.
Method:
124 survey respondents across all 4 personas. SUS, CSAT, behavioral questions, open feedback.
Outcome:
Validated high satisfaction and adoption; identified specific barriers to address in next iteration.
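For readers unfamiliar with how an SUS score like the one reported later is derived, here is a minimal scoring sketch using the standard SUS formula (odd-numbered items contribute the rating minus 1, even-numbered items contribute 5 minus the rating, and the sum is scaled by 2.5). The responses shown are hypothetical placeholders, not the study data.

```python
# Minimal sketch of standard SUS scoring (hypothetical responses, not the study data).
# Each response is a list of ten 1-5 Likert ratings for the ten SUS items.

def sus_score(responses):
    """Return the 0-100 SUS score for one respondent's ten item ratings."""
    total = 0
    for item, rating in enumerate(responses, start=1):
        # Odd-numbered items are positively worded: contribution = rating - 1.
        # Even-numbered items are negatively worded: contribution = 5 - rating.
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    return total * 2.5

# Hypothetical respondents; the real analysis averaged scores across all survey takers.
respondents = [
    [5, 2, 4, 1, 5, 2, 5, 1, 4, 2],
    [4, 2, 4, 2, 4, 1, 4, 2, 5, 1],
]
scores = [sus_score(r) for r in respondents]
print(f"Mean SUS: {sum(scores) / len(scores):.1f}")
```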
Phase 1 Analysis

I used affinity diagramming in Miro to synthesize all 12 discovery sessions. I chose this method because it let me manage the volume of qualitative data while keeping stakeholders aligned on emerging patterns. I documented each session immediately afterward (quotes and observations), then grouped related observations as patterns emerged. What worked well: the visual board became a shared reference point that Product and Design could review together, which built buy-in for the BUILD recommendation.
Key Insight 1:
The vendor couldn't handle our variability
The vendor knew banking documents vary, but their solution was built for variability within limits. U.S. Bank's documents exceeded those limits.
Field research revealed three dimensions of variability the vendor's system couldn't handle: institution-specific formats, legacy versions from bank acquisitions, and quality issues affecting 30% of daily volume. The vendor's parser would fail on the majority of U.S. Bank's documents.
Why this mattered: This wasn't a minor technical limitation. The vendor's solution was designed for a narrower range of document types than what U.S. Bank processes daily. Discovering this before contract signing prevented a $1.5M investment in a solution that would have failed in production.
I presented these findings to the Platform Owner, Design Lead, and Engineering Lead.
The recommendation was clear: BUILD, not BUY. This shifted the program from vendor evaluation to custom development, with me continuing as research lead through design and launch.
Key Insight 2:
One interface can't fit all
Processing time varied 4x between expert and novice bankers. Their needs were fundamentally different, and a one-size-fits-all approach would fail both groups.
Experts (5+ years) process documents in 2-3 minutes through pattern recognition and muscle memory. Novices (0-1 year) take 7-10 minutes, reading carefully and consulting cheat sheets. Forcing experts through novice-friendly guidance would frustrate them. Giving novices expert-level tools would overwhelm them.
I created journey maps for each of the 4 banker personas, documenting pain points, decision-making patterns, and emotional states throughout the document processing workflow. These maps revealed not just time differences but the distinct needs that shaped the multi-level interface design.
Why this approach mattered: Rather than designing for an "average" banker (who doesn't exist), the team could build one interface with multiple pathways. Click-to-scroll highlighting helps anyone validate where the AI extracted data (essential for building trust across all experience levels). Keyboard shortcuts let power users move faster. Optional tooltips provide guidance for novice users without cluttering the interface for experts. This approach meant the interface could serve all experience levels without forcing a single interaction model.
Key Insight 3:
Cognitive load created compliance risks
During field observations, I watched bankers develop workarounds to manage impossible cognitive load. The workarounds were risky: printing digital documents (disposal risk), writing client info on sticky notes (security risk), toggling between clients without clearing screens (exposure risk). These workarounds weren't laziness; they were rational responses to systems that demanded too much working memory.
Why this mattered: The interface architecture became non-negotiable: side-by-side layout. Source document on left, data entry form on right. Everything visible simultaneously. This eliminated the memory burden that created risky workarounds.
But I needed to test this assumption. What if a tabbed interface performed just as well with less screen real estate? Discovery research suggested side-by-side would work better, but evaluative testing would prove it.
Testing the Interface
With Design's Figma prototypes, I tested the side-by-side layout against a tabbed alternative with 8 participants.
I included the tabbed layout even though discovery suggested it would fail because I needed evidence, not assumptions, to defend the recommendation.

Side-by-side layout
Why test something I didn't recommend? Discovery research gave me strong hypotheses, but hypotheses aren't evidence. Testing both approaches gave me data to defend the recommendation. When stakeholders asked "why not tabs?" I had clear performance metrics: 100% vs. 62.5% task completion, half the processing time, and direct quotes from users explaining why tabs failed.
The side-by-side layout proved superior. More importantly, it matched the actual workflow: bankers process one document at a time, extracting all needed data before moving to the next, and the side-by-side view supported that natural flow.
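For transparency about how comparison metrics like those above can be tallied, here is a small sketch summarizing per-layout completion rates and processing times from session logs. The session records are hypothetical placeholders rather than the actual test data.

```python
# Sketch: summarizing task completion and processing time per prototype layout.
# Session records below are hypothetical; the real study had 8 moderated sessions.
from statistics import median

sessions = [
    {"layout": "side-by-side", "completed": True,  "minutes": 3.0},
    {"layout": "side-by-side", "completed": True,  "minutes": 3.5},
    {"layout": "tabbed",       "completed": False, "minutes": 7.0},
    {"layout": "tabbed",       "completed": True,  "minutes": 6.5},
]

for layout in ("side-by-side", "tabbed"):
    runs = [s for s in sessions if s["layout"] == layout]
    completion = sum(s["completed"] for s in runs) / len(runs)
    print(f"{layout}: {completion:.0%} completion, "
          f"{median(s['minutes'] for s in runs):.1f} min median time")
```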
The Impact
Six weeks post-launch, 104 bankers confirmed the system works: 86% satisfaction, 82 SUS, 73% adoption, 1 hour saved per day.
Conditional trust
Users developed high confidence on standardized documents while appropriately verifying complex, high-stakes documents. Source highlighting and confidence scoring built trust where the AI earned it and skepticism where caution mattered.
Non-adopters
The remaining 27% cited change resistance, trust issues, and unsupported document types. The team is now addressing these barriers through training, expanded document coverage, and system stability fixes.
Beyond the product
This research elevated how the Document Management Platform team approached AI tools. Collaborative prioritization workshops became standard practice. Research findings became part of our vendor evaluation process. I negotiated a seat at quarterly platform planning meetings, shifting research from tactical to strategic.
What I Learned
Discovery research served two purposes: It prevented a $1.5M mistake by revealing that the vendor couldn't handle our document variability. It also informed what the team needed to build: document variability patterns, expertise-level differences, and workflow constraints that shaped the final design.
Daily stakeholder debriefs during field visits built shared context. Product and Design saw the same patterns I did. No surprises in final readouts.
Adding Digitization Specialists mid-project was the right call. I had initially planned to recruit from 3 business lines, but I discovered this power-user group during Week 2. These specialists process 100+ documents daily with quota-driven workflows. Their needs (keyboard shortcuts, batch processing, speed optimization) would have been completely missed if I'd stuck to the original plan, so I adjusted recruitment to capture these critical requirements.
Testing competing approaches gave me evidence to defend recommendations even when I had a strong hypothesis from discovery. Stakeholders didn't have to trust my intuition. They had performance data.
If I did it again: I'd schedule a technical feasibility discussion between usability testing and the prioritization workshop. This would surface implementation constraints earlier, leading to more realistic prioritization.