Testing the Hypothesis

Every product hypothesis looks reasonable until it meets real users.

I was actually testing two hypotheses, not one. The first was about the problem and the product: directors are time-poor, packs are getting bigger, and a private AI tool that triages what matters — with source references and cross-pack memory — would be something they'd pay for and use every cycle.

The second was about the business model: that this could be sold direct-to-director as a B2C product, with directors signing up themselves, paying themselves, and adopting the tool because the personal benefit was clear enough to bypass the usual enterprise sales motion. This mattered because B2B sales cycles are long, expensive, and gated by people who aren't the actual user. Having built products inside companies that had to navigate those cycles for years, I knew if I could find a category where the user was also the buyer and the decision didn't require institutional sign-off, I'd have a meaningfully better shot at building something sustainable as a small founding team.

Both hypotheses needed real testing. Some of what I expected was right. Some was wrong in instructive ways. And some of what I thought was important turned out to matter much less than I expected.

But before I could test any of it properly, I had to design a testing methodology that would give me real insight rather than polite validation.

The three-phase testing approach.

I ran testing in three deliberate stages rather than building first and asking for feedback later. This was conscious product discipline — one of the things I've seen fail repeatedly is building lots of things, shipping them, and then discovering the core assumptions were off. At that point you've got sunk cost working against your ability to respond to learnings.

Phase 1 — Baseline: watch them do the job.

I recruited directors I trusted and asked for two things: two of their real board packs, and an audio recording of them doing a verbal walk-through of their review process — essentially thinking out loud, the way usability researchers work.

There was a deliberate test inside this ask. Putting a real barrier in front of people is one of the most useful ways to validate that the pain you think you're solving is genuinely worth solving. Anyone can say a problem is painful when you describe it to them over coffee. Far fewer people will hand over two of their actual board packs and spend a few hours recording themselves reviewing one. If directors weren't willing to do that initial piece of work, the underlying frustration probably wasn't acute enough to drive adoption later. The friction itself was the signal.

I wasn't demoing anything. I wasn't showing them a product. I was seeing how a director actually reads a 200-page pack — what they skip, what they flag, what they search for, what they give up on. This gave me a baseline. If I couldn't show that Vizory could do at least what they were doing manually, there was no product.

Phase 2 — Controlled demo: me driving.

For their next board pack cycle, I asked them to still review their pack, but this time I also ran our early-stage product (before the name "Vizory" took hold) alongside their manual review — with me in control of the interface. They watched, reacted, gave feedback. I could show them insights side by side with their own notes and ask: did we catch the same things? Did we catch things you missed? Did we surface anything you'd have wanted to know?

This phase was about validating the AI analysis quality before putting the tool in users' hands. Of course watching someone react to a product is very different to getting them to actually use it.

Phase 3 — User control: them driving.

The third phase was giving them control — which meant building the actual secure environment: login, authentication, data isolation, the whole infrastructure. This was the phase where I discovered how much more I had to build than I'd originally planned. More on that in Part 3.

This staged approach was really useful. By the time a director was logging in independently, I'd already validated the core analysis quality and understood their mental model. They weren't encountering the product cold.

What the sessions taught me.

I ran sessions with ten early adopters across August to October 2025. Each session was different. But several themes came through universally.

The signals feature landed immediately.

Every person, within the first few minutes of seeing Key Signals, understood it without explanation. The three-bucket structure — urgent risks, execution gaps, positive momentum — mapped directly onto how directors actually think about a pack.

'It's doing it for me, as opposed to me having to go through everything to pick out the red flags.'

The source referencing was equally important. Directors are trained to be sceptical. The ability to click any AI-generated insight and see the exact page and paragraph it came from in the pack was the trust mechanism that made everything else credible.

The cross-pack search was the sleeper hit.

I expected Key Signals to be the centrepiece, and it was. But the feature that generated the most visceral reaction was cross-pack search. The ability to type a vague half-remembered thing — 'didn't they say they were going to fix this three months ago?' — and get back the exact pack, page, and context.

'That's pretty hard to do in normal environments, like even Google Drive.'

As we'll see, all that glitters is not gold though — more on this in part 5.

The financials feature was less compelling than I expected.

Toby (my co-founder) and I spent significant development time on financial metrics extraction. Directors largely shrugged. Their packs usually had their own scorecard, and what they wanted was signals and trends — not recalculated ratios. This was useful to learn early. We kept the feature but stopped investing heavily in it and redirected toward what was actually helping people.

Security came up in every session within five minutes.

Without exception. Board materials contain M&A targets, financial projections, CEO performance discussions, strategic plans that aren't public. Directors have fiduciary duties around confidentiality. The security architecture decision — Australian hosting, no offshore processing, no external model training, founder-only access — went from being a design principle to being the thing that made the product viable at all for this audience.

The three-phase testing approach.

Phase 1 — Baseline: watch them do the job.

Phase 2 — Controlled demo: me driving.

Phase 3 — User control: them driving.

What the sessions taught me.

The signals feature landed immediately.

The cross-pack search was the sleeper hit.

The financials feature was less compelling than I expected.

Security came up in every session within five minutes.

The Problem I Lived.

What It Actually Takes to Build This Properly.

If you've got a problem this kind of thinking would help solve —
let's talk.

Testing the Hypothesis

The three-phase testing approach.

Phase 1 — Baseline: watch them do the job.

Phase 2 — Controlled demo: me driving.

Phase 3 — User control: them driving.

What the sessions taught me.

The signals feature landed immediately.

The cross-pack search was the sleeper hit.

The financials feature was less compelling than I expected.

Security came up in every session within five minutes.

The Problem I Lived.

What It Actually Takes to Build This Properly.

If you've got a problem this kind of thinking would help solve —let's talk.

If you've got a problem this kind of thinking would help solve —
let's talk.