Honestly, I probably went overkill with the tech.
I might have found more success by getting to market quicker with a product that just wrapped asking ChatGPT “so what do you think about my startup idea”, but those tools already exist and don’t give you data you can rely on.
As an engineer, I had a lot of fun building Validate My SaaS, and also learned some great lessons. So here they are!
Context
Validate My SaaS offers a Competitor Analysis report for a user based on their startup idea. It looks something like this: the user fills out a form describing their startup idea. Report generation takes about an hour, and we send them an email once it’s ready.
On the backend, we do live web scraping to find the most current and relevant competitors for the startup idea. For the relevant competitors, we extract and format a bunch of data such as features, pricing, popularity metrics, and Trustpilot reviews.
For each report, we scrape about 500 web pages and process about 1 million tokens with an LLM to cut through the noise and present our customers with a thorough, data-rich report about their relevant competitors.
Minimize the Context
Even if an LLM has a large context window, too much context can distract the model and decrease the quality of the response, not to mention increase both the cost and the time to process.
A more targeted example: if a chunk of text does not contain any numbers or currency symbols, it’s unlikely it contains information about a product’s price.
There’s a lot you can do here with simple regex. We’re able to cut 50% of tokens out of a web page before sending it to an LLM.
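To make the price example concrete, here is a minimal sketch of that kind of regex pre-filter. The chunking rule (split on blank lines), the set of currency symbols, and the function names are all assumptions for illustration, not the actual Validate My SaaS pipeline.

```python
import re

# Assumed heuristic: a chunk can only mention a price if it contains
# at least one digit or one of a few common currency symbols.
PRICE_HINT = re.compile(r"[\d$€£¥]")


def split_chunks(page_text: str) -> list[str]:
    """Split page text into paragraph-like chunks on blank lines."""
    return [c.strip() for c in re.split(r"\n\s*\n", page_text) if c.strip()]


def price_candidates(page_text: str) -> list[str]:
    """Keep only the chunks worth sending to the LLM for price extraction."""
    return [c for c in split_chunks(page_text) if PRICE_HINT.search(c)]


page = (
    "Our story began in a garage.\n\n"
    "Pro plan: $29 per month.\n\n"
    "Contact us any time."
)
print(price_candidates(page))
```

Only the pricing chunk survives here, so the LLM never sees the other two. A filter like this will let through some irrelevant chunks (any text with a number in it), but that is fine: its job is to cheaply discard text that cannot contain a price, not to find the price itself.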
Best of Both Worlds / Multi-pass Filtering
Some models are fast, cheap, and dumb. Others are more accurate, but you pay for that accuracy (with both speed and money). What happens when you need both?
One of the most critical parts of our pipeline is accurately gauging the relevance of a competitor. If it’s not accurate, it drowns our customer in irrelevant data.
So we developed a multi-pass filtering process. GPT-3 does NOT provide the accuracy we need. But it IS accurate to within an order of magnitude. So we first send a product through GPT-3 to get a sense of its relevance. If it’s not even close, we filter it out right away. We then process the survivors again with GPT-4 for a more accurate relevance. This does mean we process some products 2x, but most products don’t survive the GPT-3 stage, so their cost is cut 10x.
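The two-stage idea can be sketched roughly as follows. The scorer callables, the cut-off values, and the lookup-table "models" are hypothetical stand-ins so the example runs on its own; in a real pipeline each scorer would wrap a call to the cheap or the accurate model.

```python
from typing import Callable

# A scorer takes a product description and returns a relevance score (0-100).
Scorer = Callable[[str], float]


def multi_pass_filter(
    products: list[str],
    cheap_score: Scorer,
    accurate_score: Scorer,
    coarse_cutoff: float = 30.0,  # assumed value: lenient, only drops clear misses
    final_cutoff: float = 70.0,   # assumed value: strict, applied to the accurate score
) -> list[tuple[str, float]]:
    """Run every product through the cheap scorer, then only survivors through the accurate one."""
    survivors = [p for p in products if cheap_score(p) >= coarse_cutoff]
    rescored = [(p, accurate_score(p)) for p in survivors]
    return [(p, s) for p, s in rescored if s >= final_cutoff]


# Stand-in scorers backed by lookup tables, with a counter on the expensive one.
CHEAP = {"A": 80, "B": 10, "C": 45, "D": 5}
ACCURATE = {"A": 92, "B": 20, "C": 55, "D": 0}
expensive_calls = 0


def cheap(product: str) -> float:
    return CHEAP[product]


def accurate(product: str) -> float:
    global expensive_calls
    expensive_calls += 1
    return ACCURATE[product]


result = multi_pass_filter(["A", "B", "C", "D"], cheap, accurate)
print(result, "expensive calls:", expensive_calls)
```

In this toy run, four products go through the cheap scorer but only two reach the expensive one. The coarse cut-off is deliberately lenient: the first pass only needs to be right about the obvious misses, and anything borderline gets a second opinion.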
Scores > Decisions
In short, I’ve found it much more helpful to have the LLM give something a score (e.g. from 0 to 100) rather than a binary decision (is this a relevant competitor or not). This gives you much more control over the cut-off threshold and opens up a lot of other possibilities, such as taking the top-scoring items or combining scores across different axes.
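Here is a small sketch of what scores make possible once you have them. The score values, axis names, and weights are made up for illustration; the point is that the threshold, the top-k cut, and the weighting all become knobs you can tune without calling the LLM again.

```python
def above_threshold(scores: dict[str, float], cutoff: float) -> list[str]:
    """Keep every item whose score meets the cut-off."""
    return [name for name, score in scores.items() if score >= cutoff]


def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Return the k highest-scoring items, best first."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


def combine(
    first: dict[str, float],
    second: dict[str, float],
    first_weight: float = 0.7,  # assumed weighting, purely illustrative
) -> dict[str, float]:
    """Weighted average of two score axes for items present in both."""
    return {
        name: first_weight * first[name] + (1 - first_weight) * second[name]
        for name in first
        if name in second
    }


# Hypothetical LLM-assigned scores on two axes.
relevance = {"A": 90, "B": 60, "C": 75}
popularity = {"A": 20, "B": 95, "C": 70}

print(above_threshold(relevance, 70))            # adjustable cut-off
print(top_k(relevance, 2))                       # best two by relevance
print(top_k(combine(relevance, popularity), 3))  # ranking across both axes
```

With a binary yes/no answer, changing your mind about how strict to be means re-running the model. With scores, it means changing one number.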
GAR > RAG
Shortly after I started work on Validate My SaaS, RAG (Retrieval Augmented Generation) became very popular, especially thanks to Perplexity.
Validate My SaaS does not use RAG, but something even better for our use case: Generation Augmented Retrieval. This allows us to find new or updated products that LLMs have NOT been trained on. To take it to 10x, we also use adaptive-GAR, where our retrieval engine adapts its search based on the success of its results.
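The post does not spell out the implementation, so the following is only one possible reading of adaptive GAR: an LLM generates search queries, a live search retrieves results, and the hit count of each query feeds back into the next round of query generation. Every function name, the feedback format, and the fake search index below are hypothetical stand-ins so the loop can run on its own.

```python
from typing import Callable

Feedback = list[tuple[str, int]]  # (query, number of new relevant results)


def adaptive_gar(
    idea: str,
    generate_queries: Callable[[str, Feedback], list[str]],  # would wrap an LLM call
    search: Callable[[str], list[str]],                      # would wrap a live web search
    is_relevant: Callable[[str, str], bool],                 # would wrap a relevance check
    max_rounds: int = 3,
) -> list[str]:
    """Generate queries, retrieve results, and let query success steer the next round."""
    found: list[str] = []
    tried: set[str] = set()
    feedback: Feedback = []
    for _ in range(max_rounds):
        queries = [q for q in generate_queries(idea, feedback) if q not in tried]
        if not queries:  # nothing new to try, stop early
            break
        for query in queries:
            tried.add(query)
            new_hits = [
                r for r in search(query)
                if is_relevant(idea, r) and r not in found
            ]
            found.extend(new_hits)
            feedback.append((query, len(new_hits)))
    return found


# Stand-ins: a fake search index and a query generator that expands the best query so far.
INDEX = {
    "invoicing software": ["acme-invoices.example", "billr.example"],
    "accounting blog": ["random-blog.example"],
    "invoicing software alternatives": ["billr.example", "payflow.example"],
}


def fake_generate(idea: str, feedback: Feedback) -> list[str]:
    if not feedback:
        return ["invoicing software", "accounting blog"]
    best_query, _ = max(feedback, key=lambda item: item[1])
    return [best_query + " alternatives"]


competitors = adaptive_gar(
    "invoicing for freelancers",
    fake_generate,
    lambda query: INDEX.get(query, []),
    lambda idea, url: "blog" not in url,
)
print(competitors)
```

The key difference from RAG is the direction of the arrows: the model's output drives what gets retrieved from the live web, rather than retrieved documents being stuffed into the model's prompt. That is what lets the pipeline surface products the model was never trained on.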
Hopefully some of the techniques we developed are helpful for your own product. If you are still in the early stages of your product and want to understand the competitive landscape BEFORE you build, of course check out Validate My SaaS. Or if you just want to discuss what you’re working on and get some pointers for working with LLMs, just reach out.