Behind the Stack: Inside ComplyAdvantage’s Tech DNA with Mark Watson

We’re back with the third episode of Behind the Stack, where we sat down with Mark Watson, CPTO of ComplyAdvantage, the leading SaaS risk intelligence platform used by thousands of enterprises across 75 countries.
In this episode, Mark takes us deep into the architectural and cultural decisions behind building an AI-native compliance platform at global scale, from processing billions of events a day with sub-second latency, to deploying explainable, production-grade AI in one of the world’s most regulated industries. It’s a candid conversation about how knowledge graphs and agents are reshaping financial crime detection and what it takes to balance innovation, governance, and speed when failure isn’t an option.
Interview Topics Covered
1. Technology & AI Vision
2. Scaling AI & Machine Learning in Production
3. Data Infrastructure & Real-Time Intelligence
4. Engineering Culture & Organization Design
5. Supporting Global Customers & Regulatory Complexity
6. The Future of RegTech & AI
7. Founder DNA & Company Evolution
8. Leadership & Team
1. Technology & AI Vision
ComplyAdvantage has positioned itself as a leader in AI-native financial crime detection. How would you describe the company’s overall technology philosophy today?
We've been at this for over a decade now, integrating machine learning into the platform long before AI became the topic everyone wanted to discuss. That matters because you can't just bolt AI onto a broken architecture and expect it to work; you need to own the whole stack. We ingest our own data, process it through our own pipeline, and wire it together in a knowledge graph that continually determines how entities relate to each other. Our ingestion pipeline picks up sanctions changes in under a minute, with full availability for screening within hours. When we need to change how we classify adverse media, we can move dramatically faster than traditional model retraining would allow – though in practice, we thoroughly test before rolling anything out. And because we control that whole chain, we can explain any decision from end to end. One audit trail, not a dozen systems our clients have to stitch together when the regulator comes knocking.
That integration also means we can run at a proper scale. We're processing three and a half billion Kafka messages a day, which puts us among the top four or five Kafka deployments in Europe, and we're still achieving sub-second response times for real-time payment screening. But scale on its own is just plumbing. What makes it worthwhile is that the same AI running our knowledge graph also powers the applications: analysts can write detection rules in plain English, our remediation agents deal with up to 85% of false positives without a human getting involved, and the whole thing learns from client feedback so it gets sharper over time. We call it AI-native not because we're chasing a trend, but because intelligence has been the point of the architecture from day one.
What does “AI-first” actually mean inside ComplyAdvantage from a product and engineering perspective?
It means AI isn't a feature we've added; it's how the platform thinks. To get slightly technical: we run a knowledge graph that ingests millions of data points daily and continuously infers relationships between entities. Large language models (LLMs) power that graph – we use Gemini in production – doing things like linking unresolved identifiers to adverse media at the point of ingestion, before the data even lands in the graph. So by the time an analyst sees a case, the intelligence is already baked in.
From a product perspective, that shows up everywhere. Our transaction monitoring allows compliance teams to write detection rules in plain English. They describe what they want, the system shows them what it understands, including the underlying logic, and they confirm before it goes live. No developer in the loop, no three-week ticket queue. Our auto-remediation agent is integrated into the case management workflow, functioning as if it were a member of the analyst team. It identifies low-risk cases, remediates them, and escalates when confidence levels drop below a threshold set by the client. Currently, that's handling up to 85% of false positives, which is where most of the operational costs sit for our clients. And everything the agent decides gets written to the audit log with full reasoning, so it's as defensible as a human decision.
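To make the confidence-gated handoff concrete, here is a minimal Python sketch of how an agent might decide between auto-remediating a case and escalating to a human, with every decision written to an audit log. The names, structure, and threshold are illustrative assumptions, not ComplyAdvantage's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Alert:
    case_id: str
    confidence: float  # agent's confidence that the alert is a false positive

def triage(alert: Alert, escalation_threshold: float, audit_log: list) -> str:
    """Auto-remediate when confident; escalate to a human otherwise.

    The client, not the vendor, sets `escalation_threshold`. Every decision
    is appended to the audit log with its reasoning, so it stays defensible.
    """
    if alert.confidence >= escalation_threshold:
        decision = "remediated"
        reason = f"confidence {alert.confidence:.2f} >= client threshold {escalation_threshold:.2f}"
    else:
        decision = "escalated_to_analyst"
        reason = f"confidence {alert.confidence:.2f} below client threshold"
    audit_log.append({"case": alert.case_id, "decision": decision, "reasoning": reason})
    return decision
```

The point of the sketch is the shape of the control flow: the client owns the threshold, and the reasoning trail is produced as a side effect of every decision rather than reconstructed afterwards.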
We use AI internally too, to augment our own development velocity, but that's another article in its own right.
What do you see as the next frontier in AI for AML, fraud prevention, and risk intelligence?
The honest answer is that the frontier isn't one big leap, it's a series of steps, and the pace has to match what enterprise risk functions are actually ready to adopt. These are naturally risk-averse organisations (it's in the name), so we're not going to dump a fully autonomous system on them and expect everyone to be comfortable with it overnight.
That said, there are some clear directions. The first is agents that can do more than triage. Currently, our auto-remediation handles the majority of false positives; however, we're moving toward agents that can file regulatory reports, gather supporting evidence, and ultimately operate beyond the edge of our SaaS platform, running on client infrastructure while feeding intelligence back to our knowledge graph. That's a significant shift: from AI as a tool inside your compliance workflow to AI as a participant in it.
The second is applying retrieval augmented generation (RAG), which is a technique that allows a language model to pull in relevant information from external sources before reasoning about a problem. RAG has been around for some time, but for us, that means leveraging RAG in client-facing agents to safely combine the knowledge buried in private client data with the connected relationships between 23 million entities in our knowledge graph. This synthesis of intelligent client-facing agents, a rich shared financial crime knowledge graph, and private client context goes beyond pattern-matching against static rules to provide the reasoning brain behind agentic decisions. That's a significant upgrade in the sophistication of these systems.
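The RAG pattern described above can be sketched in a few lines: retrieve the most relevant passages from both the shared graph and the client's private context, then assemble them into the prompt the model reasons over. The retrieval here is a deliberately crude token-overlap score standing in for real vector search, and all names are hypothetical:

```python
def score_overlap(query: str, text: str) -> int:
    """Crude relevance score: count shared lowercase tokens."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most relevant to the query."""
    return sorted(corpus, key=lambda t: score_overlap(query, t), reverse=True)[:k]

def build_prompt(query: str, graph_facts: list[str],
                 client_docs: list[str], k: int = 2) -> str:
    """Combine shared graph intelligence with private client context
    before handing the question to the language model."""
    context = retrieve(query, graph_facts, k) + retrieve(query, client_docs, k)
    return ("Context:\n" + "\n".join(f"- {c}" for c in context)
            + f"\n\nQuestion: {query}")
```

A production system would use embeddings and access controls rather than token overlap, but the synthesis step is the same: the model sees graph relationships and client context side by side before it reasons.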
Further out, we're looking at predictive risk scoring, agents that take action before a risk fully materialises rather than reacting after the fact, and federated learning, where we can combine insights across our client base without any single client's data leaving their control. The end state we're working toward is something like 95% automation with near-zero false positives. However, we'll get there step by step, and we'll bring clients with us rather than dragging them along.
2. Scaling AI & Machine Learning in Production
ComplyAdvantage’s AI models are core to its product. How do you balance model accuracy, explainability, and operational performance?
There's a real tension here, and anyone who tells you otherwise is probably trying to sell you something. More sophisticated models tend to be harder to explain. Explainability sometimes comes at a performance cost. And the more powerful the model, the more expensive it is to run, which matters when you're processing billions of data points daily.
The way we navigate it starts with a recognition that explainability isn't optional. Our clients are regulated institutions. When a regulator asks why a particular decision was made, "the model said so" isn't an acceptable answer. So we've designed the platform so that every decision, whether made by a human analyst or by one of our agents, writes full reasoning to an immutable audit log. One trail, one explanation, defensible end to end. That's a hard requirement, and we build around it rather than bolting it on afterwards.
On the accuracy versus performance side, we're explicit about the trade-offs. Take payment screening: we're achieving sub-second response times on a knowledge graph with over 23 million entities, and there's a constant balancing act between latency, cost, and the quality of matches, which we express as the true positive to false positive ratio. We use in-memory caching and delayed writes to achieve speed, but the search algorithm itself is designed to optimize across all three dimensions rather than just throwing hardware at the problem.
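The in-memory caching and delayed writes mentioned above can be illustrated with a toy cache that serves reads from memory while buffering durable writes into batches. This is a sketch of the general pattern, not the production design; class and parameter names are invented:

```python
class DelayedWriteCache:
    """In-memory read cache with batched (delayed) writes.

    Reads are served from memory for low latency; writes are buffered
    and flushed in batches, trading a little durability lag for
    throughput and cost.
    """
    def __init__(self, flush_every: int = 100):
        self._store: dict = {}
        self._pending: list = []
        self._flush_every = flush_every
        self.flushes = 0  # how many batch writes have gone to the backing store

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value            # visible to readers immediately
        self._pending.append((key, value))  # durable write deferred
        if len(self._pending) >= self._flush_every:
            self.flush()

    def flush(self):
        # In a real system this would batch-write to the backing database.
        self.flushes += 1
        self._pending.clear()
```

The trade-off is explicit in the code: readers never wait on the database, and the flush interval is the knob that balances latency against durability and write cost.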
And on cost, we're running Gemini in production for our LLM workloads, but we have parallel data science teams continually looking at where we can optimize. Gemma, instead of Gemini, where the task doesn't need the heavier model, that sort of thing. It's not glamorous, but it's what enables us to keep the platform economically viable at scale without compromising on intelligence.
Fraud and AML systems often suffer from high false positives. What innovations have been most effective in reducing noise while maintaining detection quality?
False positives are the tax that compliance teams pay for caution, and historically, it's been a brutal tax. Our clients tell us their analysts can spend a third of their day investigating alerts that turn out to be nothing. That's not just inefficient, it's corrosive. People stop trusting the system, they get fatigued, and the genuine risks start slipping through the noise.
The most effective thing we've done is close the feedback loop. When our clients screen their customers against our knowledge graph, they tell us whether the matches we return are true positives or false positives. That feedback trains a model that scores future search results, essentially learning what a good match looks like for that client's specific context. Clients then set their own threshold: how much false positive noise are they willing to tolerate in exchange for not missing genuine hits? Different clients have different risk appetites, so giving them that control matters.
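One way to picture that feedback loop is a scorer that accumulates true-positive rates per matched attribute and lets each client apply their own threshold. This is a simplified stand-in for the actual model, with hypothetical names throughout:

```python
from collections import defaultdict

class FeedbackScorer:
    """Learn what a 'good match' looks like from client TP/FP labels.

    Sketch: track a true-positive rate per matched attribute, average
    them into a match score, and let clients filter by their own threshold.
    """
    def __init__(self):
        self.tp = defaultdict(int)
        self.total = defaultdict(int)

    def record(self, matched_attrs: list[str], is_true_positive: bool):
        """Client feedback on one returned match."""
        for attr in matched_attrs:
            self.total[attr] += 1
            if is_true_positive:
                self.tp[attr] += 1

    def score(self, matched_attrs: list[str]) -> float:
        rates = [self.tp[a] / self.total[a] for a in matched_attrs if self.total[a]]
        return sum(rates) / len(rates) if rates else 0.5  # no history: neutral

def screen(scorer: FeedbackScorer, candidates: list[list[str]], threshold: float):
    """Return only candidates whose learned score clears the client's threshold."""
    return [c for c in candidates if scorer.score(c) >= threshold]
```

The important design point survives the simplification: the scorer adapts to each client's labels, and the client controls the noise-versus-recall trade-off via the threshold rather than having it dictated to them.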
The second piece is the auto-remediation agent. It sits in the case management workflow alongside the human analysts, picks up the low-risk cases, and remediates them – or escalates when it's not confident enough. That's currently handling up to 85% of false positives without human involvement. The key is that clients set the rules governing when the agent should give up and pass to a person, so they stay in control. And everything the agent does gets logged with full reasoning, so it's auditable.
The third is upstream data quality. Many false positives result from poor matching, two entities that appear similar but aren't the same person. We run probabilistic entity resolution across over 100 attributes before any information is added to the knowledge graph, with safeguards against both over-merging and under-merging. If you get that wrong, you're fighting noise at every subsequent stage.
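A minimal sketch of probabilistic entity resolution with safeguards on both sides: a weighted agreement score across attributes, a high bar for automatic merging (guarding against over-merging), and a review band so plausible duplicates aren't silently kept apart (guarding against under-merging). The weights, thresholds, and three-way outcome are illustrative assumptions:

```python
def match_probability(a: dict, b: dict, weights: dict) -> float:
    """Weighted attribute agreement, normalised to [0, 1].
    Illustrative stand-in for resolution across 100+ attributes."""
    total = sum(weights.values())
    agreed = sum(w for attr, w in weights.items()
                 if a.get(attr) and a.get(attr) == b.get(attr))
    return agreed / total if total else 0.0

def resolve(a: dict, b: dict, weights: dict,
            merge_above: float = 0.85, review_above: float = 0.6) -> str:
    """Two thresholds guard against both failure modes: over-merging
    (automatic merge only above a high bar) and under-merging
    (ambiguous pairs go to human review instead of being discarded)."""
    p = match_probability(a, b, weights)
    if p >= merge_above:
        return "merge"
    if p >= review_above:
        return "human_review"
    return "keep_separate"
```

Real systems weight attributes by discriminating power (a shared passport number means far more than a shared first name), which is what the per-attribute weights gesture at here.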
How do you think about model governance, monitoring, and lifecycle management at scale?
We take this seriously, partly because we have to (our clients are regulated institutions who may need to explain our models to their regulators) and partly because at the scale we operate, an ungoverned model is a liability waiting to happen.
The governance structure centres on a Model Review Board that oversees the full lifecycle. Every new model requires a model requirements document, essentially a Product Requirements Document (PRD) for models, that undergoes formal approval before it reaches production. That document captures versioning, performance requirements, risk assessment, and monitoring plans. We also commission independent third-party validation through ARC, benchmarked against standards like NYDFS 504 and Office of the Comptroller of the Currency (OCC) guidance. Clients regularly request this documentation or for us to walk them through detailed audit questionnaires. We expect more requests around ISO 42001 and BCBS 239 as these frameworks gain traction.
For ongoing monitoring, we maintain golden datasets, statistically representative samples curated by our data science team and validated by human annotators. We measure model outputs against these to detect drift, and we track cost and latency through automated Service Level Objective (SLO) monitoring that covers both data freshness and search response times at the 90th and 95th percentiles. The retraining trigger is either performance deterioration against those benchmarks or a change in requirements from the product side; we're not on a fixed schedule, because the right cadence depends on the risk profile of the model.
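The drift-and-SLO checks described above reduce to a few measurable primitives: accuracy against a golden dataset, a percentile over observed latencies, and a retraining trigger when accuracy slips below a tolerance band. This sketch uses invented names and a nearest-rank percentile for simplicity:

```python
def accuracy_against_golden(model, golden: list) -> float:
    """Fraction of golden-dataset samples the model still gets right."""
    correct = sum(1 for inputs, expected in golden if model(inputs) == expected)
    return correct / len(golden)

def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile, e.g. p95 latency over a monitoring window."""
    ordered = sorted(values)
    idx = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[idx]

def needs_retraining(model, golden: list, baseline: float,
                     tolerance: float = 0.02) -> bool:
    """Trigger retraining when accuracy drifts below baseline minus tolerance."""
    return accuracy_against_golden(model, golden) < baseline - tolerance
```

Wiring these into automated SLO alerting is what turns a governance document into a live control: the checks run continuously, and a breach opens a ticket rather than waiting for a scheduled review.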
Rollouts are where it gets interesting at scale. We can't always deploy model changes gradually per client: if you update an adverse media classifier, that affects the entire knowledge graph and every client downstream. We test exhaustively against representative samples, analyze the downstream impact on monitoring events, and, in some cases, throttle by switching ingested sources incrementally. For higher-risk changes, we'll run shadow mode, although the scale of the platform makes it expensive, so it's reserved for cases where the residual risk is genuinely high. We're also building features that give clients more control over how they receive monitoring updates, allowing them to manage the impact on their own terms.
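Shadow mode itself is a simple idea: run the candidate model on live traffic alongside the current one, act only on the current model's output, and log every disagreement for review before promotion. A hedged sketch, with all names hypothetical:

```python
def shadow_rollout(inputs, current_model, candidate_model, disagreement_log: list):
    """Run the candidate in shadow: both models see live traffic, but only
    the current model's output is acted on. Disagreements are logged for
    review before the candidate is promoted."""
    results = []
    for item in inputs:
        live = current_model(item)
        shadow = candidate_model(item)
        if shadow != live:
            disagreement_log.append({"input": item, "live": live, "shadow": shadow})
        results.append(live)  # clients only ever see the current model's output
    return results
```

The doubled inference cost is visible right in the loop, which is why, as noted above, shadow mode tends to be reserved for changes where the residual risk justifies it.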
Accountability rests with me as CPTO; I'm the named executive in our AI Policy and the Model Review Board's terms of reference. Day-to-day, our Director of Data Governance oversees the process, data science teams own the golden datasets, and ML engineers embedded in the product teams are responsible for implementation and monitoring.
Can you describe the architecture of your ML stack end-to-end, from data ingestion to model deployment?
Let me walk through adverse media as an example, because it touches most of the stack.
We ingest approximately 8 million articles daily from global sources. The first step is language-agnostic extraction, where we pull structured information out of unstructured text, regardless of the source language. We discard irrelevant content at the source, ensuring we don't pollute the pipeline with noise. That reduces the number to roughly 1.5 million deduplicated articles, of which approximately 300,000 are processed by our LLM classifiers. We use Gemini in production for this, classifying articles across 34 subcategories of adverse media risk. The old system employed traditional NLP models that required weeks or months to retrain when we needed to add categories or adjust the classification logic. The new system can be updated dramatically faster, although, in practice, we thoroughly test and assess the downstream impact before rolling out changes.
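The staged funnel just described (filter at source, deduplicate, then send only what's worth classifying to the expensive LLM) can be sketched as a pipeline of small steps. Every callable here is a hypothetical placeholder for the real component:

```python
def adverse_media_funnel(articles, is_relevant, dedupe_key, needs_llm, classify):
    """Staged funnel: cheap filters run first so the expensive LLM
    classifier only ever sees the small remainder worth classifying."""
    # Stage 1: discard irrelevant content at the source.
    relevant = [a for a in articles if is_relevant(a)]

    # Stage 2: deduplicate on a content key.
    seen, deduped = set(), []
    for a in relevant:
        key = dedupe_key(a)
        if key not in seen:
            seen.add(key)
            deduped.append(a)

    # Stage 3: classify only the articles that warrant LLM processing.
    classified = [classify(a) for a in deduped if needs_llm(a)]
    return deduped, classified
```

The economics of the real pipeline come from the same shape: roughly 8 million articles in, 1.5 million after filtering and deduplication, and about 300,000 reaching the LLM classifiers.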
Once classified, we extract entity identifiers from the articles, such as names, organizations, and locations, and link them to entities in our knowledge graph before the data is added to the graph itself. That's important: the linkage happens at ingestion, not as a separate downstream process. The graph is built on Spark as a distributed processing engine, with proprietary algorithms handling the edge construction and relationship inference. It currently holds around 23 million entities and 39 million risks, continuously inferring new relationships, roughly 20,000 inferred facts per hour based on approximately 6,000 entities ingested.
The graph feeds our search and screening products. When a client screens a customer, they're querying this connected model, and we return candidate matches scored by a probability model trained on client feedback, their true positive and false positive decisions over time. That feedback loop means the system gets sharper the more it's used, and clients can set their own threshold for what match confidence they'll accept.
From a deployment standpoint, new models go through our Model Review Board's review process, which includes formal documentation, before they are deployed to production. We test against golden datasets, statistically representative samples curated by the data science team and validated by human annotators, and for significant changes, we analyze the downstream impact on client monitoring events before rolling them out. At scale, you can't always deploy gradually per client, so we throttle by switching ingested sources incrementally or running shadow mode for higher-risk changes.
How do you evaluate trade-offs between classic ML, deep learning, and newer LLM-based approaches within your risk models?
We've been integrating machine learning into the platform for ten years, so we've witnessed its evolution firsthand. Classic ML was what we had, and it worked, our entity resolution, our early adverse media classifiers, our match scoring. But as LLMs have matured, we've been deliberately migrating where it makes sense.
The adverse media pipeline is the clearest example. We replaced the entire classification system with Gemini-based models, and the difference is stark. The old NLP models took weeks or months to retrain when we needed to add a risk category or adjust how we classified articles. Now we can move dramatically faster, though in practice we test thoroughly before rolling anything out. We went from a handful of categories to 34 subcategories, and the new system is roughly twice as accurate. For our clients, that agility matters – they don't want to wait months for us to adapt to a new typology or regulatory expectation, and they'd rather not depend on having their own local subject matter experts to configure rules.
That said, it's not LLMs everywhere. For example, we still use proprietary ML models for generating match scores when searching our knowledge graph for screening and monitoring purposes. Our latency requirement for the model is sub-20ms for screening, and we process hundreds of millions of requests during any given monitoring run. LLMs would be too slow and costly for these use cases.
We're also thoughtful about which generation of model we use. When we build a system on a particular LLM, we test and govern it against that version. The commercial SaaS models keep refreshing, but if we're not using the new capabilities and haven't validated them through our governance process, there's no point in paying the premium. In fact, the open source models have now caught up to – and in some cases overtaken – the versions we originally built against, at a fraction of the cost. We're actively investigating that as an option.
The honest answer is that it will always be a mix. We index primarily on innovation – what enables us to move fastest and provide clients with the most adaptable system – but we're pragmatic about where classic approaches still have their place. We're not chasing the newest model for its own sake.
3. Data Infrastructure & Real-Time Intelligence
What does ComplyAdvantage’s data infrastructure look like behind the scenes? What architectural decisions were crucial in enabling real-time risk insights?
I've already discussed how much scale and resilience matter to us and the levels at which we operate – very high throughput, a high degree of resilience, global distribution, and so on. Some applications also require extremely low latency. However, our biggest concern regarding data infrastructure is data quality. If your underlying data is stale or poorly connected, being large and fast just means you're delivering a lot of bad answers quickly.
The core of the stack is Kafka and Spark, with Yugabyte as our distributed database for medium to long-term storage and retrieval. Kafka provides us with the event-driven backbone – everything flows through it, from raw data ingestion to the applications. Spark handles the heavy lifting for the knowledge graph, running our proprietary algorithms for entity resolution and relationship inference. Yugabyte provides us with the resilience and geographic distribution we need without compromising consistency.
The architectural decision that unlocked a lot was centralizing the data platform in a single region, Brussels, and then replicating to client-facing regions via Kafka. That means we have one canonical source of truth for the knowledge graph, which is continuously enriched and inferred, and that gets pushed out to wherever our clients are located. We're not running multiple copies of the graph that might drift out of sync. When a sanctions list updates, it is propagated through the central data region and disseminated globally. When we infer a new relationship between entities, every region sees it.
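The single-source-of-truth property can be illustrated with a toy model: one canonical store that fans every update out to all regional replicas, so no replica can drift. This is a deliberate simplification of the Kafka-based replication described above, with invented names:

```python
class CanonicalGraph:
    """One central source of truth that pushes every update to all
    client-facing regions, so replicas cannot drift out of sync.
    A toy stand-in for event-log replication via Kafka."""
    def __init__(self, regions: list):
        self.central: dict = {}
        self.replicas = {r: {} for r in regions}

    def update(self, entity: str, fact: str):
        self.central[entity] = fact              # single canonical write
        for replica in self.replicas.values():   # fan out to every region
            replica[entity] = fact

    def in_sync(self) -> bool:
        return all(r == self.central for r in self.replicas.values())
```

The real system gets the same guarantee asynchronously: regions consume the same ordered event log rather than being written synchronously, so they converge on the canonical state instead of maintaining independent copies that could diverge.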
The platform runs on Kubernetes with an Istio service mesh managing communication between microservices. Ingress is handled through Cloudflare, with authentication provided via Auth0. We run multi-cloud across GCP and AWS, depending on the region, which provides redundancy and enables us to meet client requirements regarding data residency. The entire system is ISO 27001 certified and SOC 2 Type II compliant, because at this scale, security and resilience cannot be treated as afterthoughts.
How do you handle ingestion, enrichment, and classification pipelines at the scale required for global AML detection?
The starting point is that we own our data end-to-end. Unlike most competitors, who license third-party lists and pass them through, we ingest data directly from sources, including sanctions lists, PEP registers, watchlists, corporate registries, and media sources globally. That control is what lets us move fast: our ingestion pipeline picks up sanctions changes in under a minute, with full availability for screening within hours. Most of the industry is still measuring that in days.
Specifically, for adverse media, we ingest around 8 million articles per day. Language-agnostic extraction pulls out structured information regardless of the source language, and we filter out irrelevant content at the point of ingestion so the pipeline is not polluted. This brings us to approximately 1.5 million deduplicated articles, of which roughly 300,000 are processed by our LLM classifiers for categorization across 34 risk subcategories. The classification system can be updated dramatically faster than traditional NLP models would allow; in practice, though, we thoroughly test and assess the downstream impact before rolling out changes.
The enrichment layer is where the knowledge graph comes in. We're not just storing entities in isolation – we're connecting them. Proprietary algorithms run entity resolution across 100-plus attributes, with safeguards against both over-merging and under-merging. The graph currently contains around 23 million entities and 39 million risks, and the inference engine is producing roughly 20,000 new facts per hour – relationships we've discovered between entities that weren't explicitly stated in the source data: relatives and close associates, and corporate connections surfaced through adverse media. Continuous enrichment is what transforms a static database into one that actually understands risk.
We also have a Data Quality function, which includes human analysts who review and annotate outputs on an ongoing basis. They're validating that the models are performing as expected, creating labeled datasets that we use to measure drift, and catching edge cases that the automation misses. At this scale, you can't rely purely on machines, but you also can't rely purely on humans. The trick is knowing where each adds the most value.
Are there emerging data modalities or signals you’re excited to integrate in the future?
The most significant shift we're working toward is bringing client data into the intelligence graph in a privacy-protected way. Currently, clients search our graph and indicate whether matches are true positives or false positives – this feedback already trains our models and improves match accuracy over time. But there's a limit to what we can learn from search outcomes alone. If clients can contribute their own risk-relevant data back into the graph via API, without compromising privacy, the whole system becomes smarter. We're actively building that capability.
Further out, we're exploring autonomous agents that can independently discover and assess new sources of risk information. Rather than us deciding which data sources to integrate, the agents would seek out potentially relevant information, evaluate it, and feed it back to enrich the graph. The intelligence graph becomes the reasoning brain, and the agents become its eyes and ears – continuously expanding what we know and how we know it.
Ultimately, it all comes down to agents and graphs communicating in an ever-expanding loop. The graph holds the connected intelligence. The agents act on it, learn from it, and contribute back to it. The more that loop runs, the richer the understanding becomes. That's the direction we're heading – not just more data, but a system that's genuinely learning and adapting on its own.
4. Engineering Culture & Organization Design
What characteristics define a great engineer in a company tackling global financial crime?
The short answer is curiosity. Curiosity about the craft, about how systems behave at scale, about what's possible with new technology – particularly AI and data engineering right now. If someone has that curiosity and the technical fundamentals, the domain expertise follows. We're not looking for people who already have a thorough understanding of AML regulations; we're looking for people who are fascinated by challenging problems and eager to get better at solving them.
Beyond that, it's about wanting to be part of a high-performing team. We've built a culture that's deliberately collegiate – one that combines hybrid working with a minimum of two days in the office, regular hackathons, pizza and board game evenings (I'm personally deep into a long-running Gloomhaven campaign), and the whole organisation coming together twice a year for an onsite event. At ComplyAdvantage, people enjoy working with others. That matters because our squads are typically seven or eight people spread across London, Lisbon, and Prague, so you need engineers who thrive on collaboration rather than just tolerating it.
We also look for what I'd call an obsession with improvement – a desire to self-actualise, to be the best they can be, to advance their career. Engineering managers spend roughly a third of their time on people, a third on technology decisions, and a third on iterating team output. I'm a big believer in Andy Grove's "High Output Management" (if you haven’t read it, do yourself a favor) – management is, above all, a leverage role; it’s about getting the most out of the team. That philosophy runs through everything we do.
Underpinning all of this is a heavy emphasis on metrics. We closely measure engineering activity – how we're spending our time, whether we're investing where we intend to, and where we can improve. We have a dedicated technology operations function that relentlessly iterates on productivity. To call in another classic management guru, Drucker had it right: you can't manage what you don't measure. And that discipline extends to product management as well – you can't have engineering running at full speed with the product unable to keep pace. It's a single culture throughout, despite the differences in disciplines.
The RegTech domain itself demands scale, resilience, and continuous innovation. We process billions of messages a day, and we're expected to be always-on for regulatory purposes. The underlying platform must absorb all of that, allowing product teams to move quickly without worrying about it. Getting the engineering basics right is what lets you innovate at speed – the velocity comes from having invested in the foundations, not from cutting corners on them.
How do you create an engineering culture that supports rapid experimentation while staying compliant with stringent regulations?
It starts with recognising that speed and rigour aren't opposites; they're complementary. The teams that move fastest are usually those with the strongest foundations: a good testing discipline, comprehensive metrics, and clear governance. If you have to worry about whether a change will break something in production, you slow down. If you trust the guardrails, you can run.
We create deliberate space for experimentation. Hackathons are a regular fixture – they're where new ideas get prototyped and where engineers can explore things outside their usual remit. We also structure the organisation around different time horizons: engineering squads typically work on six-week cycles, while data science operates on longer horizons of six months or more. That means we can iterate quickly on product and platform while giving the more research-oriented work the time it needs to mature properly.
On the governance side, we have a Model Review Board that oversees all aspects involving models. New models require formal documentation, third-party validation against standards such as NYDFS 504 and OCC guidance, and sign-off before they reach production. That might sound heavy, but it actually liberates the teams. They know the boundaries, they know what needs approval and what doesn't, and they can move quickly within those constraints.
The cultural piece is just as important. Psychological safety matters: people need to feel they can try things without being punished if they don't work out. But that's balanced with strong accountability. We see very few bugs in production because we test relentlessly. My leadership team and I meet every Friday to review any production incidents in detail – not to assign blame, but to understand what happened and how we can prevent it from recurring. That combination of safety and discipline is what lets you experiment without creating risk for clients.
As AI plays a more central role, how are you upskilling teams or evolving internal capabilities?
We've taken a deliberate and measured approach, which is very much in keeping with how we do everything. Every engineer now has access to AI coding assistants, such as Google's CodeAssist and Anthropic's Claude Code, and we've tracked the impact rigorously. Before rolling out broadly, we ran controlled comparisons between teams working with and without AI. The results have been striking: commits per developer have doubled over the last six months, and most other velocity metrics are up by at least 50%. We've worked directly with Google and Amazon on enabling AI internally, and, as I write this, we have a team-wide hackathon scheduled for next week centred on Google's Gemini Enterprise Plus (the product formerly known as Agentspace – to be honest, I can't really keep track of Google's product names).
We're not hiring AI specialists externally, except for data scientists. The culture of curiosity I mentioned earlier means people are genuinely enthusiastic about acquiring these skills – with the appropriate skepticism. We measure everything; we don't just adopt things because they're fashionable. Prompt engineering is a new skill we're actively developing across the team; it's easily trainable but genuinely important. Model governance, as I've discussed, gets serious attention through our Model Review Board. MLOps is an area where our data science team is currently focusing.
This extends well beyond engineering. There's no point having an engineering team that's doubled its velocity if product management can't keep pace – it's like chaining them to the rear bumper of a Ferrari and driving off. They need to operate at the same speed, so they have the space to do what matters most for the product: speaking to customers and understanding their needs. If anything, I think product management is going to change more than engineering with the advent of AI. It's a profession highly dependent on written artefacts (PRDs, strategy documents, customer communications), and that's precisely where LLMs excel. Having a combined technology, data, and product organisation means we can inculcate a common culture and move in the same direction together.
5. Supporting Global Customers & Regulatory Complexity
ComplyAdvantage operates across multiple regulatory environments. How do you build systems that meet regional requirements while maintaining a unified global platform?
The architecture was designed from the start to handle this. We operate a multi-tenant platform with a distributed presence across various geographic regions, including client-facing locations in Dublin, London, Paris, the US, Canada, Singapore, Australia, and India. The platform runs on a mix of AWS and GCP, depending on the region. The data platform itself is centrally located in Brussels. It replicates to the client-facing areas via Kafka, ensuring we maintain a single, canonical source of truth for the knowledge graph while meeting data residency requirements.
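To make the hub-and-spoke pattern concrete, here's a minimal sketch of the idea – a single canonical change log in a central region, consumed independently by each client-facing region so every replica converges on the same knowledge-graph state. Plain Python stands in for real Kafka clients, and all names and data are illustrative rather than ComplyAdvantage's actual implementation:

```python
# Sketch of hub-and-spoke replication: a central, canonical change log
# (in production, a Kafka topic) is consumed independently by each
# client-facing region, so all regions converge on the same state.
# All names and payloads here are illustrative.

from dataclasses import dataclass, field

@dataclass
class CentralChangeLog:
    """Stands in for the canonical topic in the central region."""
    events: list = field(default_factory=list)

    def publish(self, entity_id, payload):
        offset = len(self.events)
        self.events.append((offset, entity_id, payload))
        return offset

@dataclass
class RegionalReplica:
    """A client-facing region; consumes the log at its own pace."""
    name: str
    offset: int = 0
    state: dict = field(default_factory=dict)

    def poll(self, log: CentralChangeLog):
        # Apply any events we haven't seen yet, in order.
        for off, entity_id, payload in log.events[self.offset:]:
            self.state[entity_id] = payload
            self.offset = off + 1

log = CentralChangeLog()
eu, apac = RegionalReplica("dublin"), RegionalReplica("singapore")

log.publish("entity:123", {"sanctioned": True, "list": "OFAC"})
eu.poll(log)                    # Dublin catches up immediately
log.publish("entity:456", {"sanctioned": False})
eu.poll(log); apac.poll(log)    # both regions drain the log

assert eu.state == apac.state   # one canonical source of truth
```

Because each region only ever replays the central log, regions can lag or recover independently without ever diverging from the canonical graph.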
On the product side, the core platform is unified; however, regulatory reporting varies by jurisdiction. We support US requirements, such as SAR filing and FinCEN, as well as Canadian requirements through FINTRAC, where a direct integration is necessary for the system to function. Additionally, we track regulatory changes across the markets we serve. The underlying screening and monitoring capabilities are the same globally, but the outputs adapt to local requirements.
We also invest in third-party validation to demonstrate we meet regional standards. We have ARC validation reports benchmarked against NYDFS 504 and OCC guidance for our screening products, and the platform itself is ISO 27001 certified and SOC 2 Type II compliant. As regulatory expectations evolve, and we're anticipating more client requests around ISO 42001 and BCBS 239, we'll continue to add to that.
The principle is: one platform, one knowledge graph, one set of AI capabilities, but flexible enough to meet clients wherever they are, both geographically and regulatorily.
What have you learned from supporting high-growth fintechs compared to large banks or enterprises?
We have over 1,500 direct clients and numerous partners who license our platform. The range is broader than you might expect – yes, FinTechs, but also any corporation whose economics could expose them to money laundering or fraud. GoDaddy, for example, because domain exchange involves a monetary transfer. AJ Bell is a traditional UK investment house. Monex, where cross-border remittance needs rigorous sanctions checking. The company began selling to other fintech startups, resulting in a long tail of smaller customers. But the technology has become powerful enough to serve very large organisations now.
The main difference isn't really about what they need from the product; the platform is flexible enough that we don't see clear patterns by customer size. It's more about the procurement structure. Larger organisations tend to have long-standing group decisions, and departments within them can become trapped by internal policies made years ago – sometimes so long ago that the original reasoning has been forgotten. Our job in those situations is to help our internal champions succeed – effectively, to achieve escape velocity from whatever legacy decisions might be in place. That takes time, but we believe we can out-innovate anyone in this space, with demonstrable scale and reliability.
The bigger lesson is that the more conservative organisations must necessarily innovate or they'll lose margin. Look at the UK: Revolut, Monzo, and Starling have been able to penetrate the sector precisely because they're more agile. We sell that agility as a product. If a large bank can screen and onboard customers as efficiently as a challenger, it removes a competitive disadvantage. That's the real value proposition for enterprises – not that they're slow, but that they can move faster than their current infrastructure allows if they adopt the right platform.
How does customer feedback shape your product roadmap and architectural decisions?
We have multiple formal channels for this, and I pay close attention to all of them. Customer success interacts actively with every direct client and acts as a strong voice of the customer within the company. I might be several levels above any customer success representative in the hierarchy, but I want to hear anything they have to say before I hear anything else from the organisation. They also run a customer council; I occasionally appear when invited, but the transcripts are always my favourite reading.
Beyond that, we ask the product team to run several types of formal customer interaction. Product feedback sessions to understand what clients like and dislike about what we've built. Roadmap feedback to test whether our plans resonate. And most importantly, true discovery, open-ended conversations with customers, prospects, even other people's customers. Not about our product, but about their roles, their problems, their frustrations. Where are their biggest pain points? What would genuinely move the needle for them? That's where the real insights come from. And then we have hundreds of eyes on the market through sales, that's how we find near-term leverage and impact.
How this shapes decisions depends on where you are in the stack. Our AI and data roadmap is largely top-down – it needs to be ahead of the market. The platform is driven by the requirements of resilience, performance, and security. But for the applications we sell, customer and market feedback are central.
There's also a different kind of feedback that's worth mentioning, the continuous loop where customer decisions feed intelligence back into the product. When clients inform us whether matches are true or false positives, that helps train our models and improves accuracy over time. So feedback isn't just shaping what we build next; it's actively making the current product smarter.
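That feedback loop can be sketched very simply: analyst decisions on screening matches become labelled training data that nudges the match-scoring model. The toy linear model and feature names below are illustrative, not the actual production pipeline:

```python
# Minimal sketch of the feedback loop described above: true/false-
# positive labels from analysts drive one step of online learning on a
# toy match-scoring model. Features, weights, and the learning rule are
# illustrative only.

def score(features, weights):
    """Toy linear match score, clamped to [0, 1]."""
    s = sum(w * f for w, f in zip(weights, features))
    return max(0.0, min(1.0, s))

def update(weights, features, label, lr=0.1):
    """One online-learning step from an analyst's 1/0 decision."""
    err = label - score(features, weights)
    return [w + lr * err * f for w, f in zip(weights, features)]

weights = [0.5, 0.5]  # e.g. name similarity, date-of-birth match
# Analysts resolve two hits: a true positive (1), a false positive (0)
weights = update(weights, [0.9, 0.8], 1)
weights = update(weights, [0.9, 0.1], 0)
# Future hits resembling the false positive now score lower, so the
# product gets quieter exactly where analysts said it was noisy.
```

The production version would be a proper ML pipeline with batching, validation, and governance, but the shape of the loop – decision in, smarter model out – is the same.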
One example that really shaped our thinking: our agentic remediation. We could have made it completely automated, invisible under the covers. However, customer feedback was clear: they currently want AI agents to augment their analyst teams, not replace them. So our agents sit alongside real people as helpers. Customers name them – one of our first clients called theirs Sergei and Pepe: "Let's hand this one to Sergei." That insight – that AI at this stage needs to be anthropomorphised to become acceptable – was crucial. As agents become more powerful, maybe that changes. If your entire team is made up of robots, they obviously don't need human names to communicate with one another. But right now, the technology can't outrun the human organisation. It needs to work alongside it.
We also run design partner programmes for new capabilities. They help us understand outcomes, we might want to say our agents remediate a certain percentage of false positives, but if false positives aren't actually the most important dimension, design partners tell us that. Those programmes enable us to go to market with real data on performance ranges, and the design partner gains a head start and genuine influence on where the product goes.
6. The Future of RegTech & AI
What advancements in generative AI or predictive modeling do you think will have the biggest impact on the compliance world in the next 3–5 years?
There are numerous opportunities here, making it difficult to know where to start. Multimodal capabilities that can process not just text but images, scanned documents, audio from calls, a lot of compliance evidence sits in formats that traditional systems can't parse. AI that can synthesise across multiple sources and build a narrative, not just "this entity appears on a sanctions list" but "here's why this pattern of transactions, combined with this adverse media, combined with this corporate structure, suggests a concern". Calibrated confidence, AI that knows what it doesn't know and can express uncertainty appropriately, which is critical when overconfidence could mean missed risks. Industry-wide pattern detection through federated learning, spotting criminal networks that span multiple institutions without any single institution having the full picture. The potential for AI to help less experienced analysts learn by explaining its reasoning, building domain expertise, rather than just doing the work for them.
But the things that excite me most are probably twofold.
First, RAG and its derivatives. The ability for our knowledge graph to become the reasoning brain behind agentic decisions. I think we've underplayed the power of our intelligence graph; it currently drives our data scoring, matching, and relationship inference. However, the opportunity lies in two directions: to power the entire compliance workflow, and to increase the amount of data it consumes exponentially. We're talking about making that reasoning brain a thousand times more powerful than it is today. When an agent makes a decision, it's not pattern-matching against static rules; it's drawing on a connected understanding of millions of entities and their relationships, enriched by everything we've learned across our entire client base.
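The core mechanic of graph-backed retrieval can be illustrated in a few lines: before an agent reasons about an entity, pull its local neighbourhood from the knowledge graph and serialise it as context for the model. The graph contents, relation names, and traversal depth below are invented for the example:

```python
# Illustrative sketch of graph-backed retrieval: expand the entity's
# local subgraph breadth-first and serialise the edges as prompt
# context. Entities and relations here are made up.

from collections import deque

GRAPH = {  # adjacency list: entity -> [(relation, neighbour), ...]
    "Acme Ltd": [("director", "J. Doe"), ("owns", "Acme Cayman")],
    "J. Doe": [("mentioned_in", "adverse-media:fraud-2023")],
    "Acme Cayman": [],
    "adverse-media:fraud-2023": [],
}

def neighbourhood(graph, start, depth=2):
    """Breadth-first expansion of the entity's local subgraph."""
    facts, seen, queue = [], {start}, deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if d == depth:
            continue
        for relation, nbr in graph.get(node, []):
            facts.append(f"{node} --{relation}--> {nbr}")
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, d + 1))
    return facts

context = "\n".join(neighbourhood(GRAPH, "Acme Ltd"))
# `context` would be prepended to the agent's prompt, grounding its
# reasoning in connected entities rather than static rules.
```

This is what makes the agent's output a narrative ("director of a sanctioned-adjacent structure, named in adverse media") instead of an isolated list hit.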
Second, the ability to deploy containerised agents, and I should say, "agents" here really means a dynamically interacting collection of what we currently think of as single agents – independently, into customer systems, with guarantees on data privacy, to seek out new sources of risk and take action predictively rather than reactively. If you break that down, there are several new technical capabilities bundled together: multiple interacting agents, agent autonomy, data communication privacy, and predictive rather than reactive capabilities. I think we'll see all of them emerge over the next 12 to 18 months. The shift from reactive compliance, something happened, we detected it, we reported it – to predictive compliance, we can see this risk emerging and take action before it materialises, is genuinely transformative for the industry.
What’s a misconception about AI in compliance that you’d like to correct?
That it's not real yet – that it's a collection of experimental projects, proof-of-concepts, and interesting demos, but nothing is actually in production. Every time I do an interview or go on a podcast, I see genuine astonishment that we have AI deployed at scale. I'll be on a panel discussing how we process 8 million articles a day through Gemini, and the other participants will share interesting tricks they've discovered with ChatGPT. It's an entirely different conversation.
We're live. Every day, LLMs are classifying adverse media across 34 risk categories. Our knowledge graph is continuously inferring relationships between 23 million entities. Our auto-remediation agents are handling up to 85% of false positives in production. Natural language rule creation is in the hands of the customers. This isn't a roadmap slide, it's what's running right now.
And we're making more use of AI every week. The capabilities are compounding. The misconception that AI in compliance is still experimental is probably at least 12-18 months out of date by now. The question isn't whether AI works in this domain; it's how fast you can adopt it before your competitors do.
7. Founder DNA & Company Evolution
What’s an important architectural or product decision the team made early on that turned out to be a long-term strategic advantage?
Although the company is over ten years old, I've only been here for four, so I'll speak to my own experience.
When I arrived in early 2022, we had two separate products: Customer Screening and Monitoring on one technology stack, and Transaction Monitoring and Payment Screening on another. That wasn't great for our customers or for us. It was clear that we needed to combine them, which created an opportunity to build an entirely new platform with data science, machine learning, and strong data engineering at its core.
The company had always curated its own data, which was part of the original founding vision, so we already had strong data skills and culture. It followed naturally that we should double down on that. Hence, the knowledge graph. Hence, our rapid introduction of generative AI when that became viable. And alongside that, we knew we needed rigorous engineering at scale, centralised around data, Kafka as the backbone, API-first design, data replication from a central region to client-facing locations. Data and ML were the key decisions, and they have subsequently led us in several directions that have turned out to be strategic advantages.
I won't pretend replatforming isn't risky; for a scale-up, it's often both inescapable and problematic at the same time. You're rebuilding something you already have rather than something new. That requires patience from all sides. Once you've built it, you have to migrate, first the applications, then the installed customer base. Because migration takes time, you end up maintaining and extending two platforms simultaneously until the new one fully supersedes the old. Any replatforming is a risk. But not replatforming is also a risk, especially after eight years, and the longer you leave it, the less agile you become.
The decision to go data-first and build everything around that foundation is what lets us move as fast as we do now. I'd rather not do another replatforming for a while, though, thanks very much.
How do you maintain focus and innovation as the company grows and the product surface expands?
It comes down to organisational discipline, metrics, and being honest about the tensions.
On discipline, we carefully allocate our budget across different types of work, including strategic initiatives, sales-driven requests, existing customer needs, and technical maintenance. Until now, we've managed the engineering budget mainly as tech versus features, but now that we're out of replatforming, we're starting to allocate separate budgets for each input type. Having relatively large engineering squads – seven or eight people – gives us the flexibility to do this, and our tribe structure lets us vary focus areas between platform, data, and the products we sell. Different parts of the organisation have different priorities, and that's fine.
On metrics: we track rigorously. We use a tool called Jellyfish that collects data from GitLab and Jira to attribute work to epics. Through custom reports from our technology operations team, we can identify distractions that product managers didn't want to work on, as well as over- or under-allocations against the feature budget. Additionally, we can attribute the cost of feature development and maintenance to our business case. That lets us iterate relentlessly on delivery velocity. Common development cycles and cadences across the organisation mean we're comparing apples to apples.
On saying no: there's constant pressure with over 1,500 clients and multiple products. However, if we've agreed on the budget for business-led requests or customer requests, we can ask the commercial side to help prioritise. It becomes a problem-solving exercise rather than a negotiation. Everyone here is collaborative – from commercial to executive, finance, tech, product, and data – which helps.
And yes, there's always tension between serving existing clients and building new capabilities, between quick wins and long-term bets. You have to recognise that, strategise around it, use the budgets as guardrails, and have a clear escalation process for when you need to break outside them – they're not laws of physics. But that's all part of running a business, isn't it?
8. Leadership & Team
What organizational structures have worked best for scaling a global engineering team while keeping decision-making fast and decentralized?
I've touched on many of these points already, so I'll keep this brief.
We organise into squads of seven or eight people, larger than was conventional before COVID, but we've found that this size provides redundancy and keeps teams at a critical mass. Each squad has an engineering manager and an assigned product manager. Squads are grouped into three tribes: platform and infrastructure, data, and the risk applications we sell. Engineering managers spend roughly a third of their time on people, a third on technology decisions, and a third on iterating team output; it's a leverage role, not a coordination bottleneck.
The engineering hubs are located in London, Lisbon, and Prague, with additional teams in Romania focusing on data ingestion, annotation, and research. Every squad includes people from multiple locations, so we're deliberate about culture. We ensure a minimum of two days a week in the office for hybrid working, regular hackathons and social events at each location, and an organisation-wide bonding onsite twice a year.
Decision-making stays fast because squads have real autonomy over the how; the business provides the why. Common cadences and metrics let us stay aligned without constant coordination overhead. And the culture is genuinely collegiate, which means people solve problems together rather than escalating everything upward.
What practices work best for keeping alignment on research bets vs. production engineering priorities?
The key is explicitly structuring for different time horizons. Our engineering squads work on six-week cycles; that's the rhythm for feature delivery, iteration, and production priorities. Our data science team operates on a much longer horizon, typically six months or more. Those are deliberately different because the work is different.
Data science spends about 80% of its time on long-term research, effectively our AI roadmap, and 20% helping out the engineering teams with immediate needs. The engineering squads also contain machine learning engineers. Same skillset as data scientists, but they operate as engineers – shipping production code, working within the squad cadence. Formally separating those two roles was an important step. It took us a while to get to this structure, but it means research doesn't get constantly interrupted by production demands, and production teams have the ML expertise they need embedded directly.
For prioritisation, the Director of Data Science and I agree on a block-by-block basis what work to pursue (it also gets presented to the CEO, who receives weekly briefings on what's going on right across my organisation). That's based on a combination of candidate requests from the squads and research bets aligned with exec, similar to how we divide product work between strategic initiatives and customer or sales-driven asks. The research bets are essentially our AI roadmap, and they need protection from short-term pressure, or they'd never get done.
Most of our next-generation data and probability scoring models started in data science: our entity resolution models, name commonness models, and the work that now powers the knowledge graph. That's the pipeline: research explores, proves out, and then hands over to engineering to productionise and scale. Getting that handover smooth is what makes the whole thing work.
What’s a leadership decision you made in a moment of uncertainty that ended up shaping ComplyAdvantage’s trajectory?
I've been running technology at ComplyAdvantage for four years; I only took on product at the start of 2025. The previous CPO and I started on the same day, 22nd January 2022, and we made a set of early decisions together that have shaped everything since.
The first was around organisational structure, the shape of the squads, the staffing model, the metrics we'd collect, the cadences we'd operate on. That sounds administrative, but it's foundational. Getting those basics right meant we had the machinery to execute everything that came afterwards.
The second was technology consolidation. I gathered the senior engineers in a room and asked them to rationalise our tech radar, significantly reducing the amount of technology proliferation across the organisation. That was a harder conversation than it might sound. Engineers often have strong attachments to their tools. But the outcome has given us far more flexibility to move people between projects. When your entire backend is written in Kotlin and your data layer is built in Python, you're not constantly retraining people or creating silos of expertise.
The third was the replatforming, the decision to rebuild around data and ML, combining two separate product stacks into a unified platform. I've talked about that already. It was risky, it required patience from everyone, and it's what lets us move as fast as we do now.
A big part of all three was the willingness to work very closely between engineering and product. That partnership was deliberate from day one, and hopefully it's prevailed to the present day.
How do you balance building for long-term resilience with the need to respond quickly to emerging global risks?
This is likely the thread that ties everything we've discussed together.
The short answer is that you don't balance them, you build one to enable the other. Long-term resilience enables you to respond quickly. If your foundations are shaky, every urgent response becomes a scramble. If they're solid, speed becomes the default.
That's been the logic behind most of our major decisions. The replatforming, combining two product stacks into a unified platform built around data and ML, was a long-term resilience bet. It took patience, it required maintaining two systems simultaneously, but it's what lets us now update sanctions data in under a minute, reprompt our adverse media classifiers in hours rather than months, and roll out new AI capabilities without re-architecting.
The knowledge graph, the Kafka backbone, the API-first design, and the geographic distribution with a single source of truth in Brussels, all of these are resilience infrastructure. But it's also what makes us fast. When a new sanctions regime emerges, we're not scrambling to integrate a new data source. When a client needs to adapt to a new typology, they're not waiting for us to retrain models.
The same principle applies to how we run the organisation. Common cadences, rigorous metrics, clear budget allocation between strategic work and immediate demands, that's resilience at the operational level. It means we can absorb urgent requests without derailing long-term plans, and we can pursue long-term bets without losing sight of what clients need today.
So I'd close with this: in our domain, resilience and responsiveness aren't in tension. They're the same thing, built at different timescales. Get the foundations right, and speed follows.
Conclusion
Mark’s perspective offers a rare look at what it takes to run real AI in production, as mission-critical infrastructure. His insights underline a recurring theme of Behind the Stack: that sustainable speed comes from strong foundations, clear accountability, and teams built to learn as fast as the technology itself.
In our next episode, we shift focus once again, sitting down with another surprise visionary tech leader to explore how a different layer of the modern stack is being rebuilt for scale, resilience, and the next wave of AI-driven transformation. Hang tight: episode 4 is coming soon, and if this conversation is any indication, the future of the stack is only getting more interesting.
Subscribe to our Newsletter