Case Studies - Nova Synthetic Data Lab

Featured Demonstrations

See how we successfully applied our synthetic data technology to real-world datasets

AI · In Development

Fine-Tuning LLM Datasets for Security-First AI Agents

Building domain-specific synthetic training data for Adamant's on-device AI — engineered for information security mastery, precision tool calling, and ultra-low latency inference.

April 2026

Oncology Innovation

Cancer Research Dataset Demo

Demonstration of Nova Synthetic's ability to generate synthetic versions of existing cancer research datasets, preserving clinical patterns while protecting patient confidentiality.

November 2025

Financial Innovation

Financial Fraud Detection Demo

Demonstration showing how Nova Synthetic created synthetic financial transaction data from existing fraud datasets, enabling secure ML model training without exposing customer information.

September 2025

Healthcare Innovation

Diabetes Dataset Synthesis Demo

Demonstration of how Nova Synthetic successfully transformed a public diabetes research dataset into a privacy-preserving synthetic version while maintaining statistical integrity.

August 2025

Healthcare Research · August 2025

Nova Synthetic: The Future of Synthetic Data in Healthcare and Beyond

Transforming data access while preserving privacy across industries

The Global Challenge

In today's world, access to quality information is an invaluable resource for science, medicine, and technological innovation. However, there is a challenge that transcends borders: How can we make the most of data without putting people's privacy at risk? This tension is not exclusive to Costa Rica or Latin America; it is a global challenge.

At Nova Synthetic, we have embraced this challenge as our mission. Our work focuses on creating synthetic datasets that preserve the statistical richness of real data while eliminating any link to specific individuals. In this way, institutions of all types can innovate, research, and generate solutions without exposing sensitive information.

A Key Breakthrough: The Diabetes 130 Case

We recently developed a synthetic dataset based on the widely used Diabetes 130-US Hospitals (1999–2008) dataset, which comprises over 100,000 clinical records. The objective was clear: preserve the statistical fidelity of the original data while keeping the risk of re-identification to a minimum.

The results exceeded international standards:

Global Accuracy: 97.4%
Pattern Conservation: Univariate, bivariate, and trivariate patterns preserved, ensuring consistency in complex analyses
Solid Privacy: 0% literal matches and a re-identification risk comparable to chance
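A 0% literal-match rate means that no synthetic row is an exact copy of a source row. The sketch below shows one minimal way to measure it. It assumes each record is a tuple of column values. The column layout and sample values are hypothetical and are not drawn from the Diabetes 130 data.

```python
from collections.abc import Hashable, Sequence

Row = tuple[Hashable, ...]

def literal_match_rate(real: Sequence[Row], synthetic: Sequence[Row]) -> float:
    """Share of synthetic rows that exactly reproduce a real row (0.0 is the goal)."""
    if not synthetic:
        return 0.0
    real_set = set(real)  # hash each real row once so lookups are O(1)
    matches = sum(1 for row in synthetic if row in real_set)
    return matches / len(synthetic)

# Hypothetical records: (age, sex, medication, days_in_hospital)
real = [(54, "F", "Insulin", 3), (67, "M", "Metformin", 5)]
synthetic = [(55, "F", "Insulin", 3), (67, "M", "Metformin", 4)]
print(f"{literal_match_rate(real, synthetic):.1%}")  # 0.0%
```

Exact matching is only a first filter. A synthetic row that differs from a real patient in a single field can still be revealing, so this check is typically paired with distance-based tests.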

In other words, the synthetic data offers the same utility as the original records for research and development, without the risk of compromising any patient's confidentiality.
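The claim of "a re-identification risk comparable to chance" can be made concrete with a distance-to-closest-record (DCR) comparison, which is one common approach in synthetic data evaluation. The sketch below rests on three assumptions. First, the real data was split into equal-sized training and holdout sets. Second, only the training set was seen by the generator. Third, the features are already numeric and scaled. It is an illustration and does not reproduce Nova Synthetic's exact procedure.

```python
import math
from collections.abc import Sequence

Point = Sequence[float]

def nearest_distance(point: Point, pool: Sequence[Point]) -> float:
    """Euclidean distance from `point` to its closest record in `pool`."""
    return min(math.dist(point, other) for other in pool)

def dcr_share(synthetic: Sequence[Point], train: Sequence[Point], holdout: Sequence[Point]) -> float:
    """Fraction of synthetic records closer to the training set than to the holdout set.

    Assumes train and holdout are equal-sized random splits of the real data.
    Ties count as half a point.
    """
    score = 0.0
    for point in synthetic:
        d_train = nearest_distance(point, train)
        d_holdout = nearest_distance(point, holdout)
        if d_train < d_holdout:
            score += 1.0
        elif d_train == d_holdout:
            score += 0.5
    return score / len(synthetic)

# Hypothetical, already-scaled two-feature records.
train = [(0.0, 0.0), (10.0, 10.0)]
holdout = [(1.0, 0.0), (10.0, 11.0)]
synthetic = [(0.2, 0.0), (10.0, 10.8)]
print(f"{dcr_share(synthetic, train, holdout):.2f}")  # 0.50
```

A share near 0.5 means synthetic records sit no closer to the patients the generator learned from than to patients it never saw. A share approaching 1.0 would signal memorization.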

Compliance and Trust as Pillars

Trust in data is not achieved solely through technical metrics. That's why every project at Nova Synthetic is designed to align with international regulatory frameworks such as HIPAA, GDPR, and ISO/IEC 27559, in addition to complying with local regulations in Latin America.

This approach allows us to deliver auditable, secure datasets ready to be used by hospitals, universities, insurance companies, startups, and research laboratories, within a framework of governance and responsibility.

Benefits Beyond Healthcare

The impact of synthetic data is not limited to the medical sector. These datasets open new opportunities in:

Artificial Intelligence: Training predictive models without compromising real information
Education and Training: Creating safe learning environments with realistic data
Financial and Insurance Industry: Risk analysis and fraud detection under full compliance
International Collaboration: Sharing knowledge between countries without privacy barriers

A Vision for Latin America and the World

From Costa Rica, Nova Synthetic seeks to demonstrate that it is possible to innovate responsibly. Our work is not just a technical contribution, but a commitment to a future where data becomes an engine of economic, social, and scientific progress.

We firmly believe that Latin America can lead this transformation. With reliable synthetic datasets, it is possible to boost public health research, strengthen the competitiveness of our institutions, and open the door to global collaborations that previously seemed unattainable.

At Nova Synthetic, we are convinced that synthetic data is not an alternative for the future, but a present tool that is already changing the way we research, innovate, and build prosperity for our communities.

Key metrics:

Global Accuracy: 97.4%
Literal Matches: 0%
Clinical Records: 100K+
Compliance Standards: HIPAA, GDPR, ISO/IEC 27559

Financial Innovation · September 2025

Nova Synthetic: Synthetic Data for a Safer Financial Future

Transforming fraud detection while protecting customer privacy in the financial sector

The Financial Fraud Challenge

Financial fraud is one of the greatest challenges of our time. Every year, thousands of people and organizations suffer losses from increasingly sophisticated fraudulent practices. However, researching and developing solutions against fraud presents an obvious difficulty: the real data containing these signals is usually highly sensitive and protected by strict regulations.

At Nova Synthetic, we believe this challenge should not hinder innovation. Our team has demonstrated that it is possible to create high-fidelity synthetic datasets that reproduce the statistical patterns of fraud, without including personal information from any customer. In this way, research and the financial industry can advance safely and responsibly.

A Breakthrough That Opens New Opportunities

Our most recent work focused on the Bank Account Fraud (BAF) dataset, an internationally recognized benchmark. The objective was clear: preserve the complexity and natural imbalance of bank fraud data while keeping re-identification risk to a minimum.
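Preserving the "natural imbalance" of fraud data means the synthetic set should keep roughly the same share of fraudulent records as the source. The minimal check below assumes binary labels (1 = fraud). The 10% relative tolerance was chosen purely for illustration. The label counts are hypothetical and are not the BAF figures.

```python
def fraud_rate(labels: list[int]) -> float:
    """Share of records labeled as fraud (label == 1)."""
    return sum(labels) / len(labels)

def preserves_imbalance(real: list[int], synthetic: list[int], rel_tol: float = 0.10) -> bool:
    """True if the synthetic fraud rate is within rel_tol (relative) of the real one."""
    real_rate = fraud_rate(real)
    return abs(fraud_rate(synthetic) - real_rate) <= rel_tol * real_rate

# Hypothetical labels: 1.1% fraud in the source, 1.05% in the synthetic sample.
real_labels = [1] * 11 + [0] * 989
synthetic_labels = [1] * 21 + [0] * 1979
print(fraud_rate(real_labels), fraud_rate(synthetic_labels))
print(preserves_imbalance(real_labels, synthetic_labels))  # True
```

Matching the overall rate is necessary but not sufficient. The fraud cases must also keep their relationships with other columns, and that is what the bivariate and trivariate fidelity scores are meant to capture.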

The results speak for themselves:

Global Accuracy: 98.4%
Univariate, bivariate and trivariate fidelity: All above acceptance standards
Enhanced Privacy: Reidentification risk so low it approaches chance
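Univariate and bivariate fidelity are often scored as one minus the total variation distance (TVD) between the real and synthetic distributions of a column, or of a pair of columns. The sketch below follows that general idea under our own assumptions: values are categorical, numeric columns have already been binned, and scores are averaged without weights. It is not necessarily the exact formula behind the reported scores, and the toy transactions are invented.

```python
from collections import Counter
from itertools import combinations

def tvd_accuracy(real_values: list, synthetic_values: list) -> float:
    """1 minus the total variation distance between two empirical distributions."""
    p, q = Counter(real_values), Counter(synthetic_values)
    n_p, n_q = len(real_values), len(synthetic_values)
    categories = set(p) | set(q)
    tvd = 0.5 * sum(abs(p[c] / n_p - q[c] / n_q) for c in categories)
    return 1.0 - tvd

def univariate_accuracy(real_rows: list[tuple], synthetic_rows: list[tuple]) -> float:
    """Average single-column accuracy (categorical or pre-binned columns)."""
    n_cols = len(real_rows[0])
    scores = [tvd_accuracy([r[i] for r in real_rows], [s[i] for s in synthetic_rows])
              for i in range(n_cols)]
    return sum(scores) / n_cols

def bivariate_accuracy(real_rows: list[tuple], synthetic_rows: list[tuple]) -> float:
    """Average accuracy of the joint distribution of every pair of columns."""
    pairs = list(combinations(range(len(real_rows[0])), 2))
    scores = [tvd_accuracy([(r[i], r[j]) for r in real_rows],
                           [(s[i], s[j]) for s in synthetic_rows])
              for i, j in pairs]
    return sum(scores) / len(pairs)

# Invented transactions: (payment_type, outcome)
real_tx = [("card", "ok"), ("card", "fraud"), ("wire", "ok"), ("wire", "ok")]
synth_tx = [("card", "ok"), ("wire", "fraud"), ("wire", "ok"), ("card", "ok")]
print(univariate_accuracy(real_tx, synth_tx))  # 1.0
print(bivariate_accuracy(real_tx, synth_tx))   # 0.5
```

In the toy data each column matches perfectly on its own (univariate 1.0), yet the synthetic rows pair payment types with outcomes differently (bivariate 0.5). This is why fidelity is reported at several levels rather than from single-column summaries alone.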

In simple terms, this is ideal data for training and validating fraud detection models, with the peace of mind that no real customer is at stake.

Compliance and Trust

Security is not measured only in numbers. Every Nova Synthetic project is designed to comply with the most demanding regulations, including:

Costa Rica's Law 8968 on the protection of personal data and its regulations, under the supervision of PRODHAB
CONASSIF Agreement 5-24 (2024), which regulates technology management and risk in the Costa Rican financial system
International frameworks such as the GDPR, whose Recital 26 states that data protection principles do not apply to anonymous information

This makes Nova Synthetic a reliable ally for banks, fintechs, insurance companies, and regulatory entities, both in Costa Rica and throughout Latin America.

Benefits for the Financial Industry

Synthetic data applied to fraud not only solves a technical problem but generates strategic advantages for the entire sector:

Safer Testing: Allows simulation of fraud scenarios in controlled environments without risk of information leakage
More Robust Models: Facilitates training of algorithms that learn to detect complex patterns, even in situations of extreme imbalance
Guaranteed Compliance: Ensures that innovation is carried out in line with local and international regulatory requirements
Greater Public Trust: Strengthens the perception of security and responsibility in financial institutions

A Path Toward Regional Prosperity

From Costa Rica, Nova Synthetic is working to establish Latin America as a benchmark for responsible data innovation. High-quality synthetic datasets not only strengthen the fight against fraud but also open the door to a more solid, secure, and competitive financial ecosystem.

We believe that the region's future depends on finding the balance between technology, ethics, and trust. Synthetic data is a key piece to achieve that balance and ensure that prosperity is shared.

At Nova Synthetic, we are convinced that synthetic data represents not just a technological solution, but a commitment to a future where financial innovation and customer protection go hand in hand, building a more secure and prosperous financial landscape for all.

Key metrics:

Global Accuracy: 98.4%
Reidentification Risk: approaching chance level
Regulatory Frameworks: Law 8968, CONASSIF Agreement 5-24, GDPR
Privacy Compliance: 100%

Oncology Innovation · November 2025

Nova Synthetic: High-Quality Synthetic Data for Oncological Research

Advancing cancer research while protecting patient privacy through revolutionary synthetic data technology

The Oncological Research Challenge

Breast cancer is one of the greatest public health challenges in the world. Medical research requires high-quality data to discover patterns, test hypotheses, and develop more effective treatments. However, this data is usually extremely sensitive and protected by regulations that limit its access.

At Nova Synthetic, we believe that innovation and privacy should not be in conflict. That's why we have taken an important step: generating a high-quality synthetic dataset based on the well-known Rotterdam breast cancer dataset.

A Breakthrough That Opens Pathways

The result of this project is not just a set of numbers: it is proof that we can create realistic and reliable clinical datasets that preserve the statistical richness of the original data without exposing the identity of any patient.

Outstanding results:

Overall accuracy of 93.6%, ensuring that medical patterns are preserved in a robust way
Enhanced privacy: Reidentification risk reduced by more than 13% compared to previous versions
Scalability: We managed to triple the original dataset size, facilitating broader and more consistent analyses
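The overall figure is consistent with an unweighted mean of the univariate, bivariate, and trivariate scores listed under Technical Excellence in Implementation. The arithmetic below assumes that aggregation rule, which the case study does not state explicitly. It also checks the scaling claim against the 2,982 source samples and the 8,945-sample synthetic pool.

```python
from statistics import mean

# Component accuracies reported for the Rotterdam run, in percent.
components = {"univariate": 98.19, "bivariate": 94.80, "trivariate": 87.86}

# Assumption: overall accuracy is the unweighted mean of the three components.
overall = mean(list(components.values()))
print(f"overall accuracy: {overall:.2f}%")  # 93.62%

# Scaling check: size of the synthetic pool relative to the source dataset.
source_records, synthetic_pool = 2_982, 8_945
print(f"scaling factor: {synthetic_pool / source_records:.2f}x")  # 3.00x
```

The result, 93.62%, agrees with the reported 93.61% once the rounding of the component scores to two decimals is taken into account.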

In simple terms, this means that researchers and healthcare professionals can work with synthetic data that behaves like real data, but without compromising patient confidentiality.

Why This Matters for Cancer Research

Synthetic datasets like this enable researchers to:

Develop predictive models to anticipate recurrences and survival outcomes
Train AI algorithms for diagnosis and medical decision support
Share information between research teams without compromising privacy
Accelerate clinical discoveries by reducing the legal and ethical barriers that limit access to real data

In a field like oncology, where time can make the difference between life and death, this ability to generate and share reliable information opens new possibilities for global collaboration.

Technical Excellence in Implementation

Our advanced synthetic data generation pipeline utilized the SynthD system with MOSTLY AI SDK 4.7.8, featuring:

Dataset Processing: 2,982 samples with 16 medical variables from the Rotterdam breast cancer dataset
Advanced Training: 5 independent generators with 50 epochs each, totaling 33 minutes of optimized execution
Privacy Protection: Discriminator AUC of 0.5817, close to the 0.5 level at which a classifier can no longer tell synthetic records from real ones
Quality Metrics: 98.19% univariate accuracy, 94.80% bivariate accuracy, 87.86% trivariate accuracy
Scalable Generation: Initial pool of 8,945 synthetic samples for optimal final selection
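A discriminator AUC is obtained by training a classifier to separate real from synthetic records and then measuring how well its scores rank the two groups. The sketch below covers only the scoring step and assumes the discriminator's outputs are already available. It uses the Mann-Whitney formulation of AUC, which gives the same value as scikit-learn's `roc_auc_score` for binary labels. The scores are invented for illustration.

```python
def roc_auc(real_scores: list[float], synthetic_scores: list[float]) -> float:
    """AUC of a discriminator meant to score real records above synthetic ones.

    Mann-Whitney form: the probability that a randomly chosen real record outranks a
    randomly chosen synthetic record, with ties counted as half. 0.5 = indistinguishable.
    Uses O(n * m) pairwise comparisons: fine for illustration, slow for large samples.
    """
    wins = 0.0
    for r in real_scores:
        for s in synthetic_scores:
            if r > s:
                wins += 1.0
            elif r == s:
                wins += 0.5
    return wins / (len(real_scores) * len(synthetic_scores))

# Invented discriminator outputs (estimated probability that a record is real).
print(roc_auc([0.9, 0.4], [0.6, 0.3]))  # 0.75
print(roc_auc([0.5, 0.5], [0.5]))       # 0.5
```

Under this reading, an AUC of 0.5817 means a real record outranks a synthetic one in about 58% of real/synthetic pairs. That is only modestly better than the 50% expected from guessing.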

Innovation with Human Impact

Beyond the metrics, this achievement reflects a vision: data science in service of life. With each advance in synthetic data, we move closer to a future where:

Hospitals and universities in Latin America have access to international-level datasets
Researchers can validate hypotheses more quickly and safely
The region positions itself as a leader in responsible innovation, contributing to global scientific progress

A Shared Future

At Nova Synthetic, we know that the fight against cancer depends not only on medicine but also on the ability to generate knowledge from data. With this project, we demonstrate that it is possible to combine cutting-edge technology, ethical responsibility, and strategic vision to transform medical research.

This is one more step in our commitment: to make innovation protect and enhance human life, in Costa Rica, Latin America, and the world.

Key metrics:

Overall Accuracy: 93.61%
Privacy Improvement: more than 13% lower reidentification risk
Dataset Scaling: 3x the original size
Source Records: 2,982 original samples

AI Fine-Tuning · April 2026 · In Development

Fine-Tuning LLM Datasets for Security-First AI Agents

Building the training foundation for an AI that works on your device — and knows when something's wrong.

The Problem with Generic Models

Most language models are trained on the open web — broad, general, and optimized for surface-level helpfulness. That works for casual tasks. It doesn't work for Adamant.

Adamant is a native AI agent built to run on your device, execute complex multi-step tasks, and handle real work in security-sensitive environments. A general-purpose model can't carry that load reliably. It hallucinates tool calls. It responds slowly. It lacks the domain precision that security and operations contexts demand.

The solution isn't a bigger model. It's a smarter one — trained on the right data.

What Nova Synthetic Is Building

Nova Synthetic is partnering with Adamant to design and generate a domain-specific fine-tuning dataset. The goal: teach the model to think, reason, and act the way a security-first AI agent should.

Three priorities drive the dataset design:

Information Security Mastery. The dataset covers threat reasoning, incident response patterns, security policy interpretation, and risk classification — enabling Adamant to assist teams working in regulated or high-stakes environments without introducing uncertainty.
Precision Tool Calling. Adamant executes real actions — file operations, web research, desktop automation, document creation. Every tool call must be correct, well-formed, and contextually appropriate. The dataset is engineered to eliminate ambiguous or hallucinated calls and build reliable, step-by-step execution.
Low-Latency Inference. On-device performance demands efficiency. The training data is structured to favor concise, decision-ready outputs — reducing token overhead without sacrificing quality or safety.
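One way to "eliminate ambiguous or hallucinated calls" from a fine-tuning corpus is to validate every candidate tool call against a registry of known tools before it becomes a training example. The sketch below assumes each call is a JSON object with `name` and `arguments` fields. The registry, tool names, and argument names are hypothetical and do not describe Adamant's actual tool interface.

```python
import json

# Hypothetical tool registry: names and argument sets are illustrative only.
TOOLS = {
    "read_file": {"required": {"path"}, "optional": {"encoding"}},
    "web_search": {"required": {"query"}, "optional": {"max_results"}},
}

def validate_tool_call(raw: str) -> list[str]:
    """Return a list of problems with a JSON tool call; an empty list means it passes."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["tool call must be a JSON object"]
    name = call.get("name")
    if name not in TOOLS:
        return [f"unknown tool: {name!r}"]  # catches hallucinated tool names
    args = call.get("arguments")
    if not isinstance(args, dict):
        return ["'arguments' must be a JSON object"]
    spec, provided = TOOLS[name], set(args)
    problems = [f"missing argument: {a}" for a in sorted(spec["required"] - provided)]
    problems += [f"unexpected argument: {a}"
                 for a in sorted(provided - spec["required"] - spec["optional"])]
    return problems

print(validate_tool_call('{"name": "read_file", "arguments": {"path": "notes.txt"}}'))  # []
print(validate_tool_call('{"name": "read_file", "arguments": {"file": "notes.txt"}}'))
# ['missing argument: path', 'unexpected argument: file']
```

A generation pipeline could discard or regenerate any sample whose validator output is non-empty, so the model only ever sees well-formed calls.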

Why Synthetic Data Is the Right Approach

Building fine-tuning datasets for security domains using real operational data isn't just difficult — it's often impossible. That data is sensitive, regulated, and rarely shareable.

Nova Synthetic's approach sidesteps that constraint entirely. We generate statistically rigorous, domain-accurate training examples from scratch — preserving the patterns and edge cases that matter, without touching data that can't be touched.

The result is a dataset that is:

Private by design. No real user data, no operational logs, no exposure risk.
Targeted and efficient. Every sample is crafted for a specific capability gap — not scraped from noise.
Controllable. We can adjust distribution, difficulty, domain density, and task complexity as the model evolves.
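Controllability can be as simple as sampling training examples according to adjustable per-track weights. The sketch below uses the three tracks named under Current Status as keys. The weights, pool contents, and the `sample_mix` helper are hypothetical. A real pipeline would likely also stratify by difficulty and task complexity.

```python
import random

def sample_mix(pools: dict[str, list[str]], weights: dict[str, float],
               n: int, seed: int = 0) -> list[str]:
    """Draw n examples; each example's track is picked in proportion to its weight."""
    rng = random.Random(seed)  # seeded so the same mix can be regenerated exactly
    tracks = list(weights)
    picked = rng.choices(tracks, weights=[weights[t] for t in tracks], k=n)
    return [rng.choice(pools[t]) for t in picked]

# Hypothetical example pools for the three generation tracks.
pools = {
    "security_reasoning": ["sec-001", "sec-002", "sec-003"],
    "tool_execution": ["tool-001", "tool-002"],
    "multi_step_planning": ["plan-001", "plan-002"],
}
# Shift the mix toward tool execution, e.g. if evaluation shows weak tool calling.
weights = {"security_reasoning": 0.3, "tool_execution": 0.5, "multi_step_planning": 0.2}
batch = sample_mix(pools, weights, n=10)
print(batch)
```

Keeping the weights outside the sampling code means the dataset mix can be retuned between fine-tuning rounds without changing the generator itself.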

Current Status

This project is active and in development. The dataset architecture and domain taxonomy are defined. Corpus generation is underway across three primary tracks: security reasoning, tool execution, and concise multi-step planning.

Initial fine-tuning experiments are scheduled for Q2 2026, with iterative evaluation against Adamant's real-world task benchmarks. Results will be shared as the project advances.

For more information about Adamant, visit adamantcore.ai
