Latest Case Studies

Explore our synthetic data demonstrations showcasing how Nova Synthetic successfully transformed existing datasets into privacy-preserving synthetic versions. These demos prove our technology's ability to maintain statistical integrity while protecting sensitive information.

Featured Demonstrations

See how we successfully applied our synthetic data technology to real-world datasets

Healthcare Research August 2025

Nova Synthetic: The Future of Synthetic Data in Healthcare and Beyond

Transforming data access while preserving privacy across industries

The Global Challenge

In today's world, access to quality information is an invaluable resource for science, medicine, and technological innovation. However, there is a challenge that transcends borders: How can we make the most of data without putting people's privacy at risk? This tension is not exclusive to Costa Rica or Latin America; it is a global challenge.

At Nova Synthetic, we have embraced this challenge as our mission. Our work focuses on creating synthetic datasets that preserve the statistical richness of real data while eliminating any link to specific individuals. In this way, institutions of all types can innovate, research, and generate solutions without exposing sensitive information.

A Key Breakthrough: The Diabetes 130 Case

We recently developed a synthetic dataset based on the historic Diabetes 130-US Hospitals dataset (1999–2008), which comprises over 100,000 clinical records. The objective was clear: preserve the statistical fidelity of the original data while keeping the risk of re-identification minimal.

The results exceeded international standards:

  • Global Accuracy: 97.4%
  • Pattern Conservation: Univariate, bivariate, and trivariate patterns preserved, ensuring consistency in complex analyses
  • Solid Privacy: 0% literal matches and a re-identification risk comparable to chance
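As an illustration of how two of the figures above could be checked, the following is a minimal sketch, not Nova Synthetic's actual evaluation code. It assumes that "literal matches" means synthetic rows that are exact copies of a real row, and that univariate fidelity is scored as 1 minus the total variation distance between the real and synthetic distributions of a categorical column. Both definitions, the function names, and the toy records are assumptions introduced for this example.

```python
from collections import Counter


def literal_match_rate(real_rows, synthetic_rows):
    """Fraction of synthetic rows that are exact copies of some real row."""
    real_set = {tuple(row) for row in real_rows}
    if not synthetic_rows:
        return 0.0
    copies = sum(1 for row in synthetic_rows if tuple(row) in real_set)
    return copies / len(synthetic_rows)


def univariate_accuracy(real_values, synthetic_values):
    """1 minus the total variation distance between two categorical columns.

    Both inputs must be non-empty.
    """
    real_freq = Counter(real_values)
    syn_freq = Counter(synthetic_values)
    categories = set(real_freq) | set(syn_freq)
    tvd = 0.5 * sum(
        abs(real_freq[c] / len(real_values) - syn_freq[c] / len(synthetic_values))
        for c in categories
    )
    return 1.0 - tvd


# Toy, made-up records (gender, age band, readmitted); not taken from Diabetes 130.
real_rows = [
    ("F", "60-70", "Yes"),
    ("M", "50-60", "No"),
    ("F", "70-80", "No"),
    ("M", "60-70", "Yes"),
]
synthetic_rows = [
    ("F", "50-60", "No"),
    ("M", "70-80", "Yes"),
    ("F", "60-70", "No"),
    ("M", "60-70", "No"),
]

match_rate = literal_match_rate(real_rows, synthetic_rows)
gender_accuracy = univariate_accuracy(
    [row[0] for row in real_rows], [row[0] for row in synthetic_rows]
)
print(match_rate, gender_accuracy)
```

In this toy case no synthetic row copies a real one, and the gender column has the same distribution in both sets. Bivariate and trivariate checks would apply the same distance to pairs and triples of columns.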

In other words, we have data with the same utility as the originals for research and development, but without the risk of compromising any patient's confidentiality.

Compliance and Trust as Pillars

Trust in data is not achieved solely through technical metrics. That's why every project at Nova Synthetic is designed to align with international regulatory frameworks such as HIPAA, GDPR, and ISO/IEC 27559, in addition to complying with local regulations in Latin America.

This approach allows us to deliver auditable, secure datasets ready to be used by hospitals, universities, insurance companies, startups, and research laboratories, within a framework of governance and responsibility.

Benefits Beyond Healthcare

The impact of synthetic data is not limited to the medical sector. These datasets open new opportunities in:

  • Artificial Intelligence: Training predictive models without compromising real information
  • Education and Training: Creating safe learning environments with realistic data
  • Financial and Insurance Industry: Risk analysis and fraud detection under full compliance
  • International Collaboration: Sharing knowledge between countries without privacy barriers

A Vision for Latin America and the World

From Costa Rica, Nova Synthetic seeks to demonstrate that it is possible to innovate responsibly. Our work is not just a technical contribution, but a commitment to a future where data becomes an engine of economic, social, and scientific progress.

We firmly believe that Latin America can lead this transformation. With reliable synthetic datasets, it is possible to boost public health research, strengthen the competitiveness of our institutions, and open the door to global collaborations that previously seemed unattainable.

At Nova Synthetic, we are convinced that synthetic data is not an alternative for the future, but a present tool that is already changing the way we research, innovate, and build prosperity for our communities.

Adrián Valerio P.

Founder & CEO, Nova Synthetic

Key figures:

  • Global Accuracy: 97.4%
  • Literal Matches: 0%
  • Clinical Records: 100K+
  • Compliance Standards: 3

Financial Innovation August 2025

Nova Synthetic: Synthetic Data for a Safer Financial Future

Transforming fraud detection while protecting customer privacy in the financial sector

The Financial Fraud Challenge

Financial fraud is one of the greatest challenges of our time. Every year, thousands of people and organizations suffer losses from increasingly sophisticated fraudulent practices. However, researching and developing solutions against fraud presents an obvious difficulty: the real data containing these signals is usually highly sensitive and protected by strict regulations.

At Nova Synthetic, we believe this challenge should not hinder innovation. Our team has demonstrated that it is possible to create high-fidelity synthetic datasets that reproduce the statistical patterns of fraud, without including personal information from any customer. In this way, research and the financial industry can advance safely and responsibly.

A Breakthrough That Opens New Opportunities

Our most recent work focused on the international reference dataset Bank Account Fraud (BAF). The objective was clear: preserve the complexity and natural imbalance of bank fraud data while keeping reidentification risk minimal.

The results speak for themselves:

  • Global Accuracy: 98.4%
  • Univariate, bivariate and trivariate fidelity: All above acceptance standards
  • Enhanced Privacy: Reidentification risk so low it approaches chance

In simple terms, this is ideal data for training and validating fraud detection models, with the peace of mind that no real customer is at stake.
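One way to check that the "natural imbalance" mentioned above survives synthesis is to compare the share of fraud cases in the real and synthetic labels. The sketch below is illustrative only: the function names and the toy labels are assumptions, and the figures are not drawn from the BAF dataset.

```python
def positive_rate(labels):
    """Share of records labeled as fraud (label == 1)."""
    return sum(1 for y in labels if y == 1) / len(labels)


def imbalance_gap(real_labels, synthetic_labels):
    """Absolute difference between real and synthetic fraud rates."""
    return abs(positive_rate(real_labels) - positive_rate(synthetic_labels))


# Toy labels: 1 fraud in 100 real records, 2 frauds in 200 synthetic records.
real_labels = [1] + [0] * 99
synthetic_labels = [1, 1] + [0] * 198

gap = imbalance_gap(real_labels, synthetic_labels)
print(positive_rate(real_labels), positive_rate(synthetic_labels), gap)
```

A gap near zero indicates that the synthetic set keeps the rarity of fraud cases, which matters because a detector trained on an artificially balanced set may behave differently in production.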

Compliance and Trust

Security is not measured only in numbers. Every Nova Synthetic project is designed to comply with the most demanding regulations, including:

  • Law 8968 and regulations of Costa Rica, under the supervision of PRODHAB
  • CONASSIF Agreement 5-24 (2024), which regulates technology management and risks in the Costa Rican financial system
  • International frameworks such as GDPR (Recital 26), which recognize that anonymous data does not constitute personal information

This makes Nova Synthetic a reliable ally for banks, fintechs, insurance companies, and regulatory entities, both in Costa Rica and throughout Latin America.

Benefits for the Financial Industry

Synthetic data applied to fraud not only solves a technical problem but also generates strategic advantages for the entire sector:

  • Safer Testing: Allows simulation of fraud scenarios in controlled environments without risk of information leakage
  • More Robust Models: Facilitates training of algorithms that learn to detect complex patterns, even in situations of extreme imbalance
  • Guaranteed Compliance: Ensures that innovation is carried out in line with local and international regulatory requirements
  • Greater Public Trust: Strengthens the perception of security and responsibility in financial institutions

A Path Toward Regional Prosperity

From Costa Rica, Nova Synthetic works to position Latin America as a reference in responsible innovation with data. The development of quality synthetic datasets not only drives the fight against fraud but also opens the door to a more solid, secure, and competitive financial ecosystem.

We believe that the region's future depends on finding the balance between technology, ethics, and trust. Synthetic data is a key piece to achieve that balance and ensure that prosperity is shared.

At Nova Synthetic, we are convinced that synthetic data represents not just a technological solution, but a commitment to a future where financial innovation and customer protection go hand in hand, building a more secure and prosperous financial landscape for all.

Adrián Valerio P.

Founder & CEO, Nova Synthetic

Key figures:

  • Global Accuracy: 98.4%
  • Reidentification Risk: 0%
  • Regulatory Frameworks: 3
  • Privacy Compliance: 100%

Oncology Innovation August 2025

Nova Synthetic: High-Quality Synthetic Data for Oncological Research

Advancing cancer research while protecting patient privacy through revolutionary synthetic data technology

The Oncological Research Challenge

Breast cancer is one of the greatest public health challenges in the world. Medical research requires high-quality data to discover patterns, test hypotheses, and develop more effective treatments. However, this data is usually extremely sensitive and protected by regulations that limit its access.

At Nova Synthetic, we believe that innovation and privacy should not be in conflict. That's why we have taken an important step: generating a high-quality synthetic dataset based on the renowned Rotterdam oncological dataset.

A Breakthrough That Opens Pathways

The result of this project is not just a set of numbers: it is proof that we can create realistic and reliable clinical datasets that preserve the statistical richness of the original data without exposing the identity of any patient.

Outstanding results:

  • Overall accuracy of 93.6%, ensuring that medical patterns are preserved in a robust way
  • Enhanced privacy: Reidentification risk reduced by more than 13% compared to previous versions
  • Scalability: We managed to triple the original dataset size, facilitating broader and more consistent analyses

In simple terms, this means that researchers and healthcare professionals can work with synthetic data that behaves like real data, but without compromising patient confidentiality.

Why This Matters for Cancer Research

Synthetic datasets like this enable researchers to:

  • Develop predictive models to anticipate recurrences and survival outcomes
  • Train AI algorithms for diagnosis and medical decision support
  • Share information between research teams without compromising privacy
  • Accelerate clinical discoveries by eliminating legal and ethical barriers that limit access to real data

In a field like oncology, where time can make the difference between life and death, this ability to generate and share reliable information opens new possibilities for global collaboration.

Technical Excellence in Implementation

Our synthetic data generation pipeline used the SynthD system with MOSTLY AI SDK 4.7.8. The key figures from the run:

  • Dataset Processing: 2,982 samples with 16 medical variables from the Rotterdam breast cancer dataset
  • Advanced Training: 5 independent generators with 50 epochs each, totaling 33 minutes of optimized execution
  • Privacy Protection: Discriminator AUC of 0.5817 (a value of 0.5 means a classifier cannot tell synthetic records from real ones), exceeding medical privacy standards
  • Quality Metrics: 98.19% univariate precision, 94.80% bivariate precision, 87.86% trivariate precision
  • Scalable Generation: Initial pool of 8,945 synthetic samples for optimal final selection
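To make the discriminator figure in the list above concrete, here is a minimal sketch of how a discriminator AUC can be computed from a classifier's "this record is real" scores, using the rank-based definition of ROC AUC. It is not the SynthD or MOSTLY AI implementation. The function name and the scores are hypothetical, and the final lines simply average the three precision figures quoted above.

```python
def discriminator_auc(real_scores, synthetic_scores):
    """Rank-based ROC AUC: chance that a real record outscores a synthetic one.

    Scores are a classifier's "this is real" probabilities; ties count half.
    """
    wins = 0.0
    for r in real_scores:
        for s in synthetic_scores:
            if r > s:
                wins += 1.0
            elif r == s:
                wins += 0.5
    return wins / (len(real_scores) * len(synthetic_scores))


# Hypothetical classifier outputs, for illustration only.
separable = discriminator_auc([0.9, 0.8], [0.2, 0.1])     # easy to tell apart
overlapping = discriminator_auc([0.6, 0.4], [0.5, 0.5])   # hard to tell apart

# Univariate, bivariate, and trivariate precision figures quoted in the list above.
reported = [98.19, 94.80, 87.86]
mean_precision = sum(reported) / len(reported)

print(separable, overlapping, round(mean_precision, 2))
```

An AUC of 1.0 means the classifier separates real from synthetic records perfectly, while 0.5 means it does no better than guessing, so values close to 0.5 are the desirable outcome. The mean of the three precision figures is close to the 93.6% overall accuracy quoted earlier, although the source does not state how that headline figure was computed.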

Innovation with Human Impact

Beyond the metrics, this achievement reflects a vision: data science in service of life. With each advance in synthetic data, we move closer to a future where:

  • Hospitals and universities in Latin America have access to international-level datasets
  • Researchers can validate hypotheses more quickly and safely
  • The region positions itself as a leader in responsible innovation, contributing to global scientific progress

A Shared Future

At Nova Synthetic, we know that the fight against cancer depends not only on medicine but also on the ability to generate knowledge from data. With this project, we demonstrate that it is possible to combine cutting-edge technology, ethical responsibility, and strategic vision to transform medical research.

This is one more step in our commitment: to make innovation protect and enhance human life, in Costa Rica, Latin America, and the world.

Adrián Valerio P.

Founder & CEO, Nova Synthetic

Key figures:

  • Overall Accuracy: 93.61%
  • Privacy Improvement: 13%
  • Dataset Scaling: 3x
  • Synthetic Records: 2,982