Overview of BBJ’s Samples and Data

Advancing Genomic Research Through the Provision of Samples and Data to Researchers


BioBank Japan (BBJ) collected DNA samples and clinical information from approximately 270,000 patients through cooperating medical institutions nationwide during fiscal years 2003-2007 and 2012-2017. Approximately 200,000 patients enrolled during fiscal years 2003-2007 also donated serum samples to BBJ. These samples and data are stored under strict security measures, with names and other personal information removed and research IDs attached. Following proper procedures and in accordance with Japanese rules (laws, guidelines, etc.), we provide these samples and data to academic research institutions, private companies, and other researchers who aim to realize genomic medicine and develop new clinical and therapeutic methods.

The samples provided by BBJ originate from patients diagnosed with 51 common diseases, including lifestyle-related diseases, as of December 2023. These samples, coupled with clinical information about the diseases, constitute highly valuable clinical and biological data for genome research targeting the Japanese population. In recent years, BBJ has intensified efforts to strengthen its disease-oriented biobank infrastructure by collecting additional clinical data from patients. At the same time, genomic and omics analyses are being conducted using these samples and data to meet the evolving needs of researchers and to enrich the biobank’s database. BBJ is committed to continuing this work. We encourage researchers to use BBJ’s samples and data in their studies.

Target Diseases

  • 51 diseases including lifestyle-related diseases, diabetes, cancer, etc.
  • 267,000 patients, 440,000 cases
  • Average follow-up period more than 10 years

Number of Diseases
51 diseases: 267,306 patients, 441,550 cases (as of December 2023)

Diabetes mellitus             59,562
Cerebral infarction           21,404
Stable angina                 20,723
Myocardial infarction         16,637
Colorectal cancer             14,886
Heart failure                 12,890
Prostate cancer               11,755
Gastric cancer                11,393
Breast cancer                 11,380
Bronchial asthma              11,201
Chronic hepatitis C            8,810
Lung cancer                    7,866
Rheumatoid arthritis           6,743
Uterine fibroid                6,216
Unstable angina                6,123
Liver cirrhosis                4,800
Liver cancer                   4,267
Cerebral aneurysm              3,941
Atopic dermatitis              3,402
Hematopoietic malignancy       2,671
Chronic hepatitis B            2,666
Graves’ disease                2,493
Esophageal cancer              2,427
Endometrial cancer             2,075
Cervical cancer                2,025
Pulmonary fibrosis             1,917
Ovarian cancer                 1,610
Nephrotic syndrome             1,179
Pancreatic cancer              1,091
Pulmonary tuberculosis         1,011
Drug eruption                    972
Gallbladder/Bile duct cancer     952
Renal cancer                     887
Depressive disorder              541
Cerebral hemorrhage              440
Febrile seizure                  341

1st cohort

The 1st cohort consists of DNA, serum, and clinical information collected from approximately 200,000 individuals between June 2003 and March 2008 (47 diseases).
Since April 2008, serum and clinical information have continued to be collected systematically from patients who kept visiting the hospital. For some cases, the cause of death has been determined from death certificate records as part of the prognostic surveys.

2nd cohort

The 2nd cohort consists of DNA and clinical information collected from approximately 70,000 individuals between December 2012 and December 2017 (38 diseases: 34 diseases shared with the 1st cohort plus 4 new diseases).


  • 267,000 patients, 800,000 tubes (collected in the 1st and 2nd cohorts)
  • DNA is extracted from blood
  • More than 200% utilized and analyzed
  • Among the first in the world to make DNA informative
  • Stored in a fully automated bank system
  • Distribution unit: 5 µg (50 µL) per sample
  • Clinical information required for analysis will be provided

Quality (as of December 18, 2017)

DNA quality (concentrations adjusted to 100 ng/µL when purified from blood)

  • Electrophoresis (agarose gel): Good 87.2%, Fair 10.7%, Poor 2.1%
  • DNA concentration (PicoGreen): <10 ng/µL 0.27%, <50 ng/µL 1.7%
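
As a quick consistency check, the DNA distribution unit of 5 µg (50 µL) per sample follows from the adjusted concentration of 100 ng/µL. The short Python sketch below only illustrates that arithmetic; the function and variable names are hypothetical and are not part of any BBJ tool.

```python
# Values taken from the text: adjusted concentration and distribution volume.
concentration_ng_per_ul = 100  # ng/µL after adjustment
distribution_volume_ul = 50    # µL per distributed sample


def dna_mass_ug(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """Return the DNA mass in micrograms for a concentration and volume."""
    return concentration_ng_per_ul * volume_ul / 1000  # 1,000 ng = 1 µg


mass = dna_mass_ug(concentration_ng_per_ul, distribution_volume_ul)
print(f"{mass} µg per sample")
```

The same function can be used to estimate the yield of a sample whose measured concentration differs from the nominal 100 ng/µL.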

Results of comprehensive genomic analysis using Illumina SNP arrays

  • Genotyping has been performed on a total of 218,116 patients (>500,000 SNPs)
    • Call rate ≥99%: 215,276 patients (98.70%)
  • Of the approximately 200,000 patients in the 1st cohort, 189,451 were genotyped at least once
    • Call rate ≥99%: 187,165 patients (98.79%)
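
The percentages in the genotyping summary can be reproduced directly from the patient counts. The following Python sketch is only an illustration of that calculation; the helper name `pass_rate_percent` is an assumption introduced here, and the counts are copied from the list above.

```python
# Counts copied from the genotyping summary.
genotyped_total = 218_116  # all genotyped patients
passed_total = 215_276     # of these, call rate of 99% or higher
genotyped_first = 189_451  # 1st cohort patients genotyped at least once
passed_first = 187_165     # of these, call rate of 99% or higher


def pass_rate_percent(passed: int, genotyped: int) -> float:
    """Percentage of genotyped patients that met the call-rate threshold."""
    return round(100 * passed / genotyped, 2)


all_rate = pass_rate_percent(passed_total, genotyped_total)
first_rate = pass_rate_percent(passed_first, genotyped_first)
print(f"All genotyped patients: {all_rate:.2f}%")
print(f"1st cohort:             {first_rate:.2f}%")
```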

Whole genome sequencing analysis is possible even with genomic DNA stored for more than 15 years


  • 200,000 patients, 1.70 million tubes (collected in the 1st cohort)
  • Collected over time for up to 10 years
  • A treasure trove for diagnostics research: proteome, metabolome, exosome, etc.
  • Serum panels are also available (small volume (100 µL units) at low cost)
  • Distribution unit:
    • Serum 1 tube (350 µL to 1000 µL)
    • Serum panel: 100 µL per sample (minimum distribution unit: 20 samples)
  • Individual clinical information necessary for analysis will be provided

Serum panels are available

Small panels of serum samples (in units of 100 µL) are provided as controls or for screening purposes so that more researchers can use our serum samples.

Serum Sample Panel (distributed as of April 2019)
Lung cancer (small cell carcinoma) | Cerebral infarction (atherothrombotic) | Nephrotic syndrome
Lung cancer (adenocarcinoma) | Cerebral infarction (cardiogenic embolism) | Urolithiasis
Lung cancer (large cell carcinoma) | Cerebral infarction (lacunar) | Osteoporosis
Esophageal cancer | Cerebral aneurysm | Diabetes mellitus
Gastric cancer | Epilepsy | Dyslipidemia
Colorectal cancer | Bronchial asthma | Graves’ disease
Liver cancer (w/chronic hepatitis B) | Pulmonary tuberculosis | Rheumatoid arthritis
Liver cancer (w/chronic hepatitis C) | COPD | Pollinosis
Pancreatic cancer | Pulmonary fibrosis/Interstitial pneumonitis | Drug eruption
Gallbladder/Bile duct cancer | Myocardial infarction | Atopic dermatitis
Prostate cancer | Unstable angina | Keloid
Breast cancer | Stable angina | Uterine fibroid
Cervical cancer | Arrhythmia | Endometriosis
Uterine cancer | Heart failure | Febrile seizure
Ovarian cancer | Arteriosclerosis obliterans (ASO) | Glaucoma
Multiple myeloma | Chronic hepatitis B | Cataract
Hodgkin’s lymphoma | Chronic hepatitis C | Periodontitis
Non-Hodgkin’s lymphoma | Liver cirrhosis (w/chronic hepatitis B) |
 | Liver cirrhosis (w/chronic hepatitis C) |
Sera in the cancer panels come from cases with no other concurrent cancer. Sera in the non-cancer panels come from cases that are not cancer-bearing and have no registered disease other than the relevant one.

For 11 types of cancers, pre- and post-onset serum panels (pre- and post-onset pairs) can be provided.

Serum Panels (pre- and post-onset pairs)
Name of disease
Lung cancer | Colorectal cancer | Gallbladder/Bile duct cancer | Cervical cancer
Esophageal cancer | Liver cancer | Prostate cancer | Ovarian cancer
Gastric cancer | Pancreatic cancer | Breast cancer |

Clinical information provided for the serum panel (55 diseases)

  • Basic information (disease name, age, sex)
  • Age: Age at the date of collection (date of blood sampling)
  • Cancer (cancer-bearing, with no other cancer): basic information plus time of diagnosis, preoperative status, recurrence, histological type, and stage
  • Non-cancer (no other diseases): disease name only

Note: Some values are missing. Please contact the Sample and Data Inquiry Section of the Office of the BBJ for assistance in making use of this information.

Genomic and Omics Data

  • SNPs (single nucleotide polymorphisms): 900,000 loci × 182,000 patients; 700,000 loci × 54,000 patients (1st and 2nd cohorts)
  • Whole genome sequencing (multiple diseases): 6,200 patients (1st and 2nd cohorts)
  • Breast cancer-related gene sequencing: 30,000 patients (1st and 2nd cohorts)
  • Metabolome (biomarkers): 46,000 samples (1st cohort)
  • Proteome: 3,000 samples (1st cohort)

BBJ began its fifth phase of operations in fiscal year 2023. In recent years, genomic and omics analysis methods have developed rapidly with advances in next-generation sequencing and other analytical technologies. As a result, genomic and omics analysis data are widely used in genome research, including biomarker discovery and other research aimed at developing new clinical and therapeutic methods. Against this backdrop, BBJ is promoting genomic and omics analysis of biological samples collected from patients to further enhance its database for genomic medical research, with the aim of implementing genomic medicine.

Genomic/Omics data are available on both BBJ’s and the public databases
  1. BBJ database
  2. Public databases (NBDC, AGD etc.)

To promote the use of the data, genomic and omics data are available in the BBJ database and in public databases such as the NBDC Human Database by DBCLS and the AMED Genome Group Sharing Database (AGD).

Available Genomic and Omics Data

Application Process for Data and Corresponding Clinical Information

For data available in the BBJ database, please apply to BBJ. For data available in public databases, please apply to the respective database.
For clinical information corresponding to data you have applied to use from a public database, please refer to the Application Process for Obtaining Samples and Data from BBJ. The information can be used once the application has been received and approved by the BBJ Sample and Data Access Committee. For details of the clinical information, please refer to the Details on the Clinical Information (basic information and disease sheet) (Japanese only).

Application Process for Obtaining Samples and Data from BBJ

Clinical Information

  • Created from medical information
  • 2,500 common items
  • Disease-specific items: more than 3,300
  • Average follow-up period: more than 10 years

Using the details on the items of the clinical information (basic information and disease sheet) held by BBJ, you can refine your search for samples.
We will provide you with the clinical information in an Excel spreadsheet approximately one month after the sample is provided.

Details on the Clinical Information (basic information and disease sheet)

Shared items for all diseases: Basic items to be entered for all diseases

Basic information | Information at registration, smoking, eating and drinking | At the time of the initial survey
Common classification | Diet, exercise, medical history, family history |

Shared items for all diseases: Items to be entered for all diseases

Current prescription | Information on medications administered within one month of the survey date | Period of the survey, multiple registrations
Side effect | Side effect |

BBJ Online Biological Sample Search System (Japanese only)

Researchers can search the number of samples held by BBJ using the following criteria: sample type, gender, age at registration, name of registered disease, medical history, smoking history, alcohol consumption history, and availability of GWAS data. Please use the search results as a “sample count guideline” when applying to BBJ for obtaining samples. Please note that the search results are current as of the time of regular data updates and may differ from the current holding status.

BBJ's Online Biological Samples Search System

Quality Assurance of BBJ Samples and Data

Effective October 30, 2017, the Office of BioBank Japan acquired certifications for ISO 9001:2015 (quality management systems) and ISO/IEC 27001:2013 (information security management systems), two international standards published by the International Organization for Standardization.
This is one of our efforts to improve quality and information security in the collection, storage, and provision of biological samples and data.

For more information on our sample storage facilities, please refer to the BioSample Storage Facilities page.

Left: Certificate for information security management systems. Right: Certificate for quality management systems.

Usage Fees

Review Process and Guidelines for Sample Usage

Application Process for Obtaining Samples and Data from BBJ