An important part of EMBench++ is the collection of data. The synthetic data that we generate should not consist of completely random strings; it should contain real-world values. For this reason we have introduced the so-called shredders. A shredder is a software component that takes a database (relational or XML) and shreds it into a series of column tables. There are general-purpose shredders for relational and XML databases, and there are also shredders specifically designed for popular, freely available databases such as Wikipedia, IMDb, DBLP, and OKKAM.
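
To make the idea of shredding concrete, the following is a minimal sketch of a general-purpose relational shredder, written against SQLite from the Python standard library. It is not the EMBench++ implementation: the function name `shred_relational`, the `<table>_<column>` naming of column tables, and the choice to keep only distinct, non-null values are all assumptions made for illustration.

```python
import sqlite3


def shred_relational(conn):
    """Shred every user table of a SQLite database into column tables.

    Returns a dict that maps a column-table name ("<table>_<column>", a
    naming convention assumed here) to the sorted distinct non-null values
    found in that column.
    """
    column_tables = {}
    # List the user tables, skipping SQLite's internal bookkeeping tables.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            rows = conn.execute(
                f'SELECT DISTINCT "{column}" FROM "{table}" '
                f'WHERE "{column}" IS NOT NULL ORDER BY "{column}"')
            column_tables[f"{table}_{column}"] = [r[0] for r in rows]
    return column_tables


def demo():
    """Build a tiny in-memory database and shred it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (firstname TEXT, lastname TEXT)")
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [("Ada", "Lovelace"), ("Alan", "Turing"), ("Ada", "Yonath")])
    return shred_relational(conn)
```

In this sketch, `demo()` produces two column tables, one per attribute of the hypothetical `person` table, each holding the real values that appeared in that column.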

The user of the benchmark can select which databases are to be shredded, and can add further databases if desired, either by supplying the respective shredder alongside each database or by using the general-purpose shredder that comes with the system. Due to security restrictions, this functionality is not available over the EMBench Web System. Thus, data generation can be performed using the source repository available in the EMBench++ implementation. A summary of the data in this default repository is provided in the following table.

No.   Name                     Number of Records
1.    movietitle               458,143
2.    lastname                 323,021
3.    firstname                128,958
4.    song                     110,361
5.    company                  107,364
6.    movieoccupation          83,303
7.    occupation               83,303
8.    movieproducers           70,877
9.    film                     52,979
10.   distributor              22,184
11.   editor                   13,792
12.   album                    12,007
13.   university               11,804
14.   mountain                 9,811
15.   software                 7,768
16.   booktitle                5,208
17.   organization             4,414
18.   studio                   4,179
19.   disease                  4,003
20.   newspaper                3,338
21.   subsidiarycompany        3,126
22.   band                     2,888
23.   museum                   2,456
24.   theatre                  629
25.   publisher                482
26.   athleticconference       229
27.   monastery                189
28.   series                   153
29.   symptom                  135
30.   toy                      81
31.   school                   47
32.   movierelatedoccupation   15
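
As an illustration of how column tables such as firstname and lastname can supply real-world values for synthetic records, here is a minimal sketch in Python. The function `synthesize_records`, its parameters, and the uniform random sampling are assumptions made for this example; they are not taken from the EMBench++ code.

```python
import random


def synthesize_records(column_tables, attributes, count, seed=None):
    """Create `count` synthetic records from column tables.

    `column_tables` maps a column-table name to a list of values, and
    `attributes` lists the column tables to draw from. Each attribute of
    each record is sampled uniformly at random (an assumption of this
    sketch) from the corresponding column table.
    """
    rng = random.Random(seed)  # a fixed seed makes the output repeatable
    return [
        {attr: rng.choice(column_tables[attr]) for attr in attributes}
        for _ in range(count)
    ]
```

For example, calling `synthesize_records(tables, ["firstname", "lastname"], 5, seed=1)` with a dictionary `tables` that holds those two column tables returns five records whose values all come from the repository rather than from random strings.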



 
Last modified: May 2018,   Page created by: Antonis Papadakis,   Page maintained by: Ekaterini Ioannou, Yannis Velegrakis