An important part of EMBench is the collection of data. We do not want the synthetic data that we generate to consist of completely random strings; we want it to contain real-world values. For this reason we have introduced the so-called shredders. A shredder is a software component that takes a database (relational or XML) and shreds it into a series of column tables. There are general-purpose shredders for relational and XML databases, but there are also shredders specifically designed for popular, freely available databases such as Wikipedia, IMDb, DBLP, and OKKAM.
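To make the idea of shredding concrete, the following is a minimal sketch of what a general-purpose relational shredder might do. It is not the EMBench implementation: the function name `shred_table`, the use of SQLite, the toy `movies` table, and the choice to keep only distinct non-NULL values are all assumptions made for illustration.

```python
import sqlite3


def shred_table(conn, table):
    """Split one relational table into column tables.

    Returns {column_name: [distinct non-NULL values in first-seen order]}.
    Hypothetical helper; deduplication and NULL handling are assumptions.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    names = [description[0] for description in cursor.description]
    # Plain dicts preserve insertion order, so they serve as ordered sets here.
    columns = {name: {} for name in names}
    for row in cursor:
        for name, value in zip(names, row):
            if value is not None:
                columns[name][value] = None
    return {name: list(values) for name, values in columns.items()}


def demo():
    """Build a toy in-memory database and shred its single table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (movietitle TEXT, studio TEXT)")
    conn.executemany(
        "INSERT INTO movies VALUES (?, ?)",
        [("Title A", "Studio X"), ("Title B", "Studio X"), ("Title C", None)],
    )
    try:
        return shred_table(conn, "movies")
    finally:
        conn.close()


if __name__ == "__main__":
    for column, values in demo().items():
        print(column, values)
```

Each key of the returned dictionary plays the role of one column table, that is, a pool of real-world values for a single attribute.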

The user of the benchmark can select which databases are shredded, or add further databases if desired, either by supplying the respective shredder alongside each database or by using the general-purpose shredder that comes with the system. Due to security restrictions, this functionality is not available over the EMBench Web System. There, data generation is instead performed using the source repository available in the default EMBench implementation. A summary of the data in this repository is provided in the following table.

No.   Name                      Number of records
1.    fullname                  on-the-fly value creation
2.    movietitle                458,143
3.    lastname                  323,021
4.    firstname                 128,958
5.    song                      110,361
6.    company                   107,364
7.    movieoccupation           83,303
8.    occupation                83,303
9.    movieproducers            70,877
10.   film                      52,979
11.   distributor               22,184
12.   editor                    13,792
13.   album                     12,007
14.   university                11,804
15.   mountain                  9,811
16.   software                  7,768
17.   booktitle                 5,208
18.   organization              4,414
19.   studio                    4,179
20.   disease                   4,003
21.   newspaper                 3,338
22.   subsidiarycompany         3,126
23.   band                      2,888
24.   museum                    2,456
25.   theatre                   629
26.   publisher                 482
27.   athleticconference        229
28.   monastery                 189
29.   series                    153
30.   symptom                   135
31.   toy                       81
32.   school                    47
33.   movierelatedoccupation    15
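As an illustration of how column tables like those listed above could feed a data generator, here is a small sketch that draws real-world values at random. It is not taken from EMBench: the helper names and the tiny in-memory `COLUMN_TABLES` are hypothetical, and composing `fullname` from one `firstname` and one `lastname` value is only an assumption about what "on-the-fly value creation" might mean.

```python
import random

# Toy stand-ins for two column tables; the real repository holds
# far more values (see the table above).
COLUMN_TABLES = {
    "firstname": ["Ada", "Alan", "Grace"],
    "lastname": ["Lovelace", "Turing", "Hopper"],
}


def sample_value(column_tables, name, rng):
    """Pick one stored value uniformly at random from the named column table."""
    return rng.choice(column_tables[name])


def make_fullname(column_tables, rng):
    """Create a fullname on the fly.

    Assumption: a fullname is a firstname and a lastname joined by a space.
    """
    first = sample_value(column_tables, "firstname", rng)
    last = sample_value(column_tables, "lastname", rng)
    return f"{first} {last}"


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that runs are repeatable
    for _ in range(3):
        print(make_fullname(COLUMN_TABLES, rng))
```

Passing the random generator in explicitly keeps the sampling repeatable, which is convenient when the same synthetic collection needs to be regenerated.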



 
Last modified: May 2018,   Page maintained by: Ekaterini Ioannou, Yannis Velegrakis