An important part of EMBench is the collection of data. We do not want the synthetic data that we generate to consist of completely random strings; we want it to contain real-world values. For this reason we have introduced the so-called shredders. A shredder is a software component that takes a database (relational or XML) and shreds it into a series of column tables. There are general-purpose shredders for relational and XML databases, as well as shredders specifically designed for several popular, freely available databases, such as Wikipedia, IMDb, DBLP, and OKKAM.
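
To make the idea of shredding concrete, the following is a minimal sketch of a general-purpose relational shredder. It is not the EMBench implementation: the function name `shred_relational`, the use of SQLite, the `table.column` naming of the resulting column tables, and the choice to keep only distinct non-null values are all assumptions made for illustration.

```python
import sqlite3


def shred_relational(conn):
    """Shred every table of a SQLite database into single-column tables.

    Returns a dict that maps "table.column" to the sorted list of distinct,
    non-null values found in that column (hypothetical output format).
    """
    cur = conn.cursor()
    # List the user tables of the database.
    tables = [row[0] for row in cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

    column_tables = {}
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the name.
        columns = [row[1] for row in cur.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            rows = cur.execute(
                f'SELECT DISTINCT "{column}" FROM "{table}" '
                f'WHERE "{column}" IS NOT NULL ORDER BY "{column}"')
            column_tables[f"{table}.{column}"] = [r[0] for r in rows]
    return column_tables


def demo():
    # Tiny in-memory database with made-up rows, used only for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movie (title TEXT, studio TEXT)")
    conn.executemany(
        "INSERT INTO movie VALUES (?, ?)",
        [("Alien", "Fox"), ("Aliens", "Fox"), ("Heat", None)])
    return shred_relational(conn)


if __name__ == "__main__":
    for name, values in demo().items():
        print(name, values)
```

Each resulting column table holds the values of exactly one attribute, so a data generator can later draw real-world values for that attribute without any knowledge of the schema of the original database.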

The user of the benchmark can select which databases are shredded and can add further databases if desired, either by supplying the respective shredder alongside each new database or by using the general-purpose shredder that comes with the system. Due to security restrictions, this functionality is not available over the EMBench Web System. Instead, data generation can be performed using the source repository available in the default EMBench implementation. A summary of the data in this repository is provided in the following table.

No.   Name                  Record Number   Random Values
1.    FullName              -               on-the-fly value creation
2.    FullName_derived      -               on-the-fly value creation
3.    Title                 1,309,203
4.    Author                775,653
5.    MovieTitle            458,143
6.    LastName              323,021
7.    FirstName             131,176
8.    Song                  110,361
9.    MovieOccupation       83,306
10.   MasculinFirstName     79,871
11.   MovieProducers        70,877
12.   table_PersonPP_A2     61,508
13.   PersonPP              57,000
14.   FeminineFirstName     56,550
15.   Film                  52,979
16.   table_PersonPP_A1     30,000
17.   Book                  22,542
18.   Distributor           22,184
19.   AnyInfo               13,860
20.   Editor                13,792
21.   Album                 12,007
22.   University            11,804
23.   table_ArticlePP_A1    10,000
24.   Mountain              9,811
25.   Company               8,613
26.   Software              7,768
27.   Booktitle             5,208
28.   Organization          4,414
29.   Studio                4,209
30.   Disease               4,003
31.   Newspaper             3,338
32.   Manufacturer          3,126
33.   Band                  2,888
34.   Museum                2,456
35.   t1                    1,539
36.   t2                    1,539
37.   Journal               875
38.   Protein               761
39.   Theatre               629
40.   Publisher             482
41.   Food                  442
42.   Instrument            402
43.   AthleticConference    229
44.   Monastery             189
45.   Series                153
46.   Symptom               135
47.   Toy                   81
48.   School                47



 
Last modified: July 2014,   Page maintained by: Ekaterini Ioannou, Yannis Velegrakis