An important part of EMBench is the collection of data. We do not want the synthetic data that we generate to consist of completely random strings; we want it to contain real-world values. For this reason, we have introduced the so-called shredders. A shredder is a software component that takes a database (relational or XML) and shreds it into a series of column tables. There are general-purpose shredders for relational and XML databases, but there are also shredders specifically designed for popular, freely available databases such as Wikipedia, IMDb, DBLP, and OKKAM.
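To make the idea of shredding concrete, the following is a minimal sketch of a general-purpose relational shredder. It is not the EMBench implementation: the function names `shred_relational` and `demo`, the use of SQLite, and the choice to merge same-named columns from different tables into a single column table are all assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict


def shred_relational(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Shred every user table of a SQLite database into column tables.

    Hypothetical helper, not part of EMBench. Returns a mapping from
    column name to the sorted distinct non-empty values stored under
    that name in any table.
    """
    column_tables: dict[str, set[str]] = defaultdict(set)
    # List the user tables, skipping SQLite's internal bookkeeping tables.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        names = [description[0] for description in cursor.description]
        for row in cursor:
            for name, value in zip(names, row):
                # Keep only real values: drop NULLs and blank strings.
                if value is not None and str(value).strip():
                    column_tables[name].add(str(value).strip())
    return {name: sorted(values) for name, values in column_tables.items()}


def demo() -> dict[str, list[str]]:
    """Build a tiny in-memory database (made-up rows) and shred it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (FirstName TEXT, LastName TEXT)")
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [("Ada", "Lovelace"), ("Alan", "Turing"), ("Ada", "Byron")])
    return shred_relational(conn)


if __name__ == "__main__":
    for column, values in demo().items():
        print(column, values)
```

In this sketch each column table holds distinct values only, so a value that occurs many times in the source database is stored once. Whether the actual shredders deduplicate in this way is not stated here.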

The user of the benchmark can select which databases are shredded, or add further databases by supplying the respective shredder alongside them or by using the general-purpose shredder that comes with the system. Due to security restrictions, this functionality is not available through the EMBench Web System. Data generation there is therefore performed using the source repository available in the default EMBench implementation. A summary of the data in this repository is provided in the following table.

       No.      Name      Number of Records      Random Values      
      1.   FullName    on-the-fly value creation    
      2.   FirstName_derived    on-the-fly value creation    
      3.   FullName_derived    on-the-fly value creation    
      4.   PublicationTitle_derived    on-the-fly value creation    
      5.   t1    on-the-fly value creation    
      6.   PublicationTitle    1,309,203   
      7.   Title    1,309,203   
      8.   AnyInfo    548,290   
      9.   MTitle    458,143   
      10.   LastName    323,021   
      11.   FirstName    135,636   
      12.   Song    110,361   
      13.   Company    84,847   
      14.   MOccupation    83,306   
      15.   Occupation    83,306   
      16.   FilmStudio    74,509   
      17.   MasculinFirstName    74,079   
      18.   MProducers    70,877   
      19.   Film    52,979   
      20.   FeminineFirstName    49,299   
      21.   Landmark    24,901   
      22.   FilmManufacturer    24,713   
      23.   Distributor    22,184   
      24.   Album    12,007   
      25.   University    11,817   
      26.   Mountain    9,811   
      27.   Software    7,768   
      28.   FeminineBabyName    7,251   
      29.   BookTitle    6,211   
      30.   MasculinBabyName    5,792   
      31.   Organization    4,414   
      32.   Studio    4,209   
      33.   Disease    4,003   
      34.   Newspaper    3,338   
      35.   Manufacturer    3,126   
      36.   Band    2,888   
      37.   Museum    2,456   
      38.   Journal    875   
      39.   Protein    761   
      40.   Theatre    629   
      41.   Publisher    482   
      42.   Food    442   
      43.   Instrument    402   
      44.   AthleticConference    229   
      45.   Monastery    189   
      46.   Series    153   
      47.   Symptom    135   
      48.   Toy    81   



 
Last modified: July 2014. Page maintained by: Ekaterini Ioannou, Yannis Velegrakis