An important part of EMBench is the collection of data. We do not want the synthetic data that we generate to consist of completely random strings; we want it to contain real-world values. For this reason we have introduced the so-called shredders. A shredder is a software component that takes a database (relational or XML) and shreds it into a series of column tables. There are general-purpose shredders for relational and XML databases, but there are also shredders specifically designed for many popular, freely available databases, such as Wikipedia, IMDb, DBLP, and OKKAM.
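
To make the idea of shredding concrete, the following is a minimal sketch of a general-purpose relational shredder, not the EMBench implementation. It assumes a SQLite source and represents each column table as a sorted list of the distinct non-null values of one column. The function names `shred_relational` and `build_demo_db`, the `table.column` naming scheme, and the demo rows are all hypothetical and chosen only for illustration.

```python
import sqlite3


def shred_relational(conn):
    """Shred every user table of a SQLite database into column tables.

    Returns a dict mapping "table.column" (an assumed naming scheme) to the
    sorted list of distinct non-null values found in that column.
    """
    cur = conn.cursor()
    # List user tables, skipping SQLite's internal bookkeeping tables.
    tables = [
        row[0]
        for row in cur.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    column_tables = {}
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the name.
        columns = [row[1] for row in cur.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            rows = cur.execute(
                f'SELECT DISTINCT "{column}" FROM "{table}" '
                f'WHERE "{column}" IS NOT NULL ORDER BY "{column}"'
            )
            column_tables[f"{table}.{column}"] = [r[0] for r in rows]
    return column_tables


def build_demo_db():
    """Create a tiny in-memory database with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE album (title TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO album VALUES (?, ?)",
        [("Album A", 1999), ("Album B", 2001), ("Album A", 1999), (None, 2001)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    for name, values in shred_relational(build_demo_db()).items():
        print(name, values)
```

Each resulting column table is a pool of real values from which a generator could later draw, instead of producing random strings. Keeping only distinct values is a design choice of this sketch; a shredder that needs to preserve value frequencies would keep duplicates.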

The user of the benchmark can select which databases are shredded, and can add further databases if desired, either by supplying the respective shredder alongside each database or by using the general-purpose shredder that comes with the system. Due to security restrictions, this functionality is not available over the EMBench Web System. Instead, data generation can be performed using the source repository available in the default EMBench implementation. A summary of the data in this repository is provided in the following table.

No.   Name                  Record Number                Random Values
1.    album_derived         on-the-fly value creation
2.    anyinfo               835,272
3.    author                775,653
4.    album                 12,007
5.    athleticconference    229

Last modified: July 2014,   Page maintained by: Ekaterini Ioannou, Yannis Velegrakis