    <blockquote> <p>1. Large sample web server logs that have been anonymized.</p> </blockquote>
    <p>These are good places to start:</p>
    <ul> <li><a href="http://archive.ics.uci.edu/ml/datasets.html" rel="noreferrer">UCI Machine Learning Repository</a> <ul> <li><a href="http://archive.ics.uci.edu/ml/datasets/Anonymous+Microsoft+Web+Data" rel="noreferrer">Anonymous Microsoft Web Data</a></li> <li><a href="http://archive.ics.uci.edu/ml/datasets/MSNBC.com+Anonymous+Web+Data" rel="noreferrer">MSNBC.com Anonymous Web Data</a></li> <li><a href="http://archive.ics.uci.edu/ml/datasets/Syskill+and+Webert+Web+Page+Ratings" rel="noreferrer">Syskill and Webert Web Page Ratings</a></li> </ul></li> </ul>
    <p>There are many, many more data sets available than these (see the gamut of other answers), but these are the lowest-hanging fruit that meet your original criteria. As a bonus, the repository has <a href="http://archive.ics.uci.edu/ml/contact.html" rel="noreferrer">a contact link</a> in case you have specific needs that its maintainers may know how to meet.</p>
    <blockquote> <p>2. Datasets used for database performance benchmarking.</p> </blockquote>
    <p>This sounds like a misnomer, because you're asking for empirical data sets that describe <a href="http://en.wikipedia.org/wiki/B+_tree" rel="noreferrer">well-defined</a> <a href="http://en.wikipedia.org/wiki/Divide_and_conquer_algorithm" rel="noreferrer">algorithmic</a> <a href="http://en.wikipedia.org/wiki/Database_normalization" rel="noreferrer">problems</a>. Specifically, it sounds like you're trying to find data sets that you can use to test and benchmark various database systems in real time: well-defined, normalized relational data that can serve as a set of test cases for determining the most efficient solution that meets your needs.</p>
    <p>I don't agree with this approach. Instead of collecting a long list of database systems and their canned implementations, it's far better to explore the <a href="http://en.wikipedia.org/wiki/B+_tree" rel="noreferrer">algorithmic</a> <a href="http://en.wikipedia.org/wiki/Graph_database" rel="noreferrer">guarantees</a> of these systems as your first port of call. Once you've determined the algorithmic constraints that meet your needs, you can home in on a set of canned solutions and benchmark them on the efficiency of, for example, indexing, sorting, searching, insertion, deletion, and retrieval.</p>
    <p>Wikipedia provides <a href="http://en.wikipedia.org/wiki/Database_testing" rel="noreferrer">a terse article on database testing concepts</a> that you can use to design and write test cases for benchmarking performance. For example, you might use a database-agnostic data access interface like <a href="http://en.wikipedia.org/wiki/Java_Database_Connectivity" rel="noreferrer">JDBC</a> together with <a href="http://www.inetsoftware.de/products/jdbc-driver/oracle/documentation/benchmark" rel="noreferrer">JDBC Benchmark</a> to determine the relative timings of each operation. From there, you can home in on the right solution.</p>
    <p><strong>In short,</strong> go to <a href="http://en.wikipedia.org/wiki/Analysis_of_algorithms" rel="noreferrer">the research</a> first to determine what each database guarantees. Once a set of candidate solutions has been identified, you can choose among them by testing (or otherwise determining) the constant-factor performance of each desired operation.</p>
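<p>As a rough illustration of the "check the guarantees, then time the operations" idea, here is a minimal Python sketch. It is not the JDBC Benchmark tool mentioned in the answer; it assumes Python's built-in <code>sqlite3</code> module as a stand-in database, and the <code>hits</code> table, its columns, and the helper names are all hypothetical. It times the same lookup before and after adding an index, so the asymptotic difference (full scan versus B-tree lookup) shows up as a measured timing.</p>

```python
import sqlite3
import time


def best_time(func, repeats=3):
    """Return the best wall-clock time in seconds of func() over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def build_table(conn, n_rows):
    """Create a hypothetical 'hits' table and fill it with synthetic rows."""
    conn.execute(
        "CREATE TABLE hits (id INTEGER PRIMARY KEY, user_id INTEGER, page TEXT)"
    )
    rows = ((i, i % 1000, f"/page/{i % 50}") for i in range(n_rows))
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)
    conn.commit()


def count_hits(conn, user_id):
    """Count the rows for one user; this is the operation being timed."""
    cur = conn.execute("SELECT COUNT(*) FROM hits WHERE user_id = ?", (user_id,))
    return cur.fetchone()[0]


def run_benchmark(n_rows=50_000):
    """Time the same lookups without and with an index on user_id."""
    conn = sqlite3.connect(":memory:")
    build_table(conn, n_rows)

    def lookups():
        for user_id in range(20):
            count_hits(conn, user_id)

    results = {"lookup_no_index": best_time(lookups)}

    # Adding an index changes the access path from a full table scan
    # to a B-tree lookup; the query itself is unchanged.
    conn.execute("CREATE INDEX idx_hits_user ON hits (user_id)")
    results["lookup_with_index"] = best_time(lookups)

    # Sanity check that the query still returns the same answer.
    results["rows_per_user"] = count_hits(conn, 7)
    conn.close()
    return results


if __name__ == "__main__":
    for name, value in run_benchmark().items():
        print(name, value)
```

<p>The absolute numbers depend on the machine and on the database engine, so they are only meaningful relative to each other. The same harness shape should carry over to insertion, deletion, or sorting by swapping the timed function.</p>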