<ul>
<li>Publish the original raw data online and make it freely available for download.</li>
<li>Make the code base open source and available online for download.</li>
<li>If randomization is used in optimization, either repeat the optimization several times and choose the best resulting value, or use a fixed random seed so that the same results are reproduced.</li>
<li>Before performing your analysis, split the data into a "training/analysis" dataset and a "testing/validation" dataset. Perform your analysis on the "training" dataset, and make sure that the results still hold on the "validation" dataset, to ensure that your analysis actually generalizes and isn't simply memorizing peculiarities of the dataset in question.</li>
</ul>
<p>The first two points are incredibly important, because making the dataset available allows others to perform their own analyses on the same data, which increases the level of confidence in the validity of your own analyses. Additionally, making the dataset available online (especially if you use linked data formats) makes it possible for crawlers to aggregate your dataset with other datasets, thereby enabling analyses with larger data sets. In many types of research, the sample size is too small to be really confident about the results, but sharing your dataset makes it possible to construct very large datasets. Or, someone could use your dataset to validate the analysis that they performed on some other dataset.</p>
<p>Additionally, making your code open source makes it possible for the code and procedure to be reviewed by your peers. Often such reviews lead to the discovery of flaws or of opportunities for additional optimization and improvement. Most importantly, it allows other researchers to improve on your methods without having to reimplement from scratch everything that you have already done. It greatly accelerates the pace of research when researchers can focus on improvements rather than on reinventing the wheel.</p>
<p>As for randomization, many algorithms rely on it to achieve their results. Stochastic and Monte Carlo methods are quite common, and while they have been proven to converge in certain cases, it is still possible to get different results from one run to the next. One way to make the results consistent is to have a loop in your code that invokes the computation some fixed number of times, and to choose the best result. If you use enough repetitions, you can expect to find global or near-global optima instead of getting stuck in local optima. Another possibility is to use a predetermined seed, although that is not, IMHO, as good an approach, since you could pick a seed that causes you to get stuck in a local optimum. In addition, there is no guarantee that random number generators on different platforms will generate the same results for that seed value.</p>
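<p>The two randomization strategies above can be combined, as in the following minimal Python sketch. Everything in it is an illustrative assumption rather than part of any particular analysis: the toy <code>objective</code> function, the simple <code>random_local_search</code> routine, and the <code>best_of_restarts</code> helper are all hypothetical names. The sketch repeats a stochastic search a fixed number of times, keeps the best result, and draws all random numbers from one explicitly seeded generator so that a rerun in the same environment gives the same answer.</p>

```python
import random


def objective(x):
    # Hypothetical toy objective (not from the text): it has two local
    # minima, one near x = 2 and a lower one near x = -2.
    return (x * x - 4.0) ** 2 + x


def random_local_search(rng, steps=200, step_size=0.1):
    """One stochastic run: random start, then accept only improving moves."""
    x = rng.uniform(-3.0, 3.0)
    best = objective(x)
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        value = objective(candidate)
        if value < best:
            x, best = candidate, value
    return x, best


def best_of_restarts(n_restarts, seed):
    """Repeat the run a fixed number of times and keep the lowest value.

    A single seeded generator drives every restart, so the whole
    procedure is repeatable for a given seed.
    """
    rng = random.Random(seed)
    runs = (random_local_search(rng) for _ in range(n_restarts))
    return min(runs, key=lambda result: result[1])


if __name__ == "__main__":
    x, value = best_of_restarts(n_restarts=20, seed=42)
    print(f"best x = {x:.3f}, objective = {value:.3f}")
```

<p>A single run can end in either minimum depending on where it starts, while the best of many restarts is much more likely to land in the lower one. The seed only makes the experiment repeatable; as noted above, it does not by itself protect against an unlucky run, and identical output across platforms or library versions should not be assumed.</p>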
 
