You should take a look at `sqoop`; it has an integration with Cassandra, as shown [here](http://www.datastax.com/docs/datastax_enterprise2.0/sqoop/sqoop_help).

It also scales easily. You need a Hadoop cluster to run `sqoop`, and the job basically works like this:

- Slice your dataset into partitions.
- Run a Map/Reduce job in which each mapper is responsible for transferring one slice.

So the bigger the dataset you want to export, the higher the number of mappers, which means that if you keep growing your cluster, throughput keeps increasing. It's all a matter of what resources you have.

As for the load on the Cassandra cluster, I can't say for certain since I haven't used the Cassandra connector for `sqoop` myself, but if you want to extract data you will have to put some load on the cluster anyway. You could, for example, run the job once a day at the time when traffic is lowest, so that if your Cassandra availability drops, the impact is minimal.

If this is related to [your other question](https://stackoverflow.com/q/14532230/1332690), you might also want to consider exporting to Hive instead of MySQL; `sqoop` works for that too, since it can load into Hive directly. And once the data is in Hive, you can use the same cluster `sqoop` ran on for your analytics jobs.
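For concreteness, here is a minimal sketch of what such a job can look like on the command line, assuming a standard Apache Sqoop installation. `SOURCE_CONNECT_STRING`, `etl_user`, `events`, and `event_id` are placeholder names for illustration only; the Cassandra-specific connection options are the ones described in the DataStax documentation linked above, and only the generic Sqoop flags for parallelism and Hive loading are shown here.

```
# Sketch of a parallel Sqoop transfer loaded straight into Hive.
# SOURCE_CONNECT_STRING is a placeholder for the source connection string
# (for Cassandra, see the DataStax documentation linked above).
#
# --split-by    : column Sqoop uses to slice the dataset into ranges
# --num-mappers : number of map tasks, i.e. slices transferred in parallel
# --hive-import : load the result into a Hive table instead of plain HDFS files
sqoop import \
  --connect "$SOURCE_CONNECT_STRING" \
  --username etl_user -P \
  --table events \
  --split-by event_id \
  --num-mappers 8 \
  --hive-import \
  --hive-table events
```

Each mapper transfers one slice of the `--split-by` range independently, which is why adding mappers (and cluster capacity) keeps raising throughput, as described above.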
 
