Questions on Spark SQL
We are currently building a reporting platform, and as its data store we used Shark. Since development of Shark has stopped, we are now evaluating Spark SQL. Based on our use cases, we have a few questions.
1) We have data from various sources (MySQL, Oracle, Cassandra, MongoDB). How can we get this data into Spark SQL? Is there a utility we can use for this? Does such a utility support continuous refresh of the data (syncing new adds/updates/deletes from the data store to Spark SQL)?
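To make the question concrete, this is roughly the one-shot import we have in mind for the relational sources, sketched against the Spark 1.0-era API (the connection URL, table, and bounds below are placeholders, not our real setup):

```scala
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.sql.SQLContext

// Placeholder row type for illustration.
case class Order(id: Int, amount: Double)

object MySqlToSparkSql {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mysql-import"))
    val sqlContext = new SQLContext(sc)
    import sqlContext._ // brings the implicit RDD-to-SchemaRDD conversion into scope

    // JdbcRDD splits the query into partitions using the two '?' placeholders.
    val orders = new JdbcRDD(
      sc,
      () => DriverManager.getConnection("jdbc:mysql://dbhost/shop", "user", "pass"),
      "SELECT id, amount FROM orders WHERE id >= ? AND id <= ?",
      1, 1000000, 10,
      (rs: ResultSet) => Order(rs.getInt(1), rs.getDouble(2)))

    // Register the RDD of case classes as a table and query it with SQL.
    orders.registerAsTable("orders")
    sql("SELECT COUNT(*) FROM orders").collect().foreach(println)
  }
}
```

As far as we can tell, this only covers a one-shot snapshot; continuous sync of adds/updates/deletes would require re-running the job or some external tooling, which is exactly what we are asking about.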
2) Is there a way to create multiple databases in Spark SQL?
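By "multiple databases" we mean something along these lines, sketched with the HiveContext from Spark 1.0, where hql() runs HiveQL statements (the database and table names are made up for illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object MultiDb {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("multi-db"))
    val hive = new HiveContext(sc)

    // Each database is its own namespace of tables, as in Hive.
    hive.hql("CREATE DATABASE IF NOT EXISTS sales")
    hive.hql("CREATE DATABASE IF NOT EXISTS marketing")

    hive.hql("USE sales")
    hive.hql("CREATE TABLE IF NOT EXISTS orders (id INT, amount DOUBLE)")
  }
}
```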
3) For the reporting UI we use Jasper, and we would like to connect from Jasper to Spark SQL. Our initial search suggests there is currently no support for consumers to connect to Spark SQL through JDBC, but that you would like to add this in a future release. When will Spark SQL have a stable release with JDBC support? Meanwhile, we took the source code from https://github.com/amplab/shark/tree/sparkSql, but we had some difficulty setting it up locally and evaluating it. It would be great if you could help us with setup instructions. (I can share the issues we are facing; please let me know where I can post the error logs.)
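For reference, the kind of workflow we are hoping for looks like the following; these commands are our reading of the in-development Thrift JDBC server work and may well change before a stable release:

```shell
# Start the Thrift JDBC/ODBC server on the default HiveServer2 port (10000),
# so JDBC clients such as Jasper can connect to Spark SQL.
./sbin/start-thriftserver.sh --master spark://master-host:7077

# Smoke-test the endpoint with beeline before pointing Jasper at it.
./bin/beeline -u jdbc:hive2://localhost:10000
```

Jasper would then use the same jdbc:hive2:// URL with the Hive JDBC driver on its classpath.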
4) We would also require a SQL prompt where we can execute queries. Currently the Spark shell provides a Scala prompt where Scala code can be executed, and from Scala code we can issue SQL queries. Like Shark, we would like a SQL prompt in Spark SQL. Our search indicates that a future release of Spark will add this; it would be great if you could tell us which release will address it.
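Concretely, this is what the workflow looks like for us today from the Scala prompt, with every query wrapped in Scala calls (the table name is assumed to be registered already):

```scala
// In spark-shell (Spark 1.0), `sc` is the SparkContext the shell provides.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// SQL must be issued through Scala; there is no bare SQL prompt like Shark's.
sqlContext.sql("SELECT region, SUM(amount) FROM orders GROUP BY region")
  .collect()
  .foreach(println)
```

What we are after is a prompt where the SELECT statement above could be typed directly, as in Shark.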
Supreeth Kumar, Jul 08, 2014
Well, you have a lot of moving parts in your system.
Spark supports only SQL Server. Furthermore, to benefit from the code-generation facility, your data model must adhere to certain conventions (primarily concerning primary keys: they must be int identity fields named Id).
On the other hand, Spark does support the use of multiple databases, but again only with SQL Server, and each database must adhere to the conventions.
A lot can be done to adapt Spark to your particular environment. However, I am not sure that would be the right decision. With your list of legacy and other tools, you may be better off with Entity Framework, NHibernate, or some other ORM.
Hope this helps.
Jack Poorte, Jul 08, 2014