Credit: Databricks
Knowledgebase

Best Practices
Avoid GroupByKey
Don't copy all elements of a large RDD to the driver
Gracefully Dealing with Bad Input Data

General Troubleshooting
Job aborted due to stage failure: Task not serializable:
Missing Dependencies in Jar Files
Error running start-all.sh - Connection refused
Network connectivity issues between Spark components

Performance & Optimization
How Many Partitions Does An RDD Have?
Data Locality

Spark Streaming
ERROR OneForOneStrategy