Data links, calendar week 4
Some useful and interesting links
- Databricks published some free training videos. I particularly like this helper:
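def cacheAs(df: org.apache.spark.sql.DataFrame, name: String, level: org.apache.spark.storage.StorageLevel): org.apache.spark.sql.DataFrame = {
  // Drop any cache previously registered under this name; ignore the error if none exists.
  try spark.catalog.uncacheTable(name)
  catch { case _: org.apache.spark.sql.AnalysisException => () }
  // Register the DataFrame under a readable name and cache it at the requested storage level.
  df.createOrReplaceTempView(name)
  spark.catalog.cacheTable(name, level)
  df
}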
which gives cached RDDs nicer names in the Spark UI and thus eases debugging.
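To see the helper in action, here is a minimal usage sketch. It assumes Spark 2.3 or later (where `spark.catalog.cacheTable` accepts a storage level) and a local session; the sample data and the view name `events_cached` are made up for illustration and are not from the training videos.

```scala
import org.apache.spark.sql.{AnalysisException, DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

object CacheAsExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in a notebook `spark` usually exists already.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cacheAs-example")
      .getOrCreate()

    // Same helper as above, with imports instead of fully qualified names.
    def cacheAs(df: DataFrame, name: String, level: StorageLevel): DataFrame = {
      try spark.catalog.uncacheTable(name)
      catch { case _: AnalysisException => () }
      df.createOrReplaceTempView(name)
      spark.catalog.cacheTable(name, level)
      df
    }

    // Hypothetical sample data and view name.
    val events = spark.range(0, 1000).toDF("id")
    cacheAs(events, "events_cached", StorageLevel.MEMORY_ONLY)

    // Caching is lazy: an action on the view materializes the cached data.
    spark.table("events_cached").count()

    // Ask the catalog whether the named view is registered as cached.
    println(spark.catalog.isCached("events_cached"))

    spark.stop()
  }
}
```

Calling `cacheAs` again with the same name first drops the old cache entry, so re-running a notebook cell should not pile up stale cached tables.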