Here is the article:
- https://databricks.com/blog/2017/05/31/top-5-reasons-for-choosing-s3-over-hdfs.html
I am not experienced enough with HDFS to fully understand the comparison being made, but it seems they base the HDFS side of the cost comparison on the price of a particular EC2 instance.
Does that actually make sense as a counterfactual, i.e. is it a fair stand-in for the cost of an in-house data center running HDFS?