Many legacy mainframe applications store certain data EBCDIC-encoded in DB2 tables, often in variable-length columns whose compressed payload can contain anywhere from zero to hundreds of repeated segments. When this data is downloaded to a platform such as Hadoop, which does not understand EBCDIC, it appears as junk characters.
I solved this issue in my project with an approach designed and implemented as a PySpark script, with separate logic for the COMP, COMP-3, and X (alphanumeric) datatypes. Building this functionality into a tool could help many organizations that consume data from mainframe applications.
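To illustrate the kind of per-datatype logic involved, here is a minimal sketch in plain Python (the same functions could be registered as PySpark UDFs). It assumes code page CP037 for alphanumeric fields; the function names, the fixed-scale parameter, and the simplified sign handling are illustrative assumptions, not the original implementation.

```python
import codecs

def decode_alphanumeric(raw: bytes) -> str:
    """Decode an EBCDIC X(n) alphanumeric field (CP037 code page assumed)."""
    return codecs.decode(raw, "cp037").rstrip()

def decode_comp(raw: bytes) -> int:
    """Decode a COMP (binary) field: big-endian signed integer."""
    return int.from_bytes(raw, byteorder="big", signed=True)

def decode_comp3(raw: bytes, scale: int = 0) -> float:
    """Unpack a COMP-3 (packed decimal) field.

    Each byte holds two decimal digits (one per nibble); the last
    nibble is the sign: 0xC or 0xF means positive, 0xD negative.
    """
    digits = []
    for byte in raw[:-1]:
        digits.append((byte >> 4) & 0x0F)   # high nibble
        digits.append(byte & 0x0F)          # low nibble
    digits.append((raw[-1] >> 4) & 0x0F)    # last digit
    sign_nibble = raw[-1] & 0x0F

    value = 0
    for d in digits:
        value = value * 10 + d
    if sign_nibble == 0x0D:
        value = -value
    return value / (10 ** scale)

# Example fields:
print(decode_alphanumeric(b"\xC8\xC5\xD3\xD3\xD6"))  # "HELLO" in CP037
print(decode_comp(b"\x00\x64"))                      # 100
print(decode_comp3(b"\x12\x3D", scale=1))            # -12.3
```

In a real job, a copybook-driven layout parser would slice each record into these fields before decoding, and handle the variable iteration counts per record.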