Many legacy mainframe applications store certain data EBCDIC-encoded in DB2 tables, often in variable-length columns that can hold anywhere from zero to hundreds of iterations within a single compressed, EBCDIC-encoded field. When this data is downloaded to a platform such as Hadoop, which does not understand EBCDIC, it appears as junk characters.
I solved this issue in my project using an approach designed and implemented in a PySpark script (separate logic is needed for the COMP, COMP-3, and X/alphanumeric data types). Having this functionality in a tool could help many organizations that consume data from mainframe applications.
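As a rough illustration of the per-datatype logic mentioned above, here is a minimal Python sketch of decoders for the three field types. This is not the original PySpark implementation; the code page `cp037` and the function names are assumptions, and real copybooks may use a different EBCDIC variant or sign conventions.

```python
import codecs

def decode_alphanumeric(raw: bytes) -> str:
    # X(n) alphanumeric fields: plain EBCDIC-to-Unicode conversion
    # (cp037 is assumed here; other shops may use cp500, cp1047, etc.)
    return codecs.decode(raw, "cp037")

def decode_comp(raw: bytes) -> int:
    # COMP fields: big-endian signed binary integers
    return int.from_bytes(raw, byteorder="big", signed=True)

def decode_comp3(raw: bytes) -> int:
    # COMP-3 (packed decimal): two decimal digits per byte,
    # with the final nibble encoding the sign (0xD = negative)
    digits = []
    for byte in raw:
        digits.append((byte >> 4) & 0x0F)
        digits.append(byte & 0x0F)
    sign_nibble = digits.pop()
    value = int("".join(str(d) for d in digits))
    return -value if sign_nibble == 0x0D else value
```

In a PySpark job, functions like these would typically be wrapped as UDFs and applied per column, with the copybook layout driving which decoder handles which byte range of the record.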