
Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parsing a LOT of JSON

CharleyMcGee
6 - Meteoroid

Ok, so I'm pretty new to Alteryx ... blah blah n00b, etc.  Anyway, I'm working on a dataset that is only available to me in JSON, but I need to get it into Postgres.  For that, I need Alteryx to pick up a file that is about half a million lines long and contains ~180 records, each of which holds ~900 key:value pairs.  I need to drop the bits I don't need, then stuff what's left into Postgres.  I know I haven't described the data terribly well, but you can view it here:  http://static.mwomercs.com/api/mechs/list/full.json

 

It looks like the JSON parser expects to find a single record per file.  How can I work around this?

 

Thanks!

RodL
Alteryx Alumni (Retired)

Use a Summarize tool, where you Group By the field that identifies the "record", and then Concatenate the actual data with a "space" as the Separator.

This collapses the data into a single row per record. Then connect the JSON Parse tool to that output and it should give you the parsed data.
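Outside Alteryx, the same idea can be seen in a few lines: collapse the multi-line JSON file into one string (the Summarize tool's Concatenate step), then hand that single record to a parser. A minimal Python sketch; the field names here are illustrative, not the actual mwomercs schema:

```python
import json

# A JSON document spread over several lines, as it arrives from the file.
lines = [
    '{',
    '  "Mechs": {',
    '    "HBK-4G": {"Id": 1, "Tons": 50}',
    '  }',
    '}',
]

# Concatenate with a space separator, like the Summarize tool's output:
# one long string instead of many short rows.
single_record = " ".join(lines)

# The parser now sees a single record, which is what JSON Parse expects.
data = json.loads(single_record)
print(data["Mechs"]["HBK-4G"]["Tons"])
```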

JohnJPS
15 - Aurora

I found it pretty difficult to work with too; even the R tool ("fromJSON" from the rjson library) didn't handle it well.  I was able to convert it to a CSV using C#, though I'm not sure the result will be useful to you. The approach is based on http://stackoverflow.com/questions/32961051/convert-nested-json-to-csv.  The resulting .csv is attached (hopefully).

 

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

namespace processJson
{
    class Program
    {
        static void Main(string[] args)
        {

            // Read the whole file and parse it as a single JSON object.
            string json = System.IO.File.ReadAllText("C:/testDir/full.json");

            var obj = JObject.Parse(json);

            // Collect column titles: all property names whose values are of type JValue, distinct, in order of encountering them.
            var values = obj.DescendantsAndSelf()
                .OfType<JProperty>()
                .Where(p => p.Value is JValue)
                .GroupBy(p => p.Name)
                .ToList();

            var columns = values.Select(g => g.Key).ToArray();

            // Filter JObjects that have child objects that have values.
            var parentsWithChildren = values.SelectMany(g => g).SelectMany(v => v.AncestorsAndSelf().OfType<JObject>().Skip(1)).ToHashSet();

            // Collect all data rows: for every object, go through the column titles and get the value of that property in the closest ancestor or self that has a value of that name.
            var rows = obj
                .DescendantsAndSelf()
                .OfType<JObject>()
                .Where(o => o.PropertyValues().OfType<JValue>().Any())
                .Where(o => o == obj || !parentsWithChildren.Contains(o)) // Show a row for the root object + objects that have no children.
                .Select(o => columns.Select(c => o.AncestorsAndSelf()
                    .OfType<JObject>()
                    .Select(parent => parent[c])
                    .Where(v => v is JValue)
                    .Select(v => (string)v)
                    .FirstOrDefault())
                    .Reverse() // Trim trailing nulls
                    .SkipWhile(s => s == null)
                    .Reverse());

            // Convert to CSV. NOTE: values are not quoted, so any value
            // containing a comma would break the column alignment.
            var csvRows = new[] { columns }.Concat(rows).Select(r => string.Join(",", r));
            var csv = string.Join("\n", csvRows);

            // Write the CSV out; File.WriteAllText closes the file for us.
            System.IO.File.WriteAllText("C:/testDir/full.csv", csv);

            //Console.WriteLine(csv);
        }
    }

    public static class EnumerableExtensions
    {
        // http://stackoverflow.com/questions/3471899/how-to-convert-linq-results-to-hashset-or-hashedset
        // (Newer .NET versions ship Enumerable.ToHashSet; drop this if it conflicts.)
        public static HashSet<T> ToHashSet<T>(this IEnumerable<T> source)
        {
            return new HashSet<T>(source);
        }
    }
}
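For readers without a C# toolchain, the core of the technique above (walk the nested structure, collect every leaf path as a column, emit one row per record) can be sketched in Python with only the standard library. This is a simplified sketch, not a port of the C# code; the record and field names are made up for illustration:

```python
import csv
import io

def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into one {dotted.path: value} dict."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat

# Each top-level entry becomes one CSV row; the columns are the union of
# every dotted path seen across all records.
records = {
    "mech_a": {"name": "HBK-4G", "stats": {"tons": 50}},
    "mech_b": {"name": "CPLT-C1", "stats": {"tons": 65, "jumpjets": 4}},
}
rows = [flatten(rec) for rec in records.values()]
columns = sorted({col for row in rows for col in row})

# csv.DictWriter quotes values as needed and fills missing columns with "".
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=columns)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Unlike the plain `string.Join` in the C# version, `csv.DictWriter` handles quoting, so values containing commas stay in one column.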

 

 

CharleyMcGee
6 - Meteoroid

You, Sir or Ma'am, are my personal short term religious icon!

 

Not gonna lie, I'm feeling pretty vindicated about this now, noob or not.

 

MUCH appreciated.

RodL
Alteryx Alumni (Retired)

Appreciate the coding that @JohnJPS comes up with. It shows how "expandable" Alteryx can be to those with that skill set.

 

Since most of us are not in that category of being able to code like he does, I've attached a workflow (containing the data set pointed at by @CharleyMcGee in his original post) that uses just basic Alteryx tools to get to a point where you have parsed-out JSON data. This way hopefully people aren't "scared away" from Alteryx by thinking they need that level of expertise. ;-)

 

It appears that the entire set of data is just one record. If there were multiple records, my assumption is that there would be some indication of that in the data that you could Group By in the Summarize tool (or you could create a batch macro that runs the attached process over multiple files individually). But in this case, the Summarize tool merely concatenates all of the data into a single VERY long string...which the JSON Parse tool can understand.

 

Just wanting to provide a less code-intensive alternative that a "noob" might feel more comfortable with. :-)

JohnJPS
15 - Aurora

Thanks for bringing it back to reality, @RodL. Just FYI, the code was pretty much copied verbatim from the StackOverflow link, so no real expertise was required, though having a C# compiler helps.  :)

 

Anyway, I'd be interested in seeing a workflow that automatically does all the transposing and cross-tabbing needed to move from your output to a single table view (e.g. if the JSON were interpreted as a bunch of tables, what you'd get by selecting everything from all the tables joined together). That's what I was having difficulty with, hence resorting to StackOverflow and C#.
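The transpose/cross-tab step described above (pivoting the JSON Parse tool's long name/value rows back into one wide row per record) can be sketched outside Alteryx like this. A minimal Python sketch; the dotted-key layout and the record/field names are assumptions for illustration, not the actual feed's schema:

```python
# Long-format rows like those a JSON parse step emits:
# one (name, value) pair per row, with the record id embedded in the name.
long_rows = [
    ("Mechs.HBK-4G.Tons", "50"),
    ("Mechs.HBK-4G.Class", "Medium"),
    ("Mechs.CPLT-C1.Tons", "65"),
    ("Mechs.CPLT-C1.Class", "Heavy"),
]

# Group By the record id (second path segment), then Cross Tab on the
# final field name: each record becomes one wide dict of its fields.
wide = {}
for name, value in long_rows:
    _, record_id, field = name.split(".", 2)
    wide.setdefault(record_id, {})[field] = value

print(wide["HBK-4G"])
```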

 

CharleyMcGee
6 - Meteoroid

RodL's example is a great starting point.  It is CERTAINLY better than what I was getting.  However, in this case, given this very specific data set, the C# code actually gets me where I need to be more readily.  I can pretty much take that output, drop what I don't need, and push it to Postgres.

 

I still want to work out a means of doing this completely within Alteryx, if possible, because the data set will change periodically and it would be simpler to have Alteryx reach out, grab the new JSON, parse it, drop the unneeded fields, and push the fresh data up to Postgres.  Doing all of that in one place means I can kick off a single job (whether a C# program, a PowerShell or Python script, or an Alteryx workflow) and be done with it.  I'd like for that to happen within Alteryx.
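That fetch-parse-trim-load pipeline can be roughed out in a few lines. A hedged Python sketch: the URL is the one from the original post, but the field list is hypothetical, and the Postgres step is only described in a comment (it would need a driver like psycopg2, so it isn't executed here):

```python
import json
import urllib.request

FEED_URL = "http://static.mwomercs.com/api/mechs/list/full.json"

def fetch_feed(url=FEED_URL):
    """Download and parse the JSON feed (defined but not called here,
    so the sketch stays runnable offline)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Hypothetical: the handful of fields worth keeping out of ~900 per record.
KEEP_FIELDS = {"name", "tons"}

def trim(record):
    """Drop the key:value pairs that aren't needed downstream."""
    return {k: v for k, v in record.items() if k in KEEP_FIELDS}

# From here the trimmed rows would be INSERTed into Postgres, e.g. with
# psycopg2's executemany -- omitted so the sketch has no external deps.
sample = {"name": "HBK-4G", "tons": 50, "hardpoints": {"energy": 3}}
print(trim(sample))  # {'name': 'HBK-4G', 'tons': 50}
```

Scheduling that one script (or the equivalent Alteryx workflow) is then the single job described above.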

 

PROGRESS HAS BEEN MADE!
