Challenge #36: Data Cleansing Extract Authors
The link to the solution for the last challenge (#35) is HERE.
Use Case: An analytical consulting company downloads medical journal publication data from the web and would like to extract all of the authors for the listed entries.
The text input contains details about each article, where FAU indicates an author name for the article; in most cases there are multiple authors. The details of each article are contained in a block of lines that begins with PMID and ends with an empty line.
Objective: Parse out each article PMID and list each author in sequential columns as seen in the Results.yxmd file.
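For anyone who wants to check their workflow output outside Designer, here is a minimal Python sketch of the same parse. It assumes the input follows the usual PubMed MEDLINE layout of `TAG - value` per line (for example `PMID- 111` and `FAU - Smith, John`); the function names and the `Author1`, `Author2`, ... column names are made up for illustration and may not match the Results file exactly.

```python
def parse_authors(text):
    """Return {pmid: [author, ...]} in file order.

    Assumes each field line looks like 'TAG - value' and that an
    article block starts at a PMID line and ends at an empty line.
    """
    articles = {}
    current = None  # PMID of the article block we are inside, if any
    for line in text.splitlines():
        if not line.strip():
            current = None  # an empty line closes the article block
            continue
        if line[0].isspace():
            continue  # indented continuation of a previous field
        tag, sep, value = line.partition("-")
        if not sep:
            continue  # not a 'TAG - value' line
        tag, value = tag.strip(), value.strip()
        if tag == "PMID":
            current = value
            articles[current] = []
        elif tag == "FAU" and current is not None:
            articles[current].append(value)
    return articles


def to_wide(articles):
    """Lay the authors out in sequential columns, one row per PMID."""
    width = max((len(a) for a in articles.values()), default=0)
    header = ["PMID"] + [f"Author{i}" for i in range(1, width + 1)]
    rows = [
        [pmid] + authors + [""] * (width - len(authors))
        for pmid, authors in articles.items()
    ]
    return header, rows
```

Only the first hyphen on a line is treated as the separator, so hyphenated surnames stay intact.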
Very nice use of the 'select records' tool @Naledi
Slightly different approach
Then processed the authors into a cross-tab in 2 ways (for practice):
- First was a simple ArticleAuthorID (using a multi-row formula) and then a crosstab
- Second was to use a Summarize to concatenate into one delimited field, then use Text To Columns to do the same as a crosstab
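The two routes above can be sketched in plain Python to show that they land on the same result. This is only an illustration, not the workflow itself: the input is assumed to be a list of `(pmid, author)` pairs already in file order, and both function names are hypothetical.

```python
def crosstab_by_sequence(pairs):
    """Route 1: number each author within its PMID, then pivot on that number."""
    counts = {}
    wide = {}
    for pmid, author in pairs:
        # Running count per PMID plays the role of the ArticleAuthorID.
        counts[pmid] = counts.get(pmid, 0) + 1
        wide.setdefault(pmid, {})[counts[pmid]] = author
    return wide


def concat_then_split(pairs, delimiter="|"):
    """Route 2: concatenate authors per PMID, then split back into columns."""
    joined = {}
    for pmid, author in pairs:
        joined[pmid] = joined[pmid] + delimiter + author if pmid in joined else author
    return {pmid: text.split(delimiter) for pmid, text in joined.items()}
```

One thing to watch with the second route: author names in this data contain commas (Last, First), so the delimiter has to be a character that never appears in a name. The pipe used here is an assumption about the data.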
I used the Summarize tool to concatenate all of the names per PMID, and then parsed back into columns with Text to Cols
This was fun. It would have been much easier if you could group when creating a RecordID...
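For comparison, a grouped record ID (a counter that restarts at 1 for every PMID) is short to express in Python. A rough sketch, assuming the rows for each article are contiguous, as they are in the source file; `grouped_record_ids` is a made-up name.

```python
from itertools import groupby


def grouped_record_ids(rows, key):
    """Attach a 1-based ID to each row, restarting whenever the key changes.

    groupby only groups consecutive rows, so the input must already be
    ordered so that each group's rows sit together.
    """
    numbered = []
    for _, group in groupby(rows, key=key):
        for record_id, row in enumerate(group, start=1):
            numbered.append((record_id, row))
    return numbered
```

Called with `key=lambda row: row[0]` on `(pmid, author)` rows, this gives each author its position within the article, which is the value needed as the crosstab header.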