Data treatment
Hello, I have a dataset with a number of strings that represent codes, for example FFF14 and FGT25, and I want to transform each unique string into a unique value, for example:
FFF13 1
FFF14 2
FFF13 1
FGT25 3
How can I do this?
Hey @Daniel_cof, here's one way to go about this: use a Summarize tool to Group By the codes, which gives you a distinct list. Then use the Record ID tool to assign an ID to each code, and join back to the original data using the codes as the key.
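For readers without Alteryx at hand, here is a rough plain-Python analogue of those three steps (Summarize, Record ID, Join). It is not the Alteryx workflow itself. It assumes the distinct codes are sorted before numbering and that IDs start at 1; the function name encode_via_lookup is hypothetical.

```python
# Plain-Python analogue of Summarize (Group By) -> Record ID -> Join.
# Assumption: distinct codes are sorted before numbering, and IDs start at 1.

def encode_via_lookup(codes):
    # "Summarize, Group By": reduce the column to its distinct values
    distinct = sorted(set(codes))
    # "Record ID": number each distinct code, starting at 1
    lookup = {code: record_id for record_id, code in enumerate(distinct, start=1)}
    # "Join": look each original row up by its code to attach the ID
    return [(code, lookup[code]) for code in codes]


rows = ["FFF13", "FFF14", "FFF13", "FGT25"]
for code, code_id in encode_via_lookup(rows):
    print(code, code_id)
```

With this sample the sorted order happens to match the order of first appearance, so the result matches the mapping in the question.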
Thank you, it works.
But is there no automatic way to do this for all string variables?
@Daniel_cof, I'm not sure what you mean by automatically doing this for all string variables. If you want a slightly simpler option than the one I provided above, you can also use the Tile tool, then use a Select tool to remove the sequence number field and rename [Tile_Num] as you wish.
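As a rough illustration of the Tile-then-Select idea (not the Tile tool's actual implementation), this plain-Python sketch numbers each unique value and tracks a per-value sequence number. It assumes tile numbers follow the order of first appearance, which I have not confirmed for the Tile tool; the Tile_SequenceNum field name, the function names, and the CodeID output name are assumptions.

```python
# Rough analogue of Tile (unique value mode) followed by a Select tool.
# Assumption: Tile_Num follows the order in which each value first appears.

def tile_unique_value(codes):
    tile_num = {}   # code -> Tile_Num
    seq_count = {}  # code -> number of rows with that code seen so far
    rows = []
    for code in codes:
        if code not in tile_num:
            tile_num[code] = len(tile_num) + 1
        seq_count[code] = seq_count.get(code, 0) + 1
        rows.append({
            "code": code,
            "Tile_Num": tile_num[code],
            "Tile_SequenceNum": seq_count[code],
        })
    return rows


def select_and_rename(rows, new_name="CodeID"):
    # Select step: drop the sequence number field and rename Tile_Num
    return [{"code": row["code"], new_name: row["Tile_Num"]} for row in rows]


for row in select_and_rename(tile_unique_value(["FFF13", "FFF14", "FFF13", "FGT25"])):
    print(row["code"], row["CodeID"])
```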
Thank you again for your answer.
What I mean by automatic is this: I have a large number of columns like the one I described, and I wondered whether there is an automatic way, for each column I select, to replace each unique string value with a unique number. For example:
Col1 (original) -> Col1 (encoded)    Col2 (original) -> Col2 (encoded)
FFF14           -> 1                 CC14            -> 1
FFF13           -> 2                 CC14            -> 1
FFF14           -> 1                 CC14            -> 1
FFF15           -> 3                 CC17            -> 2
FFF14           -> 1                 CC18            -> 3
FFF15           -> 3                 CC14            -> 1
FFF13           -> 2                 CC14            -> 1
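Outside Alteryx, the per-column idea can be sketched in plain Python. This sketch assumes the table is held as a dict mapping column names to lists of strings, and it numbers values in order of first appearance, starting at 1, which reproduces the example above. encode_columns is a hypothetical helper name.

```python
# Encode several selected columns independently, numbering values by first appearance.
# Assumption: the table is a dict of column name -> list of string values.

def encode_columns(table, columns):
    """Return (encoded_table, mappings); only the listed columns are replaced."""
    encoded = {name: list(values) for name, values in table.items()}
    mappings = {}
    for name in columns:
        mapping = {}
        for value in table[name]:
            if value not in mapping:
                mapping[value] = len(mapping) + 1  # next unused number
        mappings[name] = mapping
        encoded[name] = [mapping[value] for value in table[name]]
    return encoded, mappings


table = {
    "Col1": ["FFF14", "FFF13", "FFF14", "FFF15", "FFF14", "FFF15", "FFF13"],
    "Col2": ["CC14", "CC14", "CC14", "CC17", "CC18", "CC14", "CC14"],
}
encoded, mappings = encode_columns(table, ["Col1", "Col2"])
print(encoded["Col1"])  # [1, 2, 1, 3, 1, 3, 2]
print(encoded["Col2"])  # [1, 1, 1, 2, 3, 1, 1]
```

Keeping the per-column mappings makes it possible to translate the numbers back to the original codes later.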
Hi @Daniel_cof,
I suppose you are trying to do label encoding for categorical data, so you should have some sort of ID field.
If you already have an ID field, you don't need the Record ID tool; I included it here to keep the rows in order.
You can build a batch macro to automate this.
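The batch macro itself lives in Alteryx, but its structure can be sketched in plain Python: an inner function plays the macro's workflow for one column, and an outer loop supplies each column name the way a control parameter would. This is a hypothetical analogue, not the posted macro. It assumes the inner workflow is the Summarize, Record ID, and Join approach from earlier in the thread, so values are numbered in sorted order starting at 1. The function names are hypothetical.

```python
# Hypothetical analogue of a batch macro that label-encodes several columns.

def encode_one_column(rows, column):
    """One macro run: Summarize (Group By) -> Record ID -> Join back on the code."""
    distinct = sorted({row[column] for row in rows})
    lookup = {value: i for i, value in enumerate(distinct, start=1)}
    # Return RecordID -> encoded value so the caller can join on RecordID
    return {row["RecordID"]: lookup[row[column]] for row in rows}


def batch_encode(rows, columns):
    # Record ID step: copy each row and add a key that preserves the original order
    keyed = [dict(row, RecordID=i) for i, row in enumerate(rows, start=1)]
    for column in columns:  # one batch iteration per column name
        encoded = encode_one_column(keyed, column)
        for row in keyed:  # join the results back on RecordID
            row[column] = encoded[row["RecordID"]]
    return keyed


rows = [
    {"Col1": "FFF14", "Col2": "CC14"},
    {"Col1": "FFF13", "Col2": "CC17"},
    {"Col1": "FFF14", "Col2": "CC14"},
]
for row in batch_encode(rows, ["Col1", "Col2"]):
    print(row)
```

Because the inner step sorts, Col1 here gets FFF13 = 1 and FFF14 = 2; swap in first-appearance numbering if you need the order shown in the earlier example.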
Thank you, this works for my problem.
You can use label encoding to transform each unique string into a unique numeric value. Here's a short example in Python using scikit-learn:
from sklearn.preprocessing import LabelEncoder

data = ["FFF13", "FFF14", "FFF13", "FGT25"]

label_encoder = LabelEncoder()
transformed_data = label_encoder.fit_transform(data)

for string, encoded_value in zip(data, transformed_data):
    print(string, encoded_value)
Output:
FFF13 0
FFF14 1
FFF13 0
FGT25 2
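LabelEncoder assigns each unique string an integer from 0 to n-1, where n is the number of unique strings, following the sorted (alphabetical) order of the strings rather than their order of appearance. If you want values starting at 1, as in the original question, add 1 to the result.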
Hello
To transform each unique string in your dataset into a unique value, you can use a Python dictionary to map each string to a unique value. Below is a simple Python snippet that does this:
dataset = ["FFF13", "FFF14", "FFF13", "FGT25"]
unique_values = {}
unique_value_counter = 1
for code in dataset:
    if code not in unique_values:
        unique_values[code] = unique_value_counter
        unique_value_counter += 1
# Now the unique_values dictionary maps each string to its unique value
print(unique_values)
Output:
{'FFF13': 1, 'FFF14': 2, 'FGT25': 3}
The code begins by creating an empty dictionary called unique_values, which stores the mapping between strings and their unique values, and sets a counter variable, unique_value_counter, to 1.
The loop then checks each code in the dataset. If a code is not yet in unique_values, it is added as a key with the current counter value, and the counter is incremented by 1.
By the end of the loop, unique_values maps each unique string to its unique value.
Finally, you can use this dictionary to map each code in the dataset to its value. For instance, unique_values["FFF14"] returns 2.
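To finish the job described above, a list comprehension can apply the mapping to every row. This sketch also swaps the explicit counter for dict.setdefault, which is simply an alternative way to build the same first-appearance mapping.

```python
dataset = ["FFF13", "FFF14", "FFF13", "FGT25"]

unique_values = {}
for code in dataset:
    # For a new code, len(unique_values) + 1 is the next unused value;
    # for a code already in the dict, setdefault leaves it unchanged.
    unique_values.setdefault(code, len(unique_values) + 1)

# Replace every code in the dataset with its mapped value
encoded = [unique_values[code] for code in dataset]
print(encoded)  # [1, 2, 1, 3]
```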
Hope this helps.
To transform each unique string in your dataset into a unique numerical value, you can use a technique called label encoding. In Python, you can do this with scikit-learn: import LabelEncoder, fit it to your data, and transform the strings into numerical values, so that each unique string gets its own label.
