This is a continuation of a few posts involving parsing sentences (from a body of text).
The issue is identifying the whole sentence when abbreviations with periods may be present.
I am exploring a combination of lookup tables or dynamic approaches using Python plus a relevant library (e.g., NLTK or spaCy).
Thanks to @danilang I was able to test NLTK using the Python tool. Unfortunately, stock NLTK had difficulty identifying sentences.
So I am off trying a newly suggested approach that once again is beyond my liberal arts skills. This time I want to try the PySBD (Python Sentence Boundary Disambiguation) project, which can be used alongside spaCy. It appears to be better suited than NLTK to handling edge cases (e.g., U.S., Mass., Co., plc.).
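For comparison, the lookup-table idea mentioned above can be sketched with the standard library alone. This is only a minimal illustration, not PySBD's or NLTK's method: the `ABBREVIATIONS` set and the `split_sentences` helper are hypothetical names made up for this example, and the table would need to be extended for real data.

```python
import re

# Hypothetical lookup table; extend it with the abbreviations in your own data.
ABBREVIATIONS = {"u.s.", "mass.", "co.", "plc.", "inc.", "e.g.", "i.e."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split on ., ! or ? followed by whitespace, unless the token
    ending at that point is a known abbreviation."""
    sentences = []
    start = 0
    # A candidate boundary is terminal punctuation followed by whitespace.
    for match in re.finditer(r"[.!?]+\s+", text):
        candidate = text[start:match.end()].strip()
        last_token = candidate.split()[-1].lower()
        if last_token in abbreviations:
            continue  # the period belongs to an abbreviation; keep scanning
        sentences.append(candidate)
        start = match.end()
    # Whatever remains after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = ("Acme Co. is based in Mass. and sells in the U.S. market. "
          "It was founded in 1990.")
for sentence in split_sentences(sample):
    print(sentence)
```

A pure lookup table has an obvious weakness: when an abbreviation really does end a sentence ("...sells in the U.S. It was founded..."), this sketch will not split there. That kind of ambiguity is the reason to look at a dedicated library.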
Two questions...
@hellyars - What version of AYX Designer do you have? See Help > About. Depending on the version, you can check the file paths listed here - https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/How-To-Use-Alteryx-installPackages-... or follow "Procedure: List the Currently Installed Modules".
%ALTERYX%\bin\Miniconda3\PythonTool_venv\Lib\site-packages until 2019.2
%ALTERYX%\bin\Miniconda3\envs\JupyterTool_vEnv\Lib\site-packages for 2019.3.1 to 2021.1.3
%ALTERYX%\bin\Miniconda3\envs\DesignerBaseTools_vEnv\Lib\site-packages for 2021.1.4+
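If it helps, the version-to-folder mapping above can be written as a small lookup. This is just a sketch: `site_packages_dir` is a hypothetical helper name, and since the list does not say which folder applies to versions between 2019.2 and 2019.3.1, the code assumes the oldest folder for those.

```python
def site_packages_dir(version):
    """Return the site-packages path listed above for a Designer version string."""
    # "2021.1.4" -> (2021, 1, 4); tuples compare element by element.
    parts = tuple(int(p) for p in version.split("."))
    if parts >= (2021, 1, 4):
        env = r"envs\DesignerBaseTools_vEnv"
    elif parts >= (2019, 3, 1):
        env = r"envs\JupyterTool_vEnv"
    else:
        # Assumption: anything older than 2019.3.1 uses the original venv.
        env = "PythonTool_venv"
    return "%ALTERYX%\\bin\\Miniconda3\\" + env + "\\Lib\\site-packages"


print(site_packages_dir("2020.4"))
```

%ALTERYX% is left as a placeholder here, exactly as in the list; substitute your own install folder.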
You can install Python packages from a command prompt (run as admin) - https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Install-Python-packages-via-command...
For 2021.1.4+
Activate DesignerBaseTools_venv:
from ayx import Alteryx
import re
from pandas import DataFrame
import io
from contextlib import redirect_stdout

# Capture what installPackages prints (a pip-freeze style listing).
with io.StringIO() as current_output, redirect_stdout(current_output):
    Alteryx.installPackages(package='', install_type='freeze')
    # Split the captured text into lines, then each "name==version" line into its two parts.
    packages = ((item for item in out_row.split("=") if item)
                for out_row in re.split(string=current_output.getvalue(), pattern=r"\r*\n") if out_row)
    output_df = DataFrame(packages, columns=["package", "version"])

# Send the package list to output anchor 1 of the Python tool.
Alteryx.write(output_df, 1)
@JessieC Thanks. I will have to try this out -- but it might take a bit, this isn't exactly my cup of tea.