This is a continuation of a few posts involving parsing sentences (from a body of text).
The issue is identifying the whole sentence when abbreviations with periods may be present.
I am exploring a combination of lookup tables and dynamic approaches using Python plus a relevant library (e.g., NLTK or spaCy).
Thanks to @danilang, I was able to test NLTK using the Python tool. Unfortunately, stock NLTK had difficulty identifying sentences.
So, I am off trying a newly suggested approach that once again is beyond my liberal arts skills. This time I want to try the Python Sentence Boundary Disambiguation (pySBD) project with spaCy. It appears to be better suited than NLTK to handle edge cases (e.g., U.S., Mass., Co., plc.).
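For reference, here is a minimal sketch of the lookup-table idea, using only the Python standard library. It is not how NLTK or pySBD work internally; the ABBREVIATIONS set and the split_sentences name are hypothetical and only there to illustrate the approach.

```python
import re

# Hypothetical lookup table of abbreviations that end in a period (lowercase).
ABBREVIATIONS = {"u.s.", "mass.", "co.", "plc.", "inc.", "e.g.", "i.e."}


def split_sentences(text):
    """Split on ., ! or ? followed by whitespace, unless the word
    just before the break is a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        # Last whitespace-delimited word before the candidate boundary.
        last_word = text[start:match.end()].split()[-1].lower()
        if last_word in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, keep going
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    # Whatever is left after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = "Acme Co. is based in Mass. and sells in the U.S. market. It went public in 2019."
for sentence in split_sentences(sample):
    print(sentence)
```

The obvious weakness is a sentence that actually ends with an abbreviation (e.g., "...sells in the U.S. It went public..."), which this sketch would not split. That is the kind of edge case I am hoping a dedicated library handles better.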
Two questions...
- I ran the following code in the Python tool to install spaCy: Alteryx.installPackages("spacy"). It returned a Call Process Error exactly the same as the one in this post. I ran the tool again today and it states "Requirement already satisfied." So, how can I confirm spaCy is actually installed correctly?
- How do you install the pySBD project from the Alteryx Python tool? It does not appear to be included in the vanilla spaCy installation. I tried import pysbd, import pysbd.utils, from pysbd.utils import PySBDFactory, import pySBDFactory, etc. They all generate a ModuleNotFoundError.
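To make the first question concrete, this is a rough sketch of a check that could be run in the Python tool. It assumes the tool runs Python 3.8 or later (needed for importlib.metadata); package_status is a hypothetical helper name, not part of Alteryx, spaCy, or pySBD.

```python
import importlib.util
from importlib import metadata


def package_status(import_name, dist_name=None):
    """Return (importable, version) for a package.

    importable: True if Python can locate the module to import.
    version:    installed distribution version, or None if pip has no record of it.
    """
    dist_name = dist_name or import_name
    importable = importlib.util.find_spec(import_name) is not None
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return importable, version


# Report on the two packages from this post.
for name in ("spacy", "pysbd"):
    print(name, package_status(name))

# Assumption: pysbd is distributed as its own package, separate from spaCy,
# so it would need its own install call, along the lines of:
#   Alteryx.installPackages("pysbd")
```

If spacy comes back as (True, some version), the "Requirement already satisfied" message is probably accurate. If pysbd comes back as (False, None), that would be consistent with the ModuleNotFoundError and would suggest it simply has not been installed yet, rather than that the import statement is wrong.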