Feature request: Custom NER model with spacy #1453

kapilkathuria · 2018-10-09T05:10:23Z

Rasa NLU version: 0.13.4

As of now, the only option for creating a custom NER model is CRF. I have multiple use cases in which I need to extract data from documents/PDFs. I haven't seen good results so far using CRF, so I wanted to try a custom spaCy model.

As of now, I am creating the spaCy model via the Python script provided by spaCy, but I thought it would be good to have an out-of-the-box option in Rasa NLU to create a custom spaCy model.

akelad · 2018-10-09T07:53:15Z

I wouldn't really recommend using Rasa NLU to extract entities from documents/PDFs; it's for extracting entities from shorter sentences, mainly for chatbots.
I think using your own Python script is the correct approach in this case.

kapilkathuria · 2018-10-09T12:32:52Z

@akelad thanks for your note. If you are aware of any starting point (a Python script or any other NLU library) for this, please share.

akelad · 2018-10-10T07:52:15Z

I'm not really sure; the spaCy documentation is probably a good starting point. I'll close this issue for now.

prashant334 · 2019-05-14T13:15:00Z

"I wouldn't really recommend using Rasa NLU to extract entities from documents/PDFs, it's for extracting entities from shorter sentences, mainly for chatbots."

@akelad Even for short sentences, the CRF-based "ner_crf" is not giving accurate results. I have trained a model to extract source and destination from short sentences like
"show me the trains from Delhi to pune", where Delhi = SOURCE and pune = DESTINATION.

Please find the training data in the attachment below.

data.txt

Below is my config file.

language: "en"

pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
#- name: "SpacyFeaturizer"
#- name: "RegexFeaturizer"
#- name: "CRFEntityExtractor"
#- name: "EntitySynonymMapper"
#- name: "SklearnIntentClassifier"
- name: "ner_crf"
- name: "ner_spacy"
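
For anyone who wants to try the same source/destination data in a standalone spaCy script, as suggested earlier in the thread, here is a minimal sketch of converting an inline-annotated sentence into the character-offset form (start, end, label) that spaCy NER training examples use. The attached data.txt is not shown here, so the `[text](label)` input format and the lowercase labels `source` and `destination` are assumptions, and `to_offsets` is a hypothetical helper name, not part of Rasa or spaCy.

```python
import re

# Assumed inline annotation format: [entity text](label)
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def to_offsets(annotated):
    """Return (plain_text, [(start, end, label), ...]) using character offsets."""
    plain_parts = []
    entities = []
    cursor = 0  # current position in the annotated string
    length = 0  # length of the plain text built so far
    for match in ANNOTATION.finditer(annotated):
        # Copy the unannotated text that precedes this entity.
        before = annotated[cursor:match.start()]
        plain_parts.append(before)
        length += len(before)
        # Record the entity span relative to the plain text.
        value, label = match.group(1), match.group(2)
        entities.append((length, length + len(value), label))
        plain_parts.append(value)
        length += len(value)
        cursor = match.end()
    # Copy whatever follows the last entity.
    plain_parts.append(annotated[cursor:])
    return "".join(plain_parts), entities


text, entities = to_offsets(
    "show me the trains from [Delhi](source) to [pune](destination)"
)
# One training example in (text, {"entities": [...]}) form.
example = (text, {"entities": entities})
```

Checking that `text[start:end]` gives back the expected entity string for every span is a cheap way to catch misaligned annotations before training.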


akelad closed this as completed Oct 10, 2018
