
Surgical classification using natural language processing of informed consent forms in spine surgery

Michael D. Shost, BS (Case Western Reserve University, School of Medicine; Center for Spine Health, Neurologic Institute, Cleveland Clinic Foundation),
Seth M. Meade, BSE (Case Western Reserve University, School of Medicine; Center for Spine Health, Neurologic Institute, Cleveland Clinic Foundation; Cleveland Clinic Lerner College of Medicine, Cleveland, Ohio),
Michael P. Steinmetz, MD (Center for Spine Health, Neurologic Institute, Cleveland Clinic Foundation; Cleveland Clinic Lerner College of Medicine, Cleveland, Ohio),
Thomas E. Mroz, MD (Center for Spine Health, Neurologic Institute, Cleveland Clinic Foundation; Cleveland Clinic Lerner College of Medicine, Cleveland, Ohio), and
Ghaith Habboub, MD (Center for Spine Health, Neurologic Institute, Cleveland Clinic Foundation; Cleveland Clinic Lerner College of Medicine, Cleveland, Ohio)
Free access

OBJECTIVE

In clinical spine surgery research, manually reviewing surgical forms to categorize patients by their surgical characteristics is a crucial yet time-consuming task. Natural language processing (NLP) is a machine learning tool used to adaptively parse and categorize important features from text. These systems function by training on a large, labeled data set in which feature importance is learned prior to encountering a previously unseen data set. The authors aimed to design an NLP classifier for surgical information that can review consent forms and automatically classify patients by the surgical procedure performed.

METHODS

Thirteen thousand two hundred sixty-eight patients who underwent 15,227 surgeries from January 1, 2012, to December 31, 2022, at a single institution were initially considered for inclusion. From these surgeries, 12,239 consent forms were classified based on the Current Procedural Terminology (CPT) code, categorizing them into 7 of the most frequently performed spine surgeries at this institution. This labeled data set was split 80%/20% into train and test subsets, respectively. The NLP classifier was then trained and the results demonstrated its performance on the test data set using CPT codes to determine accuracy.

RESULTS

This NLP surgical classifier had an overall weighted accuracy rate of 91% for sorting consents into correct surgical categories. Anterior cervical discectomy and fusion had the highest positive predictive value (PPV; 96.8%), whereas lumbar microdiscectomy had the lowest PPV in the testing data (85.0%). Sensitivity was highest for lumbar laminectomy and fusion (96.7%) and lowest for the least common operation, cervical posterior foraminotomy (58.3%). Negative predictive value and specificity were > 95% for all surgical categories.

CONCLUSIONS

Leveraging NLP for text classification substantially improves the efficiency of classifying surgical procedures for research purposes. The ability to quickly classify surgical data can be significantly beneficial to institutions without a large database or substantial data review capabilities, as well as for trainees to track surgical experience, or practicing surgeons to evaluate and analyze their surgical volume. Additionally, the capability to quickly and accurately recognize the type of surgery will facilitate the extraction of new insights from the correlations between surgical interventions and patient outcomes. As the database of surgical information grows from this institution and others in spine surgery, the accuracy, usability, and applications of this model will continue to increase.

ABBREVIATIONS

AUC = area under the curve; CPT = Current Procedural Terminology; EMR = electronic medical record; NLP = natural language processing; PPV = positive predictive value; ROC = receiver operating characteristic.

Natural language processing (NLP) is a powerful machine learning tool that allows automated parsing and categorization of text with numerous potential applications. NLP classifier systems are trained on an initial text data set containing words or phrases with corresponding category labels, and the trained model is then validated on novel text in a testing data set in which the labeled category is not known to the model. During the training period, the model is shown the text and the correct category and determines the relative importance of each feature in the target text for each category. When presented with unseen text data, the model conducts a comparison of feature importance for each potential class and then categorizes the input text after considering the sum of all positive and negative predictors.

Multiple studies have demonstrated the utility of NLP in medicine.1–3 One study applied NLP to medication reviews by searching for positive or negative sentiments in patient testimonies and subsequently created a rating system based on patient experiences.4 Another study used an NLP system to evaluate the free-form text from surgical trainee evaluation forms and categorize them into levels of competency.5 A study from Google attempted to predict in-hospital mortality and readmission from patient records.6 Furthermore, prior studies in spine surgery have leveraged this technology for automated detection of complications (such as durotomy and venous thromboembolism) and preoperative prediction of intraoperative vascular injury during anterior spine procedures, among other applications.7–9 Given the significant flexibility of NLP systems, the vastness of healthcare data, and the ability of trained NLP systems to quickly parse text data, there are many opportunities for further implementation in healthcare.

In settings such as clinical research or surgical department analysis, increasing the efficiency of surgical data review would provide numerous benefits. A common workflow for spine surgery outcomes research may be to identify a patient population that underwent a given surgery or surgeries of interest and compare how patient or surgical factors impacted their outcome. Determining which patients should be included in the study, however, often involves manually parsing through consent forms or operative reports for pertinent details. For large retrospective studies, this may require multiple data reviewers and days of manual data review. Leveraging the power of a trained NLP model could significantly increase the efficiency of this process with minimal risk of misclassification. The time from research idea formulation to data collection would be reduced, and the barrier to conduct retrospective studies would also be reduced as a team of data reviewers would not be necessary. Furthermore, as we consider the future of predictive modeling, the ability to collect a large repository of categorized patient information, surgical information, and surgical outcomes will enable valuable insights into the factors that drive postoperative outcomes. Ultimately, this could benefit both surgeons and patients and improve preoperative surgical planning. In this study, we present our focused effort to categorize surgical cases from consent forms using an NLP classifier.

Methods

Study Design

We conducted a retrospective study of 13,268 patients at least 18 years of age who underwent 15,227 spine surgeries within the Cleveland Clinic Foundation system between January 1, 2012, and December 31, 2022. Patients were excluded if there was no text available in the consent form or no associated Current Procedural Terminology (CPT) codes for NLP dictionary creation. This study was approved as exempt human subject research and was determined to be of minimal risk by the Cleveland Clinic IRB. The IRB approved an exemption of informed consent because this study is of minimal risk and consists only of previously collected clinical data.

Consent Forms

Consent form text was used as the data source for our NLP model to interpret and choose a surgical classification. The consent forms were presented to the model in both the training and testing data sets without any preprocessing, meaning there was a wide range of variations in consent form text for each surgery. For example, some consent forms used acronyms for procedures (TLIF), whereas others wrote the full surgery title (transforaminal lumbar interbody fusion). The spinal region, likewise, was abbreviated in some consent forms and explicitly stated in others. Additionally, some consents included blood transfusions or nonspinal procedures, such as upper endoscopy, to be performed during the same operative hospital encounter. This information was not removed from the consents, leading to a more robust NLP model capable of distinguishing important predictors of spine surgery categorization. The rationale for this study design was twofold: it allows for more efficient categorization by eliminating the need for preprocessing of consent forms, and it broadens the applicability of the model to many healthcare systems regardless of their consent form’s structure.

NLP System Design

The NLP system was created using the TensorFlow open-source package for Python. Seven primary surgical operations were chosen for classification: anterior cervical discectomy and fusion, cervical laminectomy and fusion, cervical laminoplasty, cervical posterior foraminotomy, lumbar laminectomy, lumbar laminectomy and fusion, and lumbar microdiscectomy.

Once the categories were decided, an initial surgical classification is required to both train an NLP model and assess the correctness of its categorization. This classification was accomplished by using the most common CPT codes for these procedures at our institution. CPT codes were chosen for this application as they were found to be accurate for these procedures and allow the rapid accumulation of a large training and testing data set without manual review or data preprocessing. Figure 1 demonstrates the conditional logic used to classify surgery type based on CPT codes. The aim of this logic was to identify CPT codes and combinations that would reliably reflect their corresponding surgery type. Uncommon, complex, or combination surgeries would not meet any of the criteria outlined in the conditional logic and would instead be assigned to a categorization of "other."

FIG. 1. This conditional logic CPT classifier was used to create the initial dictionary of surgical categories. The input is the list of all CPT codes and the output is the surgery type.
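The conditional logic described for Fig. 1 can be pictured as an ordered rule table over the set of CPT codes billed for a case. The following is only a minimal sketch under stated assumptions: the rule table, the function name `classify_by_cpt`, and the specific codes are illustrative placeholders and do not reproduce the actual criteria in Fig. 1, which are not given in the text.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical rule table: (codes that must all be present,
#                           codes that must all be absent,
#                           surgical category).
# The codes are illustrative placeholders, NOT the logic of Fig. 1.
Rule = Tuple[FrozenSet[str], FrozenSet[str], str]

RULES: List[Rule] = [
    (frozenset({"22551"}), frozenset(), "Anterior cervical discectomy and fusion"),
    (frozenset({"63047", "22612"}), frozenset(), "Lumbar laminectomy and fusion"),
    (frozenset({"63047"}), frozenset({"22612"}), "Lumbar laminectomy"),
    (frozenset({"63030"}), frozenset({"22612", "63047"}), "Lumbar microdiscectomy"),
]


def classify_by_cpt(codes: Iterable[str], rules: List[Rule] = RULES) -> str:
    """Return the first category whose rule matches, else "other"."""
    present = set(codes)
    for required, excluded, label in rules:
        # A rule matches when every required code is present
        # and none of the excluded codes is present.
        if required <= present and not (excluded & present):
            return label
    # Uncommon, complex, or combination surgeries fall through.
    return "other"


print(classify_by_cpt(["63047", "22612"]))
print(classify_by_cpt(["99999"]))
```

Keeping the rules in a data table rather than nested `if` statements makes it easier to audit which code combinations map to which category, and anything that matches no rule falls into "other," as the text describes.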

Following the initial surgery class labeling by CPT code, the data set of consent form text and surgery type was split 80%/20% into training and testing subsets, respectively. Ten percent of the resulting training data was held for validation. After training and validation, the model was presented with previously unseen consent form text from the testing data set. The CPT code was hidden from the model during testing; after the NLP model’s determination of surgical category, the accuracy of this prediction was determined based on the surgery indicated by the CPT code.
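The split described above can be sketched in plain Python. This is an assumption-laden illustration, not the authors' code: the function name `split_dataset`, the random seed, the rounding-up of subset sizes, and the absence of stratification are all choices made here for the example, since the text does not specify them.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Sequence[T],
    test_frac: float = 0.20,
    val_frac: float = 0.10,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, hold out test_frac for testing, then hold out
    val_frac of the remaining training data for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n_test = math.ceil(test_frac * len(shuffled))  # rounding up is an assumption
    test_set, train_full = shuffled[:n_test], shuffled[n_test:]

    n_val = math.ceil(val_frac * len(train_full))
    val_set, train_set = train_full[:n_val], train_full[n_val:]
    return train_set, val_set, test_set


# 12,239 labeled consent forms, represented here by their indices.
train_set, val_set, test_set = split_dataset(range(12239))
print(len(train_set), len(val_set), len(test_set))
```

With 12,239 forms and rounding up, the test subset holds 2448 forms, which equals the sum of the per-category totals reported in Table 1.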

To make a prediction, the model parsed the consent form into subcomponents or features. For each surgical category, the features were evaluated for their relative importance in predicting that surgery as established from the learning phase. Figure 2 demonstrates a Shapley text plot of this process. An example consent form, "L4/5 transforaminal lumbar interbody fusion with posterolateral fixation and fusion," was being evaluated against each of the 7 possible surgical categories. Each surgical category weighs feature importance differently according to what was learned from the training data set. This is demonstrated in Fig. 2, in which words in red represent a strong positive predictor for the selected category, blue words represent a strong negative predictor, and white or lighter shades represent weak feature importance that contributes little to the probability of categorization. The probability of correct categorization was compared for each surgery type and the category with the highest overall probability was chosen. In the example in Fig. 2, the model concluded that there was a high probability that this text corresponded to the "Lumbar laminectomy and fusion" group and categorized it accordingly.

FIG. 2. Consent form text is parsed into subcomponents with either positive (red) or negative (blue) predictive value toward a surgical categorization with the relative importance of each feature represented by the length of the segment.
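The idea of summing positive and negative feature contributions per category and then choosing the most probable category can be illustrated with a deliberately simplified additive model. The contribution weights below are invented for illustration, only three of the seven categories are shown, and the real network is nonlinear, so this is a sketch of the intuition behind the Shapley plot rather than the model itself.

```python
import math
from typing import Dict, Tuple

# Hypothetical per-category word contributions (positive = red,
# negative = blue in a Shapley-style plot). Values are made up.
CONTRIBUTIONS: Dict[str, Dict[str, float]] = {
    "Lumbar laminectomy and fusion": {"lumbar": 1.2, "fusion": 2.0, "interbody": 0.8},
    "Lumbar laminectomy": {"lumbar": 1.0, "fusion": -1.5},
    "Anterior cervical discectomy and fusion": {"lumbar": -2.0, "fusion": 0.9},
}


def category_scores(text: str) -> Dict[str, float]:
    """Sum the contributions of every token for each category."""
    tokens = text.lower().split()
    return {
        category: sum(weights.get(tok, 0.0) for tok in tokens)
        for category, weights in CONTRIBUTIONS.items()
    }


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - peak) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def predict_category(text: str) -> Tuple[str, Dict[str, float]]:
    """Return the highest-probability category and all probabilities."""
    probs = softmax(category_scores(text))
    return max(probs, key=probs.get), probs


consent = ("L4/5 transforaminal lumbar interbody fusion "
           "with posterolateral fixation and fusion")
label, probs = predict_category(consent)
print(label)
```

Words with no learned weight contribute nothing, which mirrors the white or lightly shaded words in the figure.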

Hyperparameter testing was performed to determine the optimal model configuration. The final model was constructed using the following layers: a pretrained embedding layer (nnlm-en-dim50 trained on English Google news) and one hidden layer with dropout layers for regularization.

Data Analysis

The categorization performance for each surgery type was reported with tables and figures to summarize the overall performance and a per-category breakdown. The packages used to evaluate the model included Keras with TensorFlow, scikit-learn, and Shapley packages within Python (Python Software Foundation).10–14 Additional statistical analysis was conducted using Microsoft Excel and RStudio.

Results

CPT Classification

After completion of the CPT classification stage, we were able to sort 12,239 surgeries into the 7 most common spine surgeries using the conditional logic outlined in Fig. 1. The results of this classification were as follows: 4271 lumbar laminectomy and fusion (34.9%), 2499 lumbar laminectomy (20.4%), 1941 anterior cervical discectomy and fusion (15.9%), 1643 lumbar microdiscectomy (13.4%), 1262 cervical laminectomy and fusion (10.3%), 445 cervical laminoplasty (3.6%), and 178 cervical posterior foraminotomy (1.5%). Notably, 2988 surgeries (20%) were not able to be sorted by the CPT classifier because they had complex procedure code combinations, were less common procedures, or had incomplete billing information. As a result, these consent forms could not be used for training and were not factored into the evaluation of the model’s accuracy as there was no surgical classification available based on CPT code.

NLP Classifier Performance

The overall weighted accuracy of the NLP consent form model was 91%. The surgical category with the highest positive predictive value (PPV) was anterior cervical discectomy and fusion with 97%, whereas lumbar microdiscectomy had the lowest PPV at 85%. Sensitivity was highest for cervical laminoplasty (97%) and lumbar laminectomy and fusion (97%). A summary of the model performance for each surgical category is included in Table 1. The confusion matrix (Fig. 3) further illustrates the performance for each surgical category and provides details on false labeling. Two of the most difficult situations for the model to interpret were 1) between lumbar microdiscectomy and lumbar laminectomy, and 2) between cervical posterior foraminotomy, cervical laminectomy and fusion, and lumbar laminectomy. The micro-averaged one-versus-rest receiver operating characteristic (ROC) curve shows the performance of each surgical category in the model compared with an untrained, chance-based model (Fig. 4). By this metric, all categories achieved an area under the curve (AUC) of 0.98 or greater. The micro- and macro-averaged ROC curves consider all surgical categories and have AUCs of 1.00 and 0.99, respectively. These values were all greater than the baseline AUC of 0.5 for the chance model.
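One plausible reading of "overall weighted accuracy" is the per-category sensitivity averaged with weights equal to each category's test-set size. That interpretation is an assumption made here, not something the text states. Under it, the totals and sensitivities reported in Table 1 give a value consistent with the reported 91%:

```python
from typing import Dict, Tuple

# (test-set total, sensitivity) per category, copied from Table 1.
TABLE_1: Dict[str, Tuple[int, float]] = {
    "Anterior cervical discectomy and fusion": (388, 0.948),
    "Cervical laminectomy and fusion": (252, 0.905),
    "Cervical laminoplasty": (89, 0.966),
    "Cervical posterior foraminotomy": (36, 0.583),
    "Lumbar laminectomy": (500, 0.886),
    "Lumbar laminectomy and fusion": (854, 0.967),
    "Lumbar microdiscectomy": (329, 0.790),
}


def support_weighted_sensitivity(rows: Dict[str, Tuple[int, float]]) -> float:
    """Average sensitivity, weighting each category by its case count."""
    total_cases = sum(n for n, _ in rows.values())
    weighted_sum = sum(n * sens for n, sens in rows.values())
    return weighted_sum / total_cases


weighted = support_weighted_sensitivity(TABLE_1)
print(f"{weighted:.1%}")
```

Because the weights are the category sizes, common operations such as lumbar laminectomy and fusion dominate the average, while the low sensitivity of the rare cervical posterior foraminotomy has little effect on the overall figure.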

TABLE 1.

Overall NLP model performance by surgical category

Surgical Category | Total (n) | PPV | NPV | Sensitivity | Specificity
Anterior cervical discectomy and fusion | 388 | 96.8% | 99.0% | 94.8% | 99.4%
Cervical laminectomy and fusion | 252 | 86.7% | 98.9% | 90.5% | 98.4%
Cervical laminoplasty | 89 | 93.5% | 99.9% | 96.6% | 99.7%
Cervical posterior foraminotomy | 36 | 91.3% | 99.4% | 58.3% | 99.9%
Lumbar laminectomy | 500 | 83.9% | 97.0% | 88.6% | 95.6%
Lumbar laminectomy and fusion | 854 | 96.5% | 98.2% | 96.7% | 98.1%
Lumbar microdiscectomy | 329 | 85.0% | 96.8% | 79.0% | 97.8%

NPV = negative predictive value.

The average weighted accuracy of the NLP model was 91%.
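The four per-category measures in Table 1 follow from one-versus-rest counts of true and false positives and negatives. The sketch below uses counts for cervical posterior foraminotomy that were back-calculated here from the Table 1 percentages and the 2448-form test set. They are an inference for illustration and are not reported in the source.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OneVsRestCounts:
    tp: int  # category predicted, category true
    fp: int  # category predicted, other category true
    fn: int  # other category predicted, category true
    tn: int  # other category predicted, other category true


def summarize(c: OneVsRestCounts) -> Dict[str, float]:
    """Compute PPV, NPV, sensitivity, and specificity."""
    return {
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


# Assumed counts: 36 true foraminotomies, 21 found, 2 false alarms,
# and 2448 - 36 - 2 = 2410 correctly rejected forms.
foraminotomy = OneVsRestCounts(tp=21, fp=2, fn=15, tn=2410)
metrics = summarize(foraminotomy)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

These counts reproduce the 91.3% PPV, 99.4% NPV, 58.3% sensitivity, and 99.9% specificity in the table, and they show why a rare category can pair low sensitivity with very high specificity: the negative class is so large that a few false positives barely register.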

FIG. 3. Confusion matrix for our NLP surgical classification model showing classification results by surgery with the true label as determined by CPT code on the y-axis and the predicted label output from the NLP model on the x-axis.

FIG. 4. The one-versus-rest ROC curve demonstrates the performance of our NLP model for each surgery type compared with a model based on chance. Micro- and macro-averaged ROC curves for all surgical categories are included. Performance is estimated from the AUC. Ant. = anterior; Cerv. = cervical; Disc. = discectomy; Lami., Lamin. = laminectomy; Lum. = lumbar; Post. = posterior.
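For a single one-versus-rest comparison, the AUC equals the probability that a randomly chosen form from the category receives a higher predicted score than a randomly chosen form from any other category, with ties counted as one half. A minimal sketch of that pairwise definition follows. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, and the authors' analysis used library routines rather than this code.

```python
from typing import Sequence


def roc_auc(is_category: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half. Labels are 1 (in category)
    or 0 (any other category)."""
    positives = [s for y, s in zip(is_category, scores) if y == 1]
    negatives = [s for y, s in zip(is_category, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: perfectly separated scores give an AUC of 1.0,
# while identical scores for every form give the chance value 0.5.
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
print(roc_auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))
```

The pairwise loop is quadratic in the number of forms, which is fine for illustration but slower than the sorting-based implementations used in practice.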

Hyperparameter optimization was achieved with hand tuning of the model. A summary of the parameter testing results is included in Table 2. The model’s accuracy and loss were plotted for both training and validation data in Fig. 5. The optimal number of epochs (an epoch is when the data set is passed through the given algorithm one time) and training length were determined using the "early stopping" callback to monitor validation loss. After analyzing the convergence of validation and training loss and balancing the minimization of loss and maximization of accuracy, model 3 was selected as the final model for this project.
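Early stopping on validation loss can be sketched as follows: training halts once the validation loss has failed to improve for a set number of consecutive epochs, and the epoch with the lowest validation loss is kept as the best epoch. The function name, the `patience` value, and the loss sequence below are assumptions for illustration. The text does not report the patience used.

```python
from typing import Sequence, Tuple


def best_epoch_with_early_stopping(
    val_losses: Sequence[float],
    patience: int,
    min_delta: float = 0.0,
) -> Tuple[int, int]:
    """Return (best_epoch, stopped_epoch), both 1-indexed.

    Stops after `patience` consecutive epochs without an improvement
    of more than `min_delta` in validation loss."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Ran out of epochs before the patience limit was reached.
    return best_epoch, len(val_losses)


# Hypothetical validation losses for seven epochs.
losses = [0.90, 0.60, 0.50, 0.52, 0.51, 0.55, 0.40]
print(best_epoch_with_early_stopping(losses, patience=3))
```

A small patience can stop before a later improvement (here the drop to 0.40 at epoch 7 is never seen with a patience of 3), which is one reason the patience setting matters when comparing best epochs across model configurations.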

TABLE 2.

Hyperparameter tuning results for the NLP model

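Layer 1 to Layer 4 give the number of neurons in each fully connected layer; a dash indicates that the layer was not used.

Model No. | Layer 1 | Layer 2 | Layer 3 | Layer 4 | Best Epoch | Training Loss | Training Accuracy | Validation Loss | Validation Accuracy
1 | 64 | - | - | - | 1033 | 0.206 | 0.934 | 0.265 | 0.923
2 | 128 | - | - | - | 784 | 0.206 | 0.934 | 0.225 | 0.920
3 | 256 | - | - | - | 800 | 0.192 | 0.938 | 0.223 | 0.923
4 | 256 | 128 | - | - | 649 | 0.241 | 0.923 | 0.231 | 0.919
5 | 512 | 512 | - | - | 575 | 0.214 | 0.930 | 0.224 | 0.921
6 | 512 | 512 | 256 | - | 578 | 0.243 | 0.921 | 0.222 | 0.922
7 | 512 | 512 | 512 | - | 576 | 0.230 | 0.924 | 0.225 | 0.917
8 | 256 | 256 | 256 | 256 | 713 | 0.273 | 0.914 | 0.238 | 0.915
9 | 512 | 512 | 256 | 256 | 684 | 0.241 | 0.920 | 0.230 | 0.915
10 | 512 | 512 | 512 | 512 | 601 | 0.234 | 0.921 | 0.221 | 0.918
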
FIG. 5. The accuracy (left) and loss (right) from model training and validation are demonstrated.

Discussion

NLP has increased in popularity in recent years as its power and flexibility continue to be demonstrated in cases of varied use.3,15,16 Manual chart review is an especially time-intensive task in the research process for most clinical studies, and having multiple data reviewers with varying levels of training often leads to heterogeneous data collection. A standardized machine learning approach would streamline this process and assist institutions without large databases or significant teams of data reviewers to quickly produce reliable research information from clinical data. This study describes a strategy to simplify data acquisition through NLP modeling. By training an NLP system with the data available in the electronic medical record (EMR), we were able to demonstrate 91% overall accuracy in identifying an operation without any user input or specially created dictionary.

The CPT classifier demonstrated in Fig. 1 was used to build the surgical dictionary that trained our NLP model and demonstrates an easy and approachable way to quickly categorize data into common surgical categories. This approach relies on CPT codes to make surgical assignments and is a great tool for surgical trainees and institutions to track surgical volume of most common spine surgeries. CPT codes are sufficiently accurate for common operations that do not include modifiers or multiple surgical techniques. For this reason, it is a trustworthy resource for training an NLP model on the language for each surgical category. However, approximately 20% of the surgeries at our institution have complex or incomplete CPT coding that was not able to be interpreted by a CPT classifier alone. Our NLP model is one possible solution for this problem because it has already been trained on categorized data and can be deployed to interpret this "other" category without any additional input.

The ability of the NLP model to correctly interpret and categorize surgical information will continue to increase as more data and categorizations are added to the training dictionary. For example, while all lumbar fusion surgeries are currently categorized together under lumbar laminectomy and fusion, additional data would allow us to create separate categorizations for operations such as anterior lumbar interbody fusions, lateral lumbar interbody fusions, and transforaminal lumbar interbody fusions. With this level of detail, surgeons would be able to stratify their previous patients by surgical approach to compare how similar patients recovered after surgery, and this information may inform how they counsel a patient on their likely postoperative outcome or surgical options during a preoperative clinic visit. As additional data sources such as operative reports are added, it will be possible to expand beyond surgical classifications to surgical information such as the number of operative levels and what operation was performed at each level in complex multilevel multitechnique procedures. This level of detail will never be possible with the CPT system as presently constructed and demonstrates an area of growth and opportunity for NLP and machine learning technology.

Strengths of the Study

This work represents a dramatic improvement in efficiency for a routine and simple task performed in retrospective spine surgical research, the assessment of fellows’ caseloads, and other important applications. The performance of the most common surgical procedure in our database, lumbar laminectomy and fusion, was the most accurate result, which suggests that the accuracy of an NLP model will continue to improve as the dictionary of surgical data continues to expand. Additionally, much of the error in this model came from slight differences in wording, such as identifying a laminectomy and fusion, not from choosing the wrong spinal level or classifying a microdiscectomy as a laminectomy. These errors illustrate clear areas to target for further refinement that will result in even greater accuracy rates. Notably, NLP models are robust for different text formats, meaning that this model would behave similarly for consent forms across institutions. Any text that can be extracted from an EMR and presented to the NLP algorithm could be interpreted as is or used to continue to train and enhance the accuracy of the model.

Limitations of the Study

Misclassification affected approximately 9% of the data. Part of this misclassification was due to the occasional inaccuracy of CPT codes. While this project endeavored to only train on the most clearly defined CPT codes, there were still instances of inappropriate or inaccurate coding that introduced error to the model. For applications in which higher accuracy is needed, misclassification could be improved by manually reviewing all text and assigning surgical categories in the training and testing data set. However, for most research applications, the accuracy achieved by this model will result in the overwhelming majority of surgeries being appropriately categorized, with manual review reserved for the small group of misclassified or more complex cases to ensure high accuracy.

While the model demonstrated high accuracy overall, there were patterns in the categorizations that may be opportunities for future improvement. Specifically, the model demonstrated difficulty in discerning between lumbar microdiscectomies and lumbar laminectomies, as well as between cervical posterior foraminotomy and cervical laminectomy. The consent forms for these groups were found to contain many overlapping terms related to the decompression and offer a clear example in which additional information on approach and the extent of an operation from the operative report could reduce misclassification.

Another inherent limitation in this study design is that consent forms may not fully correspond with the surgery performed in the event that the surgical plan changes intraoperatively. Incorporation of more definitive data, such as the operative report, would reduce this source of potential misclassification and allow for the extraction of additional surgical information, but this project demonstrates that consent form data alone are adequate for accurate and efficient surgery identification. Not all systems may initially benefit from our method, such as those with written consent forms or without EMRs. However, with the continued expansion of EMRs and software tools such as optical character recognition that process handwritten documents into digital formats, we believe that the utility of machine learning and NLP models will be available to many hospital systems.

Future Directions

The advancement of NLP systems relies on increasing the amount of surgical information per patient as well as increasing the amount of patient data in the training data set. This can be accomplished by contributing additional data sources, most notably the operative report, as well as collaborating with other health systems to build a large, diverse repository of deidentified patient data. With the near-universal use of electronic health records, healthcare systems across the country and the world possess vast amounts of patient data. While this amount of data quickly becomes overwhelming for any human reviewer to examine and parse through, machine learning offers the opportunity to extract novel insights that can improve surgical care and advance the scientific process.

The application of this system in research would allow investigators to rapidly proceed from idea formulation to data collection on patients of interest. In the clinical setting, this model would allow surgeons to quickly overview their operative practice and review patient outcomes by surgical type. Training with additional data sources, such as operative reports, would also allow for the collection of complication data, operative approach, and surgical level, to name a few possibilities. In the era of probabilistic models, this amount of granular data is valuable for the creation of predictive systems that can analyze prior surgeries along with preoperative factors and postoperative results to produce insights that can benefit surgical planning and patient expectations. By considering all of a given patient’s preoperative factors and the outcomes of similar patients who underwent different procedures, an advanced predictive model would be able to provide recommendations based on the data gathered from an NLP model.

Conclusions

Processing surgical information from healthcare data sources is an ideal task for a machine learning application, and NLP will dramatically increase the efficiency of research and surgical data review. This project demonstrates the rapid ability of an NLP algorithm to parse, learn, and output surgical information from consent forms. Continued implementation of this work will enable faster and more accurate evaluation of surgical spine procedures.

Disclosures

Dr. Steinmetz reported receiving royalties from Zimmer/Biomet, Elsevier, and Globus, and an honorarium from Cerapedics outside the submitted work. Dr. Mroz reported receiving royalties from Stryker outside the submitted work.

Author Contributions

Conception and design: Habboub, Shost. Acquisition of data: Habboub, Shost. Analysis and interpretation of data: all authors. Drafting the article: Shost, Meade, Mroz. Critically revising the article: Habboub, Meade, Mroz. Reviewed submitted version of manuscript: all authors. Statistical analysis: Habboub, Shost. Administrative/technical/material support: Meade.

References

1. Eyre H, Chapman AB, Peterson KS, et al. Launching into clinical space with medspaCy: a new clinical text processing toolkit in Python. AMIA Annu Symp Proc. 2022;2021:438-447.

2. López-Úbeda P, Martín-Noguerol T, Juluru K, Luna A. Natural language processing in radiology: update on clinical applications. J Am Coll Radiol. 2022;19(11):1271-1285.

3. Filannino M, Uzuner Ö. Advancing the state of the art in clinical natural language processing through shared tasks. Yearb Med Inform. 2018;27(1):184-192.

4. Harrison CJ, Sidey-Gibbons CJ. Machine learning in medicine: a practical introduction to natural language processing. BMC Med Res Methodol. 2021;21(1):158.

5. Ötleş E, Kendrick DE, Solano QP, et al. Using natural language processing to automatically assess feedback quality: findings from 3 surgical residencies. Acad Med. 2021;96(10):1457-1460.

6. Rajkomar A, Oren E, Chen K, et al. Scalable and accurate deep learning with electronic health records. NPJ Digit Med. 2018;1(1):18.

7. Karhade AV, Bongers MER, Groot OQ, et al. Development of machine learning and natural language processing algorithms for preoperative prediction and automated identification of intraoperative vascular injury in anterior lumbar spine surgery. Spine J. 2021;21(10):1635-1642.

8. Karhade AV, Bongers MER, Groot OQ, et al. Can natural language processing provide accurate, automated reporting of wound infection requiring reoperation after lumbar discectomy? Spine J. 2020;20(10):1602-1609.

9. Huang BB, Huang J, Swong KN. Natural language processing in spine surgery: a systematic review of applications, bias, and reporting transparency. World Neurosurg. 2022;167:156-164.e6.

10. Géron A. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. 2nd ed. O’Reilly; 2019.

11. Chollet F. Keras. Accessed April 5, 2023. https://github.com/fchollet/keras

12. TensorFlow. Zenodo. Published online May 23, 2022. Accessed April 5, 2023. https://doi.org/10.5281/zenodo.6574269

13. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12(85):2825-2830.

14. Gillies S. Shapely: manipulation and analysis of geometric objects. Accessed April 5, 2023. https://github.com/Toblerity/Shapely

15. Coppersmith G, Leary R, Crutchley P, Fine A. Natural language processing of social media as screening for suicide risk. Biomed Inform Insights. 2018;10:1178222618792860.

16. Huang M, Buchholz A, Goyal A, et al. Impact of surgeon and hospital factors on surgical decision-making for grade 1 degenerative lumbar spondylolisthesis: a Quality Outcomes Database analysis. J Neurosurg Spine. 2021;34(5):768-778.

    FIG. 1.

    This conditional logic CPT classifier was used to create the initial dictionary of surgical categories. The input is the list of all CPT codes and the output is the surgery type.

    FIG. 2.

    Consent form text is parsed into subcomponents with either positive (red) or negative (blue) predictive value toward a surgical categorization with the relative importance of each feature represented by the length of the segment.

    FIG. 3.

    Confusion matrix for our NLP surgical classification model showing classification results by surgery with the true label as determined by CPT code on the y-axis and the predicted label output from the NLP model on the x-axis.

    FIG. 4.

    The one-versus-rest ROC curve demonstrates the performance of our NLP model for each surgery type compared with a model based on chance. Micro- and macro-averaged ROC curves for all surgical categories are included. Performance is estimated from the AUC. Ant. = anterior; Cerv. = cervical; Disc. = discectomy; Lami., Lamin. = laminectomy; Lum. = lumbar; Post. = posterior.

    FIG. 5.

    The accuracy (left) and loss (right) from model training and validation are demonstrated.

