A detection and configuration method for welding completeness in the automotive body-in-white panel based on digital twin

DT knowledge base construction technology for body-in-white panels

Constructing a DT knowledge base for body-in-white panels enables the detection system to analyze panel welding completeness issues from a more professional perspective. Moreover, it can provide operators with professional configuration solutions for completing the configuration of panels and parts efficiently. Compared with textual knowledge bases composed of textual semantic data, graphical knowledge bases are more advantageous in terms of knowledge structure expression and logical reasoning32. Based on this, this paper adopts the form of a graphical knowledge base to construct a knowledge base for body-in-white panel welding completeness detection and configuration problems. As shown in Fig. 2, the graphical knowledge base mainly comprises entities, entity relationships, and attributes. The E-R knowledge base relationship model is established based on attribute components such as panel type, defect type, and cause. Specifically, the panel and the defect are regarded as entities, detection as the entity relationship, and the connection between the two entities as a many-to-many (M:N) relationship. Among them, the panel entity contains the panel ID, virtual model, panel image, panel information, etc., while the defect entity contains the defect type, formation cause, defect image, solution, etc. In addition, a Crow's foot data relationship table is established based on the E-R relationship model. In this paper, we take a body-in-white dash panel component as an example and analyze what is contained under the attributes of the panel and defect entities. First, a relational database is established based on the Crow's foot data relationship table to form a digital twin knowledge base for body-in-white panel parts. Then, the MRTK resource package is imported into Unity3d software, and scripts are written to call the knowledge base's API. Lastly, the app is deployed to HoloLens2, which uses this knowledge base to guide the operator through configuration and repair.

Figure 2

Construction of the body-in-white digital twin knowledge base.

The knowledge base guidance program can be summarized as follows. First, the two entities, panel and defect, provide information in the form of pictures, models, scene files, etc. The combination of information is formulated based on the E-R relationship model and presented on the HoloLens2 MR device. Then, the operator can obtain the panel's ID number, virtual model, and standard image, as well as the defect type, cause, and solution associated with the panel's defect entity, from the system. Finally, the operator obtains a standardized and intelligent service from the system, which improves the efficiency of configuring panel welding completeness.
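The M:N panel-defect relationship derived from the E-R model can be sketched as a small relational schema. This is a minimal illustration using SQLite; the table and column names (and the sample dash-panel row) are assumptions for illustration, not the paper's actual schema.

```python
import sqlite3

# Minimal sketch of the M:N panel-defect schema implied by the E-R model.
# Table and column names are illustrative assumptions, not the paper's schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE panel (
    panel_id      TEXT PRIMARY KEY,
    virtual_model TEXT,   -- path to the digital twin model asset
    panel_image   TEXT,   -- path to the standard panel image
    panel_info    TEXT
);
CREATE TABLE defect (
    defect_id     INTEGER PRIMARY KEY,
    defect_type   TEXT,
    cause         TEXT,
    defect_image  TEXT,
    solution      TEXT
);
-- "detection" is the M:N entity relationship linking panels and defects.
CREATE TABLE detection (
    panel_id  TEXT    REFERENCES panel(panel_id),
    defect_id INTEGER REFERENCES defect(defect_id),
    PRIMARY KEY (panel_id, defect_id)
);
""")
cur.execute("INSERT INTO panel VALUES ('P001', 'dash_panel.fbx', 'dash_panel.png', 'dash panel')")
cur.execute("INSERT INTO defect VALUES (1, 'missing nut', 'feeder skip', 'missing_nut.png', 'refit nut')")
cur.execute("INSERT INTO detection VALUES ('P001', 1)")

# The guidance app would query defects and solutions for the scanned panel ID.
row = cur.execute("""
    SELECT d.defect_type, d.solution FROM detection t
    JOIN defect d ON d.defect_id = t.defect_id
    WHERE t.panel_id = 'P001'
""").fetchone()
```

A Unity3d script would then render the returned defect type, image, and solution on the HoloLens2 display.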

Yolov4-based welding completeness detection method for body-in-white panels

Yolov4 algorithm

This paper uses the Yolov4 target detection algorithm to detect the welding completeness of body-in-white panels. Compared with the traditional template matching algorithm, Yolov4 has stronger feature extraction and learning ability and clear advantages in detection speed and accuracy. Yolov4 is a target detection algorithm proposed by Alexey Bochkovskiy et al.33, and compared to YOLOv3, it introduces optimizations in the main feature extraction network, the feature pyramid, and the training tricks34. The flow of the algorithm is shown in Fig. 3.

Figure 3

Yolov4 network structure.

The algorithm fixes the size of the image at the input. Since the down-sampling factor is 32, the input image size must be a multiple of 32.
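A small helper can round an arbitrary capture size up to the nearest multiple of 32; this is an illustrative sketch (the paper simply uses a fixed 416 × 416 input), but it shows the constraint concretely.

```python
import math

def pad_to_multiple_of_32(width: int, height: int) -> tuple:
    """Round an image size up to the nearest multiple of 32,
    as required by Yolov4's 32x down-sampling."""
    return (math.ceil(width / 32) * 32, math.ceil(height / 32) * 32)
```

For example, a 400 × 300 capture would be padded to 416 × 320, while 416 × 416 is already valid.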

The backbone network uses CSPDarknet53. This method uses the Mish activation function to aggregate image features. In addition, it optimizes the gradient information to reduce the number of model parameters and improve the model's inference speed. Meanwhile, the DropBlock regularization method is applied during feature extraction to improve the model's generalization ability and prevent overfitting. Moreover, the Mish activation function is smooth and unbounded above, which passes image information through the network more completely and improves model accuracy and generalization ability. The formula of the Mish activation function is as follows:

$$y_{mish} = x \times \tanh \left( {\ln \left( {1 + e^{x} } \right)} \right)$$
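The formula above translates directly into code; a minimal scalar sketch:

```python
import math

def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))."""
    return x * math.tanh(math.log(1.0 + math.exp(x)))
```

Note that `mish(x)` approaches `x` for large positive inputs (no positive bound) while remaining smooth and slightly negative for negative inputs, unlike ReLU's hard cutoff.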


The neck feature pyramid connects to the backbone network and uses the SPP module together with the FPN + PAN structure to extract and fuse image features before passing them to the Head prediction layer. In particular, the SPP module uses four pooling sizes: 1 × 1, 5 × 5, 9 × 9, and 13 × 13. The pooling operation increases the receptive field and separates out the most significant contextual features. The FPN structure propagates semantic features top-down, while the PAN structure adds a bottom-up path that propagates localization features; together they fuse feature information of different sizes and output feature maps at three scales.
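The SPP idea, pooling the same feature map at several kernel sizes with stride 1 and "same" padding, then concatenating the results, can be shown on a single-channel toy array. This is a NumPy sketch of the principle, not the real multi-channel CNN module.

```python
import numpy as np

def max_pool_same(x: np.ndarray, k: int) -> np.ndarray:
    """Stride-1 max pooling with 'same' padding, so output shape equals input shape."""
    pad = k // 2
    padded = np.pad(x, pad, mode="constant", constant_values=-np.inf)
    h, w = x.shape
    out = np.empty_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def spp(feature_map: np.ndarray) -> np.ndarray:
    """SPP block sketch: stack the 1x1 (identity), 5x5, 9x9, and 13x13 poolings
    along a new channel axis. A single-channel toy version of the real module."""
    pools = [feature_map] + [max_pool_same(feature_map, k) for k in (5, 9, 13)]
    return np.stack(pools, axis=0)
```

Because all four poolings preserve spatial size, the outputs concatenate cleanly, which is why SPP can enlarge the receptive field without changing the feature-map resolution.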

The head makes predictions on the image features from the neck. First, a K-means clustering algorithm is used to obtain the prior boxes, which are assigned to the feature maps. Then, the prediction boxes, confidence levels, and categories are decoded in turn. Finally, the DIOU_NMS (non-maximum suppression) algorithm filters the prediction boxes that satisfy the confidence threshold and outputs the prediction results, where a prediction result = detection box position + detection confidence + label category. The DIOU_NMS method considers both the IoU (intersection-over-union ratio of the detection box and the ground-truth box) and the center distance of two detection boxes to accelerate the convergence of the loss function35. The DIOU_NMS formula is as follows.

$$S_{i} = \left\{ {\begin{array}{*{20}c} {S_{i} ,} & {IoU - R_{DIoU} (\mu ,B_{i} ) < \varepsilon } \\ {0,} & {IoU - R_{DIoU} (\mu ,B_{i} ) \ge \varepsilon } \\ \end{array} } \right.$$


where \(S_{i}\) denotes the confidence score of the \(i\)-th prediction box in the class, \(B_{i}\) is a prediction box in the set of all prediction boxes of the class, \(\mu\) is the prediction box with the highest confidence, and \(\varepsilon\) is the screening threshold (set manually).

$$R_{DIoU} = \frac{{\rho^{2} (b,b^{gt} )}}{{c^{2} }}$$


where \(R_{DIoU}\) denotes the penalty term of the DIoU loss function, \(\rho\) is the Euclidean distance between the center points, \(b\) and \(b^{gt}\) are the center coordinates of the two prediction boxes, and \(c\) is the diagonal length of the minimum enclosing box of the two prediction boxes.

Dataset generation

  (1) Image acquisition. The Yolov4 algorithm in this paper detects panel scenes acquired while the operator wears HoloLens2 on his head. Image and video quality elements, such as sharpness, focus, and noise, depend on the HoloLens2 device. The image or video samples captured by HoloLens2 shall reflect important feature information, such as the panel shape, surface hole locations, nuts, bolts, locating pins, etc. The HoloLens2 device has an 8-megapixel still camera and records video at 1080p 30 fps.

  (2) Data pre-processing. First, the video samples are collected, and frames are extracted as single images at 5 fps. Secondly, blurred images and images with incomplete panel shapes are screened out and eliminated. Lastly, the welding completeness of each panel is classified. According to the size and shape of the different panel holes, the samples are classified into six cases: nuts, bolts, locating pins, missing nuts, missing bolts, and missing locating pins. The training model adopts the VOC dataset format, including original images and label files. There are 1100 original images, divided into a training set and a test set in a 9:1 ratio, yielding 990 training images and 110 test images.

  (3) Data labeling. Labeling software was used to manually label the images in the training set. The nuts, bolts, locating pins, and the corresponding missing cases were marked to obtain the label files. The designed label categories are shown in Table 1.

Table 1 Label categories.
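The 9:1 train/test split described in step (2) can be sketched with the standard library; the image-ID format is illustrative, and the fixed seed only serves to make the split reproducible.

```python
import random

def split_dataset(image_ids: list, train_ratio: float = 0.9, seed: int = 0):
    """Shuffle image IDs reproducibly, then split into train and test sets."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

# 1100 images at a 9:1 ratio yield 990 training and 110 test images.
train_ids, test_ids = split_dataset([f"img_{i:04d}" for i in range(1100)])
```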

Model training

  (1) Model training. The experimental environment is as follows: Windows 10, Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz, 16 GB RAM, NVIDIA GTX1060 graphics card, Python 3.7. Model parameters are set as follows: the input image size is 416 × 416, the batch size is 4, and label smoothing is set to 0.05. A checkpoint-resumption training method is adopted: a checkpoint is set every 350 iterations, four checkpoints are set in total, 350 weight files are generated after each 350 iterations of training, and the best weight file is manually selected as the initial weight for the next checkpoint segment. The total number of training iterations is 1400. During testing, the confidence threshold is set to 0.4, and the IoU threshold is set to 0.4.

  (2) Evaluation metrics. In this paper, we evaluate the effectiveness of model training mainly in terms of detection accuracy and efficiency. The metrics used are the mean Average Precision (mAP), i.e., the mean of the per-category average precision (AP), and the number of image frames per second (FPS) processed by the algorithm.
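The relationship between AP and mAP can be made concrete with a short sketch. This uses all-point interpolation over the precision-recall curve, one common convention; the paper does not state which interpolation variant it uses.

```python
def average_precision(recalls: list, precisions: list) -> float:
    """All-point interpolated AP: area under the precision-recall curve,
    with precision made monotonically non-increasing from right to left."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):   # right-to-left precision envelope
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

def mean_average_precision(per_class_ap: dict) -> float:
    """mAP: the mean of the per-category AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

FPS, the other metric, is simply the number of test images processed divided by total inference time.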

MR-based method for body-in-white panel welding completeness configuration

The operator is immersed in the MR environment with a HoloLens2 device on his head. The system in this paper uses a Vuforia-based virtual-real fusion approach, matched with a corresponding digital twin guidance solution, combined with human–machine interaction to achieve efficient guidance for the operator.

Virtual-real registration fusion based on Vuforia

Virtual-real registration fusion technology is the key to achieving fusion and interaction between virtual and physical objects. It is mainly divided into machine vision-based, sensor-based, and hybrid virtual-real registration fusion36. In this paper, the scene is obtained by calling the HoloLens2 camera; therefore, machine vision-based virtual-real registration fusion is adopted. Machine vision-based techniques are further divided into two types: with and without artificial markers. The anchoring methods and their characteristics are shown in Table 2. The artificial marker-based technique requires a physical or virtual marker attached to the detection target to anchor the virtual model; the model can be anchored by scanning QR codes, circular codes, or holograms. This method achieves high registration accuracy and stability. However, since this paper targets a multi-variety, multi-batch detection scenario, attaching a marker code to each panel is inefficient, and situations such as markers being easily damaged must be considered. The hologram anchoring method performs virtual-real registration fusion by matching a virtual identifier with a physical object. It is essentially the same as the artificial marker method, except that a virtual marker replaces the physical one; in addition, it can be adapted to mobile scenarios and small-part registration scenarios. In comparison, the markerless feature matching method is more robust, although it also has problems such as easy loss of registration targets and registration errors.

Table 2 Virtual reality registration method based on machine vision.

Considering the complex and changeable detection environment and the variety of detection panels, a Vuforia-based virtual-real registration fusion method is adopted in this paper. The method is a markerless feature matching anchoring method. The Vuforia SDK is an AR toolkit developed by Qualcomm, and the core of its algorithm matches and tracks the target through a feature point matching algorithm. The flow of the Vuforia-based virtual-real registration fusion method is shown in Fig. 4. Firstly, the standard image of the panel to be detected is uploaded to the Vuforia cloud database, and the data file is exported to Unity3d software. Secondly, the standard panel is associated with the corresponding guide model in the digital twin knowledge base, and MR spatial coordinates determine the display position. Thirdly, the project app is deployed to HoloLens2, and live panel images are acquired by calling the HoloLens2 camera. Finally, the live image is matched with the inspected panel through feature point extraction, feature matching, and MR coordinate matching. The virtual panel digital twin knowledge base scene is overlaid on the live image to complete the virtual-real registration and guide the operator in the configuration.
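Vuforia's tracking internals are proprietary, but the feature point matching step it relies on can be illustrated generically: each descriptor from the live image is matched to its nearest neighbor among the standard-image descriptors, accepted only if it passes Lowe's ratio test. The toy descriptors below are assumptions for illustration.

```python
import numpy as np

def match_features(desc_a: np.ndarray, desc_b: np.ndarray, ratio: float = 0.75) -> list:
    """Nearest-neighbor descriptor matching with Lowe's ratio test:
    accept a match only if the best distance is clearly below the second best."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches
```

With enough accepted matches, a homography or pose can be estimated from the corresponding point pairs, which is what anchors the virtual guide model onto the physical panel.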

Figure 4

Virtual-real registration fusion process based on Vuforia.

Human–computer interaction function construction

A good human–computer interface and function settings can assist the operator in configuring and repairing defects in body-in-white welded panels. When the operator wears the HoloLens2 device, he can view key information about the panel through gestures, voice control, and other interactive operations. The main functions are as follows.

  (1) Basic interaction functions. By importing the Mixed Reality Toolkit (MRTK) resource package in Unity3d, we designed a GUI operation panel with basic functions such as gesture operation, gaze tracking, and voice control to meet the operators' basic interaction needs.

  (2) Real-time scene acquisition. We designed a photo-taking UI that allows the operator to capture an image or video of the detected live scene by invoking the HoloLens2 camera.

  (3) Abnormal information reporting. When novice operators wearing HoloLens2 equipment test the completeness of panel parts, batch panels may present abnormal conditions, such as numerous defects that are difficult to repair; in this case, real-time scene video can be collected with the HoloLens2 camera. The video or picture data are transmitted to the workshop quality control server, and the computer reports the abnormal situation. In this way, quality control personnel learn of the batch abnormalities through a voice-video conversation on a PC or mobile terminal and then call the relevant personnel to handle the abnormal batch panels online. It should be noted that the human–computer interaction interface for abnormal information reporting includes modules such as the HTTP communication connection, voice call, and video connection.
