CHAPTER 1
INTRODUCTION
1.1. Background
Bill Hewlett and Dave Packard graduated with degrees in electrical engineering from Stanford University in 1935. In 1937, Bill and Dave formalized their partnership and decided the company's name with a coin toss. In 1999, HP announced a strategic realignment to create an independent measurement company composed of the test and measurement, chemical analysis and medical businesses, and a computing and imaging company that included all of HP's computing, printing and imaging businesses.
Agilent Technologies, the name of the new measurement company, was announced by Agilent President and Chief Executive Officer Ned Barnholt at a historic brand-identity launch event in San Jose, California (KeysightTech, 2019). In 2013, Agilent Technologies announced that it would split into two separate pure-play measurement companies, and the name of the new electronic measurement company, Keysight Technologies, was announced later that year.
During 2014 the separation process continued, and on November 1 Keysight Technologies became a fully separate electronic measurement company. On November 3, 2014, Keysight listed on the New York Stock Exchange under the ticker symbol KEYS, completing the final phase of its separation from Agilent.
1.2. Problem Background
Keysight offers a variety of products, including hardware and software. Among the instruments Keysight sells are multimeters, signal analyzers, atomic force microscopes, power supplies and handheld tools. Keysight also serves the aerospace and defense, telecommunications, automotive and energy, and semiconductor industries. Keysight Technologies is the world's leading electronic measurement company.
Most Keysight products undergo 100% testing using automated test systems. All tests are executed, and a unit is deemed to have passed only if every test passes. This test process is straightforward, easy to administer and should ensure zero percent defects. Its drawback is that 100% testing takes time, and the cost of testing becomes evident for high-volume products with long test times. The test results of units tested in manufacturing contain a large amount of measurement, process and specification data. After testing, the results are stored in raw data format by the test executive software. Data analysis techniques have the potential to uncover new insights from test data by looking at it from a different angle than the traditional SPC approach. Products may fail for many reasons which are often not obvious or are difficult to surface without tedious statistical analysis of the test results.
1.3. Problem Statement
Whenever there is a yield issue in the production line, nothing is more frustrating for the test engineers than knowing that the key to unlock a particular problem is available, but lost somewhere in a mountain of data files or databases. Data collection is usually carried out to compile the results of testing over different time periods, over different batches or under different test conditions. With shorter test development cycles, test engineers typically use the data from manufacturing test and perform only basic filtering and simple analysis. As a result, efforts to shorten the test time often yield only incremental speed improvements. Engineers also spend more time troubleshooting product quality issues because of data overload. To ramp up volume production of a newly launched product, it is often easiest to allocate more engineers or set up more stations to test in parallel.
In manufacturing, the production operation faces the challenge of optimizing an inefficient process. In the test process named "AC20GHz_Sequences.TrigMisc", about 30% of the units require at least three rounds of testing, and each round takes two hours to complete. This is because failures are found during the test, and whenever a failure point is found the test stops automatically. The DUT is then sent to the rework station, where it queues for rework. Once rework is done, the DUT is sent back to the test station and the test resumes from the last failure point. The cycle of sending the unit for rework repeats whenever the next failure point is met.
This repeated activity between the test station and the rework station is a form of production waste, as it creates a lot of DUT movement to and from the rework station and DUT queuing time at the rework station. According to the LEAN definition, these wastes are categorized as the waste of motion and the waste of waiting.
1.4. Objectives of Project
The objective of this project is to:
• Design and identify the failure relationships in the test process using association rules and decision trees
1.5. Benefit of Project
When failures are detected early, the test and calibration time is reduced, which means the cycle time to produce a new piece of equipment is reduced as well. The annual test volume in 2018 was around 2,000 runs. A lower cycle time directly reduces the production cost of a product and increases its net profit. Table 1.1 illustrates the ROI of the project.
Annual units                                     2,222
Monthly units                                    186
Monthly pay for an operator (USD)                882
Test time for one round (hours)                  2
Total test time removed in a month (by day)      48
Total test time removed in a month (by month)    2
Dollars saved in a month (USD)                   1,949
Annual savings (USD)                             23,384
Table 1.1: ROI of the Project
Assuming this approach is successfully deployed and removes one round of retest, it can help to save up to USD 23,384 per year. In addition, the project captures and digitalizes the test failure patterns into a model using a data-driven approach, transforming the manual analysis into a machine learning model that gives test engineers insight into the failure relationships in a test process.
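As a minimal sketch in R (the analytical tool used later in this report), the lines below show how the headline figures in Table 1.1 relate to one another; the exact cost model behind the monthly dollar saving is not stated in the table, so it is treated here as a given input rather than derived.

```r
# Minimal sketch reproducing the headline figures in Table 1.1.
# The cost model behind the monthly dollar saving is not given in the
# table, so it is taken as an input rather than derived.
annual_units          <- 2222
monthly_units         <- ceiling(annual_units / 12)            # 186 units per month, as in Table 1.1
test_hours_per_round  <- 2
monthly_hours_removed <- monthly_units * test_hours_per_round  # ~372 test hours removed per month
monthly_saving_usd    <- 1949                                  # from Table 1.1
annual_saving_usd     <- monthly_saving_usd * 12               # ~USD 23,400, close to the 23,384 reported
```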
1.6. Research Questions
The research questions are:
1. What are the patterns of the failure relationships in the test process?
2. What are the relationships among the tests in the test process?
The research objectives are:
• To identify the patterns of the failure relationships in the test process
• To discover the failure patterns by using machine learning
CHAPTER 2
RELATED WORKS
2.0. Introduction
This chapter evaluates the available literature in the given domain, covering the existing tools and analytical techniques in the domain.
2.1. Literature Review
In the optimal planning of an industrial manufacturing system, anticipating failures can be considered an insight (Khan, Schioler, Kulahci, & Peter, 2019). Productivity is one of the three basic elements that manufacturers pursue, along with cost and quality.
Manufacturers try to go beyond preventive maintenance in order to enable prescriptive maintenance systems. Downtime is critical to the productivity and overall efficiency of industrial equipment and machinery. Predictive failure analysis aims to predict potential problems with a system or application; it extends availability by going beyond failure detection to predicting failures before they occur (IBM, 2019).
Several journals have been reviewed on mining association rules to improve manufacturing productivity, as it is important to know whether a sequence of failures can be detected during usage or from historical data (Kumar & Selvadoss, 2013). Unchalisa Taetragool proposed a design for failure pattern analysis to solve problems in the domain of manufacturing quality improvement. A second study, by Apte, Weiss, and Grout (1993), employed five methods to predict defects in hard drive manufacturing.
2.2. Data Science & Analytics Techniques
2.2.1. Decision Tree
A decision tree is a type of supervised learning algorithm that is widely used for classification problems. It is a decision support tool: a tree-like graph that models decisions and their consequences, including chance event outcomes, resource costs and so on. Tree-based methods allow predictive models with high accuracy, stability and ease of interpretation (Brid, 2019).
Decision trees have a natural "if... then... else" construction, which makes them fit easily into a programmatic structure. They are also ideal for categorization problems where the attributes or features are systematically evaluated to determine a final category.
There are two types of decision trees: the categorical variable decision tree and the continuous variable decision tree. A continuous variable decision tree has a continuous target variable, while a categorical one has a categorical target variable such as "PASS" or "FAIL". A typical illustrative problem is predicting whether a customer will pay the renewal premium of an insurance company (YES/NO).
The basic terminology related to decision trees includes root node, splitting, decision node, leaf/terminal node, pruning, branch/sub-tree, and parent and child nodes. An advantage of the decision tree is that it does not require normalization of the data, while a disadvantage is that it can take longer to train the model (K, 2019).
Several studies have mentioned data classification applications in manufacturing. Wei-Choi C. proposed a data mining solution for discovering the root cause of low-yield situations.
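To make the technique concrete, the sketch below (illustrative only, not the project's final model) fits a categorical variable decision tree in R with the rpart package on synthetic test-point data with a PASS/FAIL target; the column names and failure rule are assumptions, not fields from the actual data set.

```r
# Minimal sketch: a classification (categorical variable) decision tree
# with a PASS/FAIL target, using the rpart package.
library(rpart)

set.seed(42)
tp_gain  <- runif(200, 0, 10)   # hypothetical gain-error measurement
tp_phase <- runif(200, -5, 5)   # hypothetical phase-error measurement

test_data <- data.frame(
  TP_Gain  = tp_gain,
  TP_Phase = tp_phase,
  # Synthetic outcome: units with a high gain error or a large negative
  # phase error are marked as FAIL, everything else as PASS.
  Outcome  = factor(ifelse(tp_gain > 7 | tp_phase < -3, "FAIL", "PASS"))
)

# method = "class" selects a classification tree for the categorical target.
fit <- rpart(Outcome ~ TP_Gain + TP_Phase, data = test_data, method = "class")
print(fit)
```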
2.2.2. Association Rules
Association rule mining is a rule-based machine learning method for discovering interesting relationships between variables. Association rules are if-then statements that help to show the probability of relationships between data items. Association rule mining helps to analyze data for patterns or co-occurrences in a database by evaluating frequent if-then associations.
A rule has two parts: an antecedent (if) and a consequent (then). The antecedent is an item found in the data, and the consequent is an item found in combination with the antecedent (Rouse, n.d.).
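As an illustration of antecedents and consequents, the sketch below mines association rules in R with the arules package over hypothetical per-unit failure flags; the item names and thresholds are assumptions chosen only for demonstration, not values from the project data.

```r
# Minimal sketch: mining association rules over per-unit failure flags.
library(arules)

# Hypothetical transactions: each unit contributes the set of test points
# that failed during its run.
failures <- list(
  c("TP_Gain_FAIL", "TP_Phase_FAIL"),
  c("TP_Gain_FAIL"),
  c("TP_Phase_FAIL", "TP_Noise_FAIL"),
  c("TP_Gain_FAIL", "TP_Phase_FAIL", "TP_Noise_FAIL"),
  c("TP_Phase_FAIL")
)
trans <- as(failures, "transactions")

# Mine rules of the form {antecedent} => {consequent}; support and
# confidence thresholds are chosen only for illustration.
rules <- apriori(trans, parameter = list(supp = 0.2, conf = 0.6, minlen = 2))
inspect(sort(rules, by = "lift"))
```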
Elisa discovered that data mining techniques such as association rules and decision trees are used to determine the cause of failures in a fabrication process. Furthermore, the use of association rule mining on frequent patterns captured from industrial processes can provide useful knowledge to explain industrial failures (Martínez-de-Pisón, Sanz, Martínez-de-Pisón, Jiménez, & Conti, 2012).
2.3. Literature Review on Analytical Tools
2.3.1 R Studio
R is a programming language and open-source software environment for statistical computing and graphics supported by the R Foundation. R is widely used by statisticians and data miners for developing data analysis. R was created in the early 1990s by Ross Ihaka and Robert Gentleman at the University of Auckland (R Programming, 2019).
R Studio makes R programming easier to use; it includes a code editor, debugging features and visualization tools. It supports file formats such as TXT, Excel, SPSS, SAS and Stata. In addition, R Studio integrates support for Git, which makes it more convenient for users to manage their workspace.
Figure 2.1 illustrates the R Studio screen, which consists of a four-panel workspace for: 1. editing and creating files containing R scripts; 2. keying in R commands; 3. tracing back the command history; and 4. plotting and graph visualization (STHDA, 2019).
Figure 2.1: R Studio Screen
R is cross-platform compatible and can be installed on Windows, Mac OS X and Linux. It has thousands of documented extensions, the R packages, to work with.
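As a minimal example of extending R with packages, the lines below install and load the packages assumed for the decision tree, association rule and XML examples in this report; the package choice is an assumption for illustration, not mandated by the project.

```r
# One-time installation from CRAN; skip if the packages are already installed.
install.packages(c("rpart", "arules", "xml2"))

library(rpart)   # classification trees (section 2.2.1)
library(arules)  # association rule mining (section 2.2.2)
library(xml2)    # reading the XML raw data files (chapter 3)
```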
CHAPTER 3
RESEARCH METHODOLOGY
3.1 Introduction
In this project, failure pattern discovery is designed and developed to forecast which test points have the highest failure rates. The test failure discovery analysis provides the user with insight into which tests are expected to fail. This chapter describes the selected methodology and the activities plan of the project.
3.2 Research Framework
Figure 3.1 illustrates the research framework for discovering test failure patterns.
The figure demonstrates the workflow of the project. The project is carried out on the production line for a specific product family and model. Data were acquired from FY17 Q4 to FY18 Q3, giving one year of historical data.
3.3. Activities Plan & Project Gantt Chart
This project consists of four phases: Introduction and Literature Review, Research Methodology, Results and Discussion, and Conclusion, as listed in Table 3.1. In addition, the Gantt chart is shown in Figure 3.2.
Table 3.1: Activities Plan
Phase Task
Introduction & Literature Review
i. Background
a. Domain/context
ii. Problem Statement
iii. Objective of project
iv. Benefit of project
v. Review on relevant literature on:
a. Domain/context
b. Data Science and Analytical techniques
c. Data Science and Analytical tools
Research Methodology
i Activities plan and Gantt chart
ii Data Science project lifecycle
iii Data acquisition and data exploratory analysis
Results and Discussion
i Justification of selected DSA technique
ii Justification of selected DSA analytical tool
iii Challenges and Solutions
iv Discussion and Validation on project outcomes
Conclusion
i Conclusion
ii Future work
Figure 3.2: Gantt Chart
3.4. Data Collection
Data Source
The data source used for the current project consists of multiple stations, product family, model, option, results of the test sequence, test point name and test completion duration. One year of historical data, between FY17 and FY18, has been collected for this project.
The data has to be queried from the database to obtain the result files in XML format, which are stored locally for data pre-processing; Figure 3.3 illustrates the data in the database. We do not have the raw data files on our local station, hence we need to access the production database and query the data. We chose the FY17 to FY18 (one year) data to explore the relationships among the test sequences. After querying the data, we use a C# program to save the raw data files locally from the production database. The XML format contains all the information required for this project, such as StationId, Result Outcome, Model, Option, TestPoint Name, TestPointResults and so on. Each raw data file is around 2.2 megabytes; there are 2,222 rows of data in total, and the total file size is about 5 gigabytes.
Figure 3.3: Data in the Database
Figure 3.4: Raw data file, XML format
The raw data files cannot be used directly for the data mining task, so we created another C# program to extract the data needed for modeling. The result is in table form, where each test is represented as a row, as shown in Figure 3.5. Purely as an illustration, an equivalent extraction expressed in R is sketched after this paragraph.
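The actual extraction was done with the C# program described above; the sketch below shows what an equivalent extraction could look like in R using the xml2 package. The element and attribute names are assumptions based on the fields listed earlier (StationId, Model, Option, TestPoint Name, TestPointResults), since the real XML schema is not reproduced in this report.

```r
# Minimal sketch: flattening one raw XML result file into a table where
# each test point becomes a row. Names below are assumed, not the real schema.
library(xml2)

doc <- read_xml("raw_result_file.xml")   # hypothetical local file name

# Assume each test point is a <TestPoint> element carrying Name and Result
# attributes somewhere under the document root.
nodes <- xml_find_all(doc, "//TestPoint")

results <- data.frame(
  StationId = xml_text(xml_find_first(doc, "//StationId")),
  Model     = xml_text(xml_find_first(doc, "//Model")),
  Option    = xml_text(xml_find_first(doc, "//Option")),
  TestPoint = xml_attr(nodes, "Name"),
  Result    = xml_attr(nodes, "Result"),
  stringsAsFactors = FALSE
)
head(results)
```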