Around the 1970s and 1980s, the term information engineering methodology (IEM) was coined to describe database design and the use of software for data analysis and processing.[3][4] These techniques were intended for database administrators (DBAs) and systems analysts, and were based on an understanding of the operational processing needs of organizations for the 1980s. In particular, they were meant to help bridge the gap between strategic business planning and information systems. A key early contributor (often called the "father" of information engineering methodology) was the Australian Clive Finkelstein, who wrote several articles about it between 1976 and 1980 and co-authored an influential Savant Institute report on it with James Martin.[5][6][7] Over the next few years, Finkelstein continued work in a more business-driven direction, intended to address a rapidly changing business environment, while Martin continued work in a more data-processing-driven direction. From 1983 to 1987, Charles M. Richter, guided by Clive Finkelstein, played a significant role in revamping IEM and in helping to design the IEM software product (user data), which helped automate IEM.
In the early 2000s, data and data tooling were generally held by the information technology (IT) teams in most companies.[8] Other teams then used the data for their work (e.g. reporting), and there was usually little overlap in data skill sets between these parts of the business.
High-performance computing is critical for the processing and analysis of data. One particularly widespread approach to computing for data engineering is dataflow programming, in which the computation is represented as a directed graph (dataflow graph): nodes are the operations, and edges represent the flow of data.[9] Popular implementations include Apache Spark and the deep-learning-specific TensorFlow.[9][10][11] More recent implementations, such as Differential/Timely Dataflow, have used incremental computing for much more efficient data processing.[9][12][13]
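As a minimal sketch of the dataflow idea (not the API of Spark, TensorFlow, or any other framework), a computation can be written as a dictionary in which each node holds an operation and the names of the nodes whose output flows into it. The node names and the `evaluate` helper below are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each node is (operation, names of upstream nodes). The edges of the
# dataflow graph are implied by the upstream names.
Graph = Dict[str, Tuple[Callable[..., object], List[str]]]


def evaluate(graph: Graph, target: str,
             cache: Optional[Dict[str, object]] = None) -> object:
    """Compute `target` after computing every node that feeds into it."""
    if cache is None:
        cache = {}
    if target not in cache:
        operation, inputs = graph[target]
        args = [evaluate(graph, name, cache) for name in inputs]
        cache[target] = operation(*args)  # each node runs at most once
    return cache[target]


# Hypothetical graph: source -> doubled -> total, source -> count, then mean.
graph: Graph = {
    "source": (lambda: [3, 1, 4, 1, 5], []),
    "doubled": (lambda xs: [2 * x for x in xs], ["source"]),
    "total": (lambda xs: sum(xs), ["doubled"]),
    "count": (lambda xs: len(xs), ["source"]),
    "mean": (lambda total, count: total / count, ["total", "count"]),
}

result = evaluate(graph, "mean")
```

Because the graph states only which data each operation needs, a real engine is free to run independent nodes (here `total` and `count`) in parallel or on different machines; this sketch simply evaluates them one after another.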
Data is stored in a variety of ways; one of the key deciding factors is how the data will be used.
Data engineers optimize data storage and processing systems to reduce costs. They use data compression, partitioning, and archiving.
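To illustrate partitioning and compression together, the sketch below groups records by a partition key and compresses each partition separately, so that reading one day's data only requires decompressing that day's blob. The record layout and the helper names are assumptions made for this example; production systems typically use columnar file formats and object storage rather than in-memory dictionaries.

```python
import gzip
import json
from collections import defaultdict

# Hypothetical records; "day" is used as the partition key.
records = [
    {"day": "2024-01-01", "user": "a", "amount": 10},
    {"day": "2024-01-01", "user": "b", "amount": 7},
    {"day": "2024-01-02", "user": "a", "amount": 3},
]


def partition_by(rows, key):
    """Group rows into lists keyed by the value of `key`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def compress(rows):
    """Serialize rows as newline-delimited JSON and gzip the result."""
    raw = "\n".join(json.dumps(row, sort_keys=True) for row in rows)
    return gzip.compress(raw.encode("utf-8"))


def decompress(blob):
    """Inverse of `compress`."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


partitions = partition_by(records, "day")
stored = {day: compress(rows) for day, rows in partitions.items()}

# Reading a single day touches only that partition's compressed blob.
jan_first = decompress(stored["2024-01-01"])
```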
If the data is structured and some form of online transaction processing is required, then databases are generally used.[14] Originally, mostly relational databases were used, with strong ACID transaction correctness guarantees; most relational databases use SQL for their queries. However, with the growth of data in the 2010s, NoSQL databases have also become popular, because they scale horizontally more easily than relational databases by giving up the ACID transaction guarantees, and because they reduce the object-relational impedance mismatch.[15] More recently, NewSQL databases, which attempt to allow horizontal scaling while retaining ACID guarantees, have become popular.[16][17][18][19]
If the data is structured and online analytical processing is required (but not online transaction processing), then data warehouses are a main choice.[20] They enable data analysis, mining, and artificial intelligence on a much larger scale than databases can allow,[20] and indeed data often flows from databases into data warehouses.[21] Business analysts, data engineers, and data scientists can access data warehouses using tools such as SQL or business intelligence software.[21]
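The atomicity part of the ACID guarantees can be sketched with Python's built-in `sqlite3` module, used here as a stand-in for a relational database. The `accounts` table and the `transfer` helper are hypothetical. The second transfer violates a `CHECK` constraint halfway through, and the database rolls back the update that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # rolled back as a whole
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After both calls, the balances reflect only the first transfer: the credit to `bob` from the failed transfer does not survive, which is the all-or-nothing behavior that many NoSQL systems relax in exchange for easier horizontal scaling.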
The number and variety of data processes and storage locations can become overwhelming for users. This inspired the use of workflow management systems (e.g. Airflow), which allow data tasks to be specified, created, and monitored.[24] The tasks are often specified as a directed acyclic graph (DAG).[24]
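The idea of specifying tasks as a DAG can be sketched with the standard-library `graphlib` module (Python 3.9 or later). This is not Airflow's API; the task names and the `run_task` placeholder are hypothetical, and a real workflow manager would add scheduling, retries, and monitoring on top of the ordering shown here.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each task maps to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_tables": {"extract_orders", "extract_customers"},
    "build_report": {"join_tables"},
}


def run_task(name, log):
    """Placeholder for real work; it only records that the task ran."""
    log.append(name)


def run_workflow(dependencies):
    """Run every task once, never before the tasks it depends on."""
    log = []
    for task in TopologicalSorter(dependencies).static_order():
        run_task(task, log)
    return log


order = run_workflow(dependencies)
```

Because the graph is acyclic, a valid execution order always exists. If a cycle were introduced, `TopologicalSorter` would raise `graphlib.CycleError` instead of returning an order.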
Business objectives that executives set for the future are described in strategic business plans, defined in more detail in tactical business plans, and implemented through operational business plans. Most businesses today recognize the fundamental need to develop a business plan that follows this strategy. These plans are often difficult to implement because of a lack of transparency at the tactical and operational levels of organizations. This kind of planning requires feedback so that problems caused by miscommunication and misinterpretation of the business plan can be corrected early.
A data engineer is a type of software engineer who creates big data ETL pipelines to manage the flow of data through the organization. This makes it possible to take huge amounts of data and translate them into insights.[28] Data engineers focus on the production readiness of data and on concerns such as formats, resilience, scaling, and security. They usually come from a software engineering background and are proficient in programming languages like Java, Python, Scala, and Rust.[29][3] They are typically familiar with databases, architecture, cloud computing, and Agile software development.[3]
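A very small ETL (extract, transform, load) pipeline can be sketched using only the Python standard library. The CSV content, the column names, and the `orders` table are invented for illustration; real pipelines read from external systems and usually run under a workflow manager.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; the second row is missing its amount.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
3,alice,5.01
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows and convert amounts to integer cents."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip rows with a missing amount
        cents = round(float(row["amount"]) * 100)
        cleaned.append((int(row["order_id"]), row["customer"], cents))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the target table in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    with conn:
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
totals = dict(conn.execute(
    "SELECT customer, SUM(amount_cents) FROM orders GROUP BY customer"))
```

Keeping the three stages as separate functions makes each one testable on its own, which is one way to address the production-readiness concerns mentioned above.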
John Hares (1992). Information Engineering for the Advanced Practitioner. Wiley.
Clive Finkelstein (1989). An Introduction to Information Engineering: From Strategic Planning to Information Systems. Sydney: Addison-Wesley.
Clive Finkelstein (1992). Information Engineering: Strategic Systems Development. Sydney: Addison-Wesley.
Ian Macdonald (1986). "Information engineering". In: Information Systems Design Methodologies. T.W. Olle et al. (eds.). North-Holland.
Ian Macdonald (1988). "Automating the Information engineering methodology with the Information Engineering Facility". In: Computerized Assistance during the Information Systems Life Cycle. T.W. Olle et al. (eds.). North-Holland.
James Martin and Clive Finkelstein (1981). Information engineering. Technical Report (2 volumes). Savant Institute, Carnforth, Lancs, UK.
James Martin (1989). Information engineering (3 volumes). Prentice-Hall Inc.
Clive Finkelstein (2006). Enterprise Architecture for Integration: Rapid Delivery Methods and Technologies. First Edition (hardcover). Artech House, Norwood MA.
Clive Finkelstein (2011). Enterprise Architecture for Integration: Rapid Delivery Methods and Technologies. Second Edition. Available as a PDF at www.ies.aust.com and as an ebook for the Apple iPad and the Amazon Kindle.
Joe Reis and Matt Housley (2022). Fundamentals of Data Engineering. O'Reilly Media, Inc. ISBN 9781098108304.